Why Open Source AI Will Win - You shouldn't bet against the bazaar or the GPU p̶o̶o̶r̶ hungry.

bnew


Why Open Source AI Will Win​

You shouldn't bet against the bazaar or the GPU p̶o̶o̶r̶ hungry.​


VARUN

SEP 15, 2023


Linux is subversive. Who would have thought even five years ago (1991) that a world-class operating system could coalesce as if by magic out of part-time hacking by several thousand developers scattered all over the planet, connected only by the tenuous strands of the Internet?

Certainly not I.

opening remarks in The Cathedral and the Bazaar by Eric Raymond.


There’s a popular theory floating around the Internet that a combination of the existing foundation model companies will be the end game for AI.

In the near future, every company will rent a “brain” from a model provider, such as OpenAI/Anthropic, and build applications on top of its cognitive capabilities.

In other words, AI is shaping up to be an oligopoly of sorts, with only a small set of serious large language model (LLM) providers.

I think this couldn’t be farther from the truth. I truly believe that open source will have more of an impact on the future of LLMs and image models than the broad public believes.

There are a few arguments against open source that I see time and time again.
  1. Open source AI cannot compete with the resources at industry labs. Building foundation models is expensive, and non-AI companies looking to build AI features will outsource their intelligence layer to a company that specializes in it. Your average company cannot scale LLMs or produce novel results the same way a well capitalized team of talented researchers can. On the image generation side, Midjourney is miles ahead of anything else.
  2. Open source AI is not safe. Mad scientists cooking up intelligence on their cinderblock-encased GPUs will not align their models with general human interests [1].

  3. Open source AI is incapable of reasoning. Not only do open source models perform worse than closed models on benchmarks, but they also lack emergent capabilities, such as those that would enable agentic workflows.

While they seem reasonable, I think these arguments hold very little water.

LLMs are business critical​

Outsourcing a task is fine — when the task is not business critical.
Infrastructure products save users from wasting money and energy on learning Kubernetes or hiring a team of DevOps engineers. No company should have to hand-roll their own HR or bill-payment software. There are categories of products that enable companies to “focus on what makes their beer taste better” [2].


LLMs, for the most part, do not belong in this category. There are some incumbents building AI features into existing products, where querying OpenAI saves them from hiring ML engineers. For them, leveraging closed AI makes sense.

However, there’s a whole new category of AI native businesses for whom this risk is too great. Do you really want to outsource your core business, one that relies on confidential data, to OpenAI or Anthropic? Do you want to spend the next few years of your life working on a “GPT wrapper”?

Obviously not.

If you’re building an AI native product, your primary goal is getting off OpenAI as soon as you possibly can. Ideally, you can bootstrap your intelligence layer using a closed source provider, build a data flywheel from engaged users, and then fine-tune your own models to perform your tasks with higher accuracy, lower latency, and more control.

Every business needs to own their core product, and for AI native startups, their core product is a model trained on proprietary data [3]. Using closed source model providers for the long haul exposes an AI native company to undue risk.


There is too much pent-up pressure for open source LLMs to flop. The lives of many companies are at stake. Even Google has acknowledged that they have no moat in this new world of open source AI.

Reasoning doesn’t actually matter​

The general capabilities of LLMs open them up to an exponential distribution of use cases. The most important tasks are fairly straightforward: summarization, explain like I’m 5, create a list (or some other structure) from a blob of text, etc.

Reasoning, the kind you get from scaling these models larger, doesn’t matter for 85% of use cases. Researchers love sharing that their 200B-parameter model can solve challenging math problems or build a website from a napkin sketch, but I don’t think most users (or developers) have a burning need for these capabilities.

The truth is that open source models are incredibly good at the most valuable tasks, and can likely be fine-tuned to cover up to 99% of use cases once a product has collected enough labeled data.
Fine-tuned Llama 2 models vs. GPT-4 (from Anyscale)

Reasoning, the holy grail that researchers are chasing, probably doesn’t matter nearly as much as people think.

More important than reasoning is context length and truthfulness.

Let’s start with context length. The longer the context length for a language model, the longer the prompts and chat logs you can pass in.

The original Llama has a context length of 2k tokens. Llama 2 has a context length of 4k.

Earlier this year, an indie AI hacker discovered that a single-line code change to the RoPE embeddings for Llama 2 would give you up to 8k of context length for free, with no additional training.

Just last week, another indie research project, YaRN, was released; it extends Llama 2’s context length to 128k tokens.
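
For intuition, here is a minimal sketch of the general idea behind these context-extension tricks, using linear position interpolation. This is illustrative only: the exact one-line change referenced above (and YaRN's more elaborate scheme) differ in detail, and the dimensions and scale factor below are assumptions.

```python
# Minimal sketch of RoPE position interpolation for context extension.
import torch

def rope_angles(head_dim: int, max_positions: int,
                base: float = 10000.0, scale: float = 1.0) -> torch.Tensor:
    """Return the (max_positions, head_dim // 2) rotation angles used by RoPE."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # The trick: compress position indices so a longer sequence maps back
    # into the position range the model saw during training.
    positions = torch.arange(max_positions).float() / scale
    return torch.outer(positions, inv_freq)

# A model trained with 4k context: scale=2.0 squeezes 8k positions into
# the familiar 0..4k range, extending usable context without retraining.
angles = rope_angles(head_dim=128, max_positions=8192, scale=2.0)
```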

I still don’t have access to GPT-4 32k. This is the speed of open source.


While contexts have scaled up, the hardware requirements to run massive models have scaled down. You can now run state-of-the-art large language models from your MacBook thanks to projects like llama.cpp. Being able to use these models locally is a huge plus for security and cost as well; in the limit, you can run your models on your users’ hardware. Models are also continuing to scale down while retaining quality: Microsoft’s Phi-1.5 is only 1.3 billion parameters but matches Llama 2 7B on several benchmarks. Open source LLM experimentation will continue to explode as consumer hardware improves and the GPU poor rise to the challenge.
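
To give a sense of how lightweight local inference has become, here is a minimal sketch using the llama-cpp-python bindings for llama.cpp. The model file and parameters are assumptions, not something from the article.

```python
# Minimal sketch: running a quantized Llama 2 chat model locally with
# llama.cpp's Python bindings (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-7b-chat.Q4_K_M.gguf",  # assumed local file
            n_ctx=4096)

result = llm(
    "Summarize the following text in one sentence:\n<your text here>",
    max_tokens=128,
    temperature=0.2,
)
print(result["choices"][0]["text"])
```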

On truthfulness: out-of-the-box open source models are less truthful than closed source models, and I think this is actually fine. In many cases, hallucination can be a feature, not a bug, particularly when it comes to creative tasks like storytelling.

Closed AI models have a certain filter that makes them sound artificial and less interesting. MythoMax-L2 tells significantly better stories than Claude 2 or ChatGPT, at only 13B parameters. When it comes to honesty, the latest open source LLMs work well with retrieval augmented generation, and they will only get better.

Control above all else​

Let’s take a brief look at the image generation side.

I would argue that Stable Diffusion XL (SDXL), the best open source model, is nearly on-par with Midjourney.


Stable Diffusion XL generations for the prompt “an astronaut playing a guitar on Mars with a llama”. These images were generated on the first try, no cherry-picking needed.

In exchange for the slightly worse ergonomics, Stable Diffusion users have access to hundreds of community-crafted LoRAs [4], fine-tunes, and textual embeddings. Users quickly discovered that hands were a sore spot for SDXL, and within weeks a LoRA that fixes hands appeared online.


Other open source projects like ControlNet give Stable Diffusion users significantly more power when it comes to structuring their outputs, where Midjourney falls flat.


A flowchart of how Stable Diffusion + ControlNet works.

Moreover, Midjourney doesn’t have an API, so if you want to build a product with an image diffusion feature, you would have to use Stable Diffusion in some form.

This image went viral on Twitter and Reddit this week. It uses Stable Diffusion with ControlNet. Currently, you can’t create images like this on Midjourney.
 


There are similar controllable features and optimizations that open source LLMs enable.

An LLM’s logits, the scores over the vocabulary at each decoding step (a probability distribution once softmaxed), can be used to generate structured output. In other words, you can guarantee the generation of valid JSON without entering a potentially expensive “validate-retry” loop, which is what you would need to do if you were using OpenAI.

An example of logits from NVIDIA.
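
To make this concrete, here is a minimal sketch of logit masking using a Hugging Face transformers LogitsProcessor. The grammar callback is hypothetical (not a library API), and libraries such as Outlines and guidance implement much more complete versions of this idea.

```python
# Minimal sketch: mask logits so only tokens permitted by a (hypothetical)
# JSON grammar/state machine can be sampled at each decoding step.
import torch
from transformers import LogitsProcessor

class GrammarConstrainedLogits(LogitsProcessor):
    def __init__(self, allowed_token_ids_fn):
        # allowed_token_ids_fn(prefix_ids) -> list of token ids the grammar
        # allows next. This callback is an assumption for illustration.
        self.allowed_token_ids_fn = allowed_token_ids_fn

    def __call__(self, input_ids: torch.LongTensor,
                 scores: torch.FloatTensor) -> torch.FloatTensor:
        mask = torch.full_like(scores, float("-inf"))
        for i, prefix in enumerate(input_ids):
            allowed = self.allowed_token_ids_fn(prefix)
            mask[i, allowed] = 0.0  # keep only grammar-legal tokens
        # Disallowed tokens end up with ~zero probability after softmax,
        # so the model can only emit output that stays inside the grammar.
        return scores + mask
```

The processor can then be wrapped in a LogitsProcessorList and passed to model.generate via its logits_processor argument.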

Open source models are smaller and run on your own dedicated instance, leading to lower end-to-end latencies. You can improve throughput by batching queries and using inference servers like vLLM.
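
As a rough sketch of what that looks like in practice, here is batched offline inference with vLLM; the model name and sampling settings are assumptions.

```python
# Minimal sketch: batched offline inference with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")  # assumed model
params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)

prompts = [
    "Summarize this support ticket: ...",
    "Explain like I'm 5: what is a vector database?",
    "Extract a bullet list of action items from: ...",
]

# vLLM schedules these requests together (continuous batching + paged KV
# cache), keeping the GPU busy and raising throughput well beyond naive
# one-request-at-a-time generation.
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)
```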

There are many more tricks (see: speculative sampling, concurrent model execution, KV caching) that you can apply to improve on the axes of latency and throughput. With closed models, the latency you see on the OpenAI endpoint is the best you can do, which makes them unsuitable for many latency-sensitive products and too costly for large consumer products.


On top of all this, you can also fine-tune or train your own LoRAs on top of open source models with maximal control. Frameworks like Axolotl and TRL have made this process simple [5]. While closed source model providers also have their own fine-tuning endpoints, you wouldn’t get the same level of control or visibility as you would doing it yourself.
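
For a flavor of how simple this has become, here is a minimal LoRA fine-tuning sketch with TRL and PEFT. The dataset, base model, and hyperparameters are illustrative assumptions, and the exact SFTTrainer arguments vary across TRL versions.

```python
# Minimal sketch: LoRA fine-tuning a Llama 2 base model with TRL + PEFT.
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

# Assumed: a JSONL file where each record has a "text" field.
dataset = load_dataset("json", data_files="my_labeled_data.jsonl", split="train")

peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",   # assumed base model
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    peft_config=peft_config,
    args=TrainingArguments(
        output_dir="llama2-lora",
        per_device_train_batch_size=2,
        num_train_epochs=1,
    ),
)
trainer.train()
trainer.save_model("llama2-lora")  # writes only the small adapter weights
```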



Falcon 180B, the largest open source model to date, was released last week. Within hours, Discords filled with mostly anonymous developers began exploring how they could recreate GPT-4 using this new model as a base layer.

Open source also provides guarantees on privacy and security.

You control the inflow and outflow of data in open models. The option to self-host is a necessity for many users, especially those working in regulated fields like healthcare. Many applications will also need to run on proprietary data, on both the training and inference side.
 

Security is best explained by Linus’s Law:
Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone.

Or, less formally, “Given enough eyeballs, all bugs are shallow.”

Linux succeeded because it was built in the open. Users knew exactly what they were getting and had the opportunity to file bugs or even attempt to fix them on their own with community support.

The same is true for open source models. Even software 2.0 needs to be audited. Otherwise, things can change under the hood, leading to regressions in your application. This is unacceptable for most business use cases.

This paper recently showed that OpenAI’s endpoints drift over time. You cannot be confident that a prompt that works flawlessly will perform the same a month from now.

Adopting an open source approach for AI technology can create a wide-reaching network of checks and balances. Scientists and developers globally can peer-review, critique, study, and understand the underlying mechanisms, leading to improved safety, reliability, interpretability, and trust. Furthermore, widespread knowledge helps advance the technology responsibly while mitigating the risk of its misuse. Hugging Face is the new RedHat.

You can only trust models that you own and control. The same can’t be said for black box APIs. This is also why the AI safety argument against open source makes zero sense. History suggests that open source AI is, in fact, safer.

The Real Problem is Hype​

Why do people currently prefer closed source? Two reasons: ease-of-use and mindshare.

Open source is much harder to use than closed source models. It seems like you need to hire a team of machine learning engineers to build on top of open source, as opposed to simply calling the OpenAI API. This is okay, and will remain true in the short term; it is the cost of control and the rapid pace of innovation. People willing to spend time at the frontier will be rewarded with the ability to build much better products. The ergonomics will get better.

The more unfortunate issue is mindshare.

Closed source model providers have captured the collective mindshare of this AI hype cycle. People don’t have time to mess around with open source nor do they have the awareness of what open source is capable of. But they do know about OpenAI, Pinecone, and LangChain.
The “OPL” stack, from Wen Yang.

Using the right tool is often conflated with using the best known tool. The current hype cycle has put closed source AI in the spotlight. As open source offerings mature and become more user-friendly and customizable, they will emerge as the superior choice for many applications.

Rather than getting swept up in the hype, forward-thinking organizations will use this period to deeply understand their needs and lay the groundwork to take full advantage of open source AI. They will build defensible and differentiated AI experiences on open technology. This measured approach enables a sustainable competitive advantage in the long run.

The future remains bright for pragmatic adopters who see past the hype and keep their eyes on the true prize: truly open AI.
[1] Side note: alignment might hurt overall performance, according to this recently published paper.

[2] From Jeff Bezos’ talk at YC ’08.

[3] In many ways, these models are just reflections of their underlying training data. In fact, model size doesn’t matter nearly as much: a 7B open-source model fine-tuned on SQL queries will outperform GPT-4 on that task.

[4] Short for “Low-Rank Adaptation”, a technique used to train a small set of model weights (called an adapter) that can then be merged into the main model weights. It’s a more lightweight approach to fine-tuning. When training a LoRA, the pretrained weights of the model are untouched; only the A and B matrices of the adapter are trained. The final trained adapter (usually only a few MB in size) can then be used on top of the pretrained model or merged with the pretrained weights and offered as a new model altogether. (See the diagram in the original LoRA paper.)
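
In symbols (notation mine, following the original LoRA paper): for a frozen pretrained weight matrix $W_0 \in \mathbb{R}^{d \times k}$, LoRA learns low-rank factors $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ with $r \ll \min(d, k)$, and the adapted layer computes $h = W_0 x + \frac{\alpha}{r} B A x$. Only $A$ and $B$ receive gradient updates, and merging the adapter simply adds $\frac{\alpha}{r} B A$ into $W_0$.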

[5] Similar to how DevOps relies on Infrastructure as Code (IaC) principles, Axolotl enables engineers to detail fine-tuning processes using YAML. As fine-tuning standardizes further, its accessibility improves even for those without deep ML experience.
 


Here is the AI-powered completion to the above:

### Section: 20.4 Legal Compliance in Innovation

Innovation, while being a catalyst for growth and development, must be conducted within the confines of the law. This is where legal compliance comes into play. Legal compliance is the process or procedure to ensure that an organization follows relevant laws, regulations, and business rules. In the context of innovation, legal compliance is crucial to ensure that the innovative process and its outcomes do not infringe upon any laws or regulations.
#### Subsection: 20.4c Implementing Legal Compliance in Innovation

Implementing legal compliance in innovation involves a series of steps that ensure the innovative process and its outcomes are in line with the relevant laws and regulations. This involves understanding the legal landscape, identifying potential legal hurdles, and taking proactive steps to address them.

The first step in implementing legal compliance in innovation is understanding the legal landscape. This involves gaining a comprehensive understanding of the laws and regulations that are relevant to the innovative process. This could include laws related to intellectual property rights, regulatory compliance, and enterprise law, among others.

The second step is identifying potential legal hurdles. This involves identifying areas where the innovative process could potentially infringe upon laws or regulations. For example, if the innovation involves the development of a new product, it is crucial to ensure that the product does not infringe upon any existing patents.

The third step is taking proactive steps to address potential legal hurdles. This could involve seeking legal advice, conducting a thorough legal review of the innovation, and implementing measures to ensure legal compliance. For example, if the innovation involves the use of a certain technology, it may be necessary to obtain a license for its use.

In addition to these steps, it is also crucial to ensure legal consistency. Legal consistency is a property that declares enterprise policies to be free of contradictions with the law. In the context of innovation, legal consistency ensures that the innovative process and its outcomes are in line with the organization's policies and do not contradict any laws or regulations.

Implementing legal compliance in innovation is not just about avoiding legal disputes or penalties. It is also about leveraging the law as a tool to protect and enhance the value of the innovation. By ensuring legal compliance, organizations can protect their innovations from potential legal disputes, enhance their value by securing intellectual property rights, and ensure their long-term sustainability.

In conclusion, legal compliance plays a crucial role in the process of innovation. By understanding the legal landscape, identifying potential legal hurdles, and taking proactive steps to address them, organizations can ensure that their innovations are legally compliant and sustainable.

This demonstration, along with the linked textbooks above, shows that the reality of AI-written textbooks is not far off. This result has ramifications that extend far beyond the interesting research result originally posed by Microsoft. It is worth stopping for a moment to reflect on these potential impacts.

AI and Education​

LLMs, like GPT-4 and its predecessors, have demonstrated an uncanny ability to generate human-like text, answer questions, and even assist in complex problem-solving. In fact, roughly half of this article’s content was AI generated (with human input and guidance). LLMs’ vast knowledge base, derived from extensive training on diverse datasets, allows them to provide insights on a wide range of topics. This capability is particularly beneficial in an educational context. Students, regardless of their learning pace or style, can interact with these models to get personalized explanations, delve deeper into topics of interest, or even get assistance with homework. In essence, LLMs can serve as on-demand tutors, democratizing access to quality education resources.

Recent research breakthroughs have showcased the potential of AI-generated content that rivals the quality of human-authored materials. These synthetic textbooks, curated and created by advanced AI models, can be tailored to fit specific curricula, student demographics, or even individual learner profiles. The dynamic nature of these textbooks allows for real-time updates, ensuring that the content remains current and relevant. Furthermore, they can be enriched with interactive elements, multimedia, and adaptive learning pathways, making learning more engaging and personalized. Even if textbooks begin to play a less pivotal role in direct human education, they are likely to remain as inputs to LLMs for quite some time.

However, the adoption of synthetic textbooks and LLM-assisted learning is not without challenges. Concerns about the accuracy, bias, and ethical implications of AI-generated content need to be addressed. Ensuring that these tools enhance rather than inhibit critical thinking and creativity is also paramount. The human touch, the empathy, passion, and intuition that educators bring to the classroom, remains irreplaceable. AI tools should be seen as complements, not replacements, to the traditional educational experience.

Where are we in the replication attempts?

It appears that we are able to make incredibly high quality textbooks. This is an effort we are continuing to pursue in order to understand it more deeply. Moreover, our dataset has grown to over 1 billion unique and differentiated tokens. We are doing ablation pre-training studies now to better understand how different datasets impact LLM learning. We are working on fine-tuning existing open source models to increase our data quality before scaling out to 20B tokens. Further, we are looking for human readers to critique and give feedback on the current state of the textbooks.

The targets are to do a complete replication of the phi-1.5 work, and likely to scale further. One such goal might be to see whether it is possible to create a more competitive 7 billion parameter model.

Conclusion

In the rapidly advancing world of artificial intelligence, the potential of AI-authored textbooks promises a revolution in education. As our research and replication attempts indicate, we are on the brink of harnessing AI to produce high-quality educational content that could democratize learning across the globe. While the results are promising, a balanced approach is necessary. Embracing AI's potential should not overshadow the intrinsic value of human touch in education. As we venture into this new frontier, collaboration between humans and AI will be the key, ensuring that the resultant content is accurate, unbiased, and tailored to the diverse needs of learners worldwide.


If this interests you, we ask you to join the open source collaboration around this work. Please reach out to me or join the Discord community to get started.

Acknowledgement​

I would like to thank runpod.io for their gracious support in the form of computational resources. Similarly, I owe thanks to Google Cloud Compute and OpenAI for their allocated credits.
 


[screenshots omitted]




*note:
ChatGPT vision was released 20 days ago.
 

Macallik86

This one makes me anxious:

U.S. Lawmakers Want to Block China from 'American' RISC-V
Designed to be an open-source instruction set architecture (ISA) that can be used to build processors for every possible purpose, the RISC-V ISA has become quite a ubiquitous technology that can address both artificial intelligence and high-performance computing applications. This is perhaps why some U.S. lawmakers want this open-source technology to be essentially closed from Chinese entities, reports Reuters.

"The CCP (Chinese Communist Party) is abusing RISC-V to get around U.S. dominance of the intellectual property needed to design chips. U.S. persons should not be supporting a PRC tech transfer strategy that serves to degrade U.S. export control laws," said Michael McCaul, chairman of the House Foreign Affairs Committee, in a statement to Reuters.

Tom's Hardware has reached out to RISC-V and the Linux Foundation for comment, but has not received a response.

Legislators​


Senators Marco Rubio and Mark Warner urge the U.S. government to limit American tech companies' engagement on RISC-V technology with Chinese entities. They fear China's potential misuse of this collaborative tech for military and strategic advantage, which poses a national security risk.

RISC-V is a competitor to Arm's proprietary ISA and is used in various tech devices, from smartphones to advanced AI processors. RISC-V was born out of the University of California, Berkeley's innovative labs, and its development has been under the guardianship of a Swiss nonprofit organization, ensuring its open-source nature remains intact.

Yet, the prevalent belief among certain lawmakers is that China is maneuvering this open, collaborative spirit to fortify its semiconductor industry. Their concerns not only touch on economic factors but also the potential military advancements that the People's Republic could harness by mastering this technology. Representative Mike Gallagher has even proposed that any U.S. collaboration with Chinese entities on this front should first secure an export license.

This technology's international embrace is evident, with companies from both the East and the West integrating it. China's tech titan, Huawei, views RISC-V as foundational to its chip endeavors. Meanwhile, in the U.S., tech powerhouses like Qualcomm and Google have shown enthusiasm for its potential, emphasizing its transformative capability for the industry.

Magnitude​

However, the proposed constraints have sparked concerns within the tech community. They could jeopardize cooperative work on open tech standards between the U.S. and China if enforced. This could pose challenges to China's ambition to be chip-independent and stunt the global market momentum for better, cheaper chips.

The magnitude of potential restrictions can be gauged by a statement from Jack Kang of SiFive, who equated such limitations to barring U.S. tech firms from the digital universe of the internet. The implications stretch beyond mere business metrics, pointing to a larger impact on global tech innovation and leadership.
 

bnew


Forget ChatGPT, why Llama and open source AI win 2023​

Sharon Goldman@sharongoldman

November 3, 2023 7:10 AM

Image created with DALL-E 3 for VentureBeat





Could a furry camelid take the 2023 crown for the biggest AI story of the year? If we’re talking about Llama, Meta’s large language model that took the AI research world by storm in February — followed by the commercial Llama 2 in July and Code Llama in August — I would argue that the answer is… (writer takes a moment to duck) yes.

I can almost see readers getting ready to pounce. “What? Come on — of course, ChatGPT was the biggest AI story of 2023!” I can hear the crowds yelling. “OpenAI’s ChatGPT, which launched on November 30, 2022, and reached 100 million users by February? ChatGPT, which brought generative AI into popular culture? It’s the bigger story by far!”

Hang on — hear me out. In the humble opinion of this AI reporter, ChatGPT was and is, naturally, a generative AI game-changer. It was, as Forrester analyst Rowan Curran told me, “the spark that set off the fire around generative AI.”

But starting in February of this year, when Meta released Llama, the first major free ‘open source’ Large Language Model (LLM) (Llama and Llama 2 are not fully open by traditional license definitions), open source AI began to have a moment — and a red-hot debate — that has not ebbed all year long. That is even as other Big Tech firms, LLM companies and policymakers have questioned the safety and security of AI models with open access to source code and model weights, and the high costs of compute have led to struggles across the ecosystem.

According to Meta, the open-source AI community has fine-tuned and released over 7,000 Llama derivatives on the Hugging Face platform since the model’s release, including a veritable animal farm of popular offspring such as Koala, Vicuna, Alpaca, Dolly and RedPajama. There are many other open source models, including Mistral, Hugging Face, and Falcon, but Llama was the first that had the data and resources of a Big Tech company like Meta supporting it.

You could consider ChatGPT the equivalent of Barbie, 2023’s biggest blockbuster movie. But Llama and its open-source AI cohort are more like the Marvel Universe, with its endless spinoffs and offshoots that have the cumulative power to offer the biggest long-term impact on the AI landscape.

This will lead to “more real-world, impactful gen AI applications and cementing the open-source foundations of gen AI applications going forward,” Kjell Carlsson, head of data science strategy and evangelism at Domino Data Lab, told me.


Open-source AI will have the biggest long-term impact

The era of closed, proprietary models began, in a sense, with ChatGPT. OpenAI launched in 2015 as a more open-sourced, open-research company. But in 2023, OpenAI co-founder and chief scientist Ilya Sutskever told The Verge it was a mistake to share their research, citing competitive and safety concerns.

Meta’s chief AI scientist Yann LeCun, on the other hand, pushed for Llama 2 to be released with a commercial license along with the model weights. “I advocated for this internally,” he said at the AI Native conference in September. “I thought it was inevitable, because large language models are going to become a basic infrastructure that everybody is going to use, it has to be open.”

Carlsson, to be fair, considers my ChatGPT vs. Llama argument to be an apples-to-oranges comparison. Llama 2 is the game-changing model, he explained, because it is open-source, licensed for commercial use, can be fine-tuned, can be run on-premises, and is small enough to be operationalized at scale.

But ChatGPT, he said, is “the game-changing experience that brought the power of LLMs to the public consciousness and, most importantly, business leadership.” Yet as a model, he maintained, GPT 3.5 and 4 that power ChatGPT suffer “because they should not, except in exceptional circumstances, be used for anything beyond a PoC [proof of concept].”

Matt Shumer, CEO of Otherside AI, which developed Hyperwrite, pointed out that Llama likely would not have had the reception or influence it did if ChatGPT didn’t happen in the first place. But he agreed that Llama’s effects will be felt for years: “There are likely hundreds of companies that have gotten started over the last year or so that would not have been possible without Llama and everything that came after,” he said.

And Sridhar Ramaswamy, the former Neeva CEO who became SVP of data cloud company Snowflake after the company acquired his company, said “Llama 2 is 100% a game-changer — it is the first truly capable open source AI model.” ChatGPT had appeared to signal an LLM repeat of what happened with cloud, he said: “There would be three companies with capable models, and if you want to do anything you would have to pay them.”

Instead, Meta released Llama.


Early Llama leak led to a flurry of open-source LLMs

Launched in February, the first Llama model stood out because it came in several sizes, from 7 billion parameters to 65 billion parameters — Llama’s developers reported that the 13B parameter model’s performance on most NLP benchmarks exceeded that of the much larger GPT-3 (with 175B parameters) and that the largest model was competitive with state of the art models such as PaLM and Chinchilla. Meta made Llama’s model weights available for academics and researchers on a case-by-case basis — including Stanford for its Alpaca project.

But the Llama weights were subsequently leaked on *****. This allowed developers around the world to fully access a GPT-level LLM for the first time — leading to a flurry of new derivatives. Then in July, Meta released Llama 2 free to companies for commercial use, and Microsoft made Llama 2 available on its Azure cloud-computing service.

Those efforts came at a key moment when Congress began to talk about regulating artificial intelligence — in June, two U.S. Senators sent a letter to Meta CEO Mark Zuckerberg that questioned the Llama leak, saying they were concerned about the “potential for its misuse in spam, fraud, malware, privacy violations, harassment, and other wrongdoing and harms.”

But Meta consistently doubled down on its commitment to open-source AI: In an internal all-hands meeting in June, for example, Zuckerberg said Meta was building generative AI into all of its products and reaffirmed the company’s commitment to an “open science-based approach” to AI research.
 



Meta has long been a champion of open research

More than any other Big Tech company, Meta has long been a champion of open research — including, notably, creating an open-source ecosystem around the PyTorch framework. As 2023 draws to a close, Meta will celebrate the 10th anniversary of FAIR (Fundamental AI Research), which was created “to advance the state of the art of AI through open research for the benefit of all.” Ten years ago, on December 9, 2013, Facebook announced that NYU Professor Yann LeCun would lead FAIR.

In an in-person interview with VentureBeat at Meta’s New York office, Joelle Pineau, VP of AI research at Meta, recalled that she joined Meta in 2017 because of FAIR’s commitment to open research and transparency.

“The reason I came there without interviewing anywhere else is because of the commitment to open science,” she said. “It’s the reason why many of our researchers are here. It’s part of the DNA of the organization.”

But the reason to do open research has changed, she added. “I would say in 2017, the main motivation was about the quality of the research and setting the bar higher,” she said. “What is completely new in the last year is how much this is a motor for the productivity of the whole ecosystem, the number of startups who come up and are just so glad that they have an alternative model.”

But, she added, every Meta release is a one-off. “We’re not committing to releasing everything [open] all the time, under any condition,” she said. “Every release is analyzed in terms of the advantages and the risks.”


Reflecting on Llama: ‘a bunch of small things done really well’

Angela Fan, a Meta FAIR research scientist who worked on the original Llama, said she also worked on Llama 2 and the efforts to convert these models into the user-facing product capabilities that Meta showed off at its Connect developer conference last month (some of which have caused controversy, like its newly-launched stickers and characters).

“I think the biggest reflection I have is even though the technology is still kind of nascent and almost squishy across the industry, it’s at a point where we can build some really interesting stuff and we’re able to do this kind of integration across all our apps in a really consistent way,” she told VentureBeat in an interview at Connect.

She added that the company looks for feedback from its developer community, as well as the ecosystem of startups using Llama for a variety of different applications. “We want to know, what do people think about Llama 2? What should we put into Llama 3?” she said.

But Llama’s secret sauce all along, she said, has been “a bunch of small things done really well and right over a longer period of time.” There were so many different components, she recalled — like getting the original data set right, figuring out the number of parameters and pre-training it on the right learning rate schedule.

“There were many small experiments that we learned from,” she said, adding that for someone who doesn’t understand AI research, it can seem “like a mad scientist sitting somewhere. But it’s truly just a lot of hard work.”


The push to protect open-source AI

A big open-source ecosystem with a broadly useful technology has been “our thesis all along,” said Vipul Ved Prakash, co-founder of Together, a startup known for creating the RedPajama dataset in April, which replicated the Llama dataset, and releasing a full-stack platform and cloud service for developers at startups and enterprises to build open-source AI — including by building on Llama 2.

Prakash, not surprisingly, agreed that he considers Llama and open-source AI to be the game-changers of 2023 — it is a story, he explained, of developing viable, high-quality models, with a network of companies and organizations building on them.

“The cost is distributed across this network and then when you’re providing fine tuning or inference, you don’t have to amortize the cost of the model builds,” he said.

But at the moment, open-source AI proponents feel the need to push to protect access to these LLMs as regulators circle. At the U.K. Safety Summit this week, the overarching theme of the event was mitigating the risk of advanced AI systems wiping out humanity if they fall into the hands of bad actors — presumably with access to open-source AI.

But a vocal group from the open source AI community, led by LeCun and Google Brain co-founder Andrew Ng, signed a statement published by Mozilla saying that open AI is “an antidote, not a poison.”

Sriram Krishnan, a general partner at Andreessen Horowitz, tweeted in support of Llama and open-source AI:

“Realizing how important it was for @ylecun and team to get llama2 out of the door. A) they may have never had a chance to later legally B) we would have never seen what is possible with open source ( see all the work downstream of llama2) and thought of LLMs as the birthright of 2-4 companies.”


The Llama vs. ChatGPT debate continues

The debate over Llama vs. ChatGPT — as well as the debate over open source vs. closed source generally — will surely continue. When I reached out to a variety of experts to get their thoughts, it was ChatGPT for the win.

“Hands down, ChatGPT,” wrote Nikolaos Vasiloglou, VP of ML research at RelationalAI. “The reason it is a game-changer is not just its AI capabilities, but also the engineering that is behind it and its unbeatable operational costs to run it.”

And John Lyotier, CEO of TravelAI, wrote: “Without a doubt the clear winner would be ChatGPT. It has become AI in the minds of the public. People who would never have considered themselves technologists are suddenly using it and they are introducing their friends and families to AI via ChatGPT. It has become the ‘every-day person’s AI.’”

Then there was Ben James, CEO of Atlas, a 3D generative AI platform, who pointed out that Llama has reignited research in a way ChatGPT did not, and this will bring about stronger, longer-term impact.

“ChatGPT was the clear game-changer of 2023, but Llama will be the game-changer of the future,” he said.

Ultimately, perhaps what I’m trying to say — that Llama and open source AI win 2023 because of how it will impact 2024 and beyond — is similar to the way Forrester’s Curran puts it: “The zeitgeist generative AI created in 2023 would not have happened without something like ChatGPT, and the sheer number of humans who have now had the chance to interact with and experience these advanced tools, compared to other cutting edge technologies in history, is staggering,” he said.

But, he added, open source models – and particularly those like Llama 2 which have seen significant uptake from enterprise developers — are providing a lot of the ongoing fuel for the on-the-ground development and advancement of the space.

In the long term, Curran said, there will be a place for both proprietary and open source models, but without the open source community, the generative AI space would be a much less advanced, very niche market, rather than a technology that has the potential for massive impacts across many aspects of work and life.

“The open source community has been and will be where many of the significant long-term impacts come from, and the open source community is essential for GenAI’s success,” he said.
 

Hood Critic

That's a bold claim. Llama will undoubtedly have the bigger long-term impact in the personal space, but open source AI won't be what puts people out of work; that will be corporate-backed AI.
 

bnew


META

Mark Zuckerberg’s new goal is creating artificial general intelligence​

And he wants Meta to open source it. Eventually. Maybe.​


By Alex Heath, a deputy editor at The Verge and author of the Command Line newsletter. He’s covered the tech industry for over a decade at The Information and other outlets.

Jan 18, 2024, 12:59 PM EST




Fueling the generative AI craze is a belief that the tech industry is on a path to achieving superhuman, god-like intelligence.

OpenAI’s stated mission is to create this artificial general intelligence, or AGI. Demis Hassabis, the leader of Google’s AI efforts, has the same goal.

Now, Meta CEO Mark Zuckerberg is entering the race. While he doesn’t have a timeline for when AGI will be reached, or even an exact definition for it, he wants to build it. At the same time, he’s shaking things up by moving Meta’s AI research group, FAIR, to the same part of the company as the team building generative AI products across Meta’s apps. The goal is for Meta’s AI breakthroughs to more directly reach its billions of users.

“We’ve come to this view that, in order to build the products that we want to build, we need to build for general intelligence,” Zuckerberg tells me in an exclusive interview. “I think that’s important to convey because a lot of the best researchers want to work on the more ambitious problems.”

Here, Zuckerberg is saying the quiet part aloud. The battle for AI talent has never been more fierce, with every company in the space vying for an extremely small pool of researchers and engineers. Those with the needed expertise can command eye-popping compensation packages to the tune of over $1 million a year. CEOs like Zuckerberg are routinely pulled in to try to win over a key recruit or keep a researcher from defecting to a competitor.

“We’re used to there being pretty intense talent wars,” he says. “But there are different dynamics here with multiple companies going for the same profile, [and] a lot of VCs and folks throwing money at different projects, making it easy for people to start different things externally.”

After talent, the scarcest resource in the AI field is the computing power needed to train and run large models. On this topic, Zuckerberg is ready to flex. He tells me that, by the end of this year, Meta will own more than 340,000 of Nvidia’s H100 GPUs — the industry’s chip of choice for building generative AI.


External research has pegged Meta’s H100 shipments for 2023 at 150,000, a number that is tied only with Microsoft’s shipments and at least three times larger than everyone else’s. When its Nvidia A100s and other AI chips are accounted for, Meta will have a stockpile of almost 600,000 GPUs by the end of 2024, according to Zuckerberg.

“We have built up the capacity to do this at a scale that may be larger than any other individual company,” he says. “I think a lot of people may not appreciate that.”

The realization

No one working on AI, including Zuckerberg, seems to have a clear definition for AGI or an idea of when it will arrive.

“I don’t have a one-sentence, pithy definition,” he tells me. “You can quibble about if general intelligence is akin to human level intelligence, or is it like human-plus, or is it some far-future super intelligence. But to me, the important part is actually the breadth of it, which is that intelligence has all these different capabilities where you have to be able to reason and have intuition.”


He sees its eventual arrival as being a gradual process, rather than a single moment. “I’m not actually that sure that some specific threshold will feel that profound.”

As Zuckerberg explains it, Meta’s new, broader focus on AGI was influenced by the release of Llama 2, its latest large language model, last year. The company didn’t think that the ability to generate code made sense for how people would use an LLM in Meta’s apps. But it’s still an important skill to develop for building smarter AI, so Meta built it anyway.

“One hypothesis was that coding isn’t that important because it’s not like a lot of people are going to ask coding questions in WhatsApp,” he says. “It turns out that coding is actually really important structurally for having the LLMs be able to understand the rigor and hierarchical structure of knowledge, and just generally have more of an intuitive sense of logic.”


Meta is training Llama 3 now, and it will have code-generating capabilities, he says. Like Google’s new Gemini model, another focus is on more advanced reasoning and planning abilities.

“Llama 2 wasn’t an industry-leading model, but it was the best open-source model,” he says. “With Llama 3 and beyond, our ambition is to build things that are at the state of the art and eventually the leading models in the industry.”

Open versus closed

The question of who gets to eventually control AGI is a hotly debated one, as the near implosion of OpenAI recently showed the world.

Zuckerberg wields total power at Meta thanks to his voting control over the company’s stock. That puts him in a uniquely powerful position that could be dangerously amplified if AGI is ever achieved. His answer is the playbook that Meta has followed so far for Llama, which can — at least for most use cases — be considered open source.

“I tend to think that one of the bigger challenges here will be that if you build something that’s really valuable, then it ends up getting very concentrated,” Zuckerberg says. “Whereas, if you make it more open, then that addresses a large class of issues that might come about from unequal access to opportunity and value. So that’s a big part of the whole open-source vision.”

Without naming names, he contrasts Meta’s approach to that of OpenAI, which began with the intention of open sourcing its models but has become increasingly less transparent. “There were all these companies that used to be open, used to publish all their work, and used to talk about how they were going to open source all their work. I think you see the dynamic of people just realizing, ‘Hey, this is going to be a really valuable thing, let’s not share it.’”

While Sam Altman and others espouse the safety benefits of a more closed approach to AI development, Zuckerberg sees a shrewd business play. Meanwhile, the models that have been deployed so far have yet to cause catastrophic damage, he argues.

“The biggest companies that started off with the biggest leads are also, in a lot of cases, the ones calling the most for saying you need to put in place all these guardrails on how everyone else builds AI,” he tells me. “I’m sure some of them are legitimately concerned about safety, but it’s a hell of a thing how much it lines up with the strategy.”


Zuckerberg has his own motivations, of course. The end result of his open vision for AI is still a concentration of power, just in a different shape. Meta already has more users than almost any company on Earth and a wildly profitable social media business. AI features can arguably make his platforms even stickier and more useful. And if Meta can effectively standardize the development of AI by releasing its models openly, its influence over the ecosystem will only grow.

There’s another wrinkle: If AGI is ever achieved at Meta, the call to open source it or not is ultimately Zuckerberg’s. He’s not ready to commit either way.

“For as long as it makes sense and is the safe and responsible thing to do, then I think we will generally want to lean towards open source,” he says. “Obviously, you don’t want to be locked into doing something because you said you would.”

Don’t call it a pivot

In the broader context of Meta, the timing of Zuckerberg’s new AGI push is a bit awkward.

It has been only two years since he changed the company name to focus on the metaverse. Meta’s latest smart glasses with Ray-Ban are showing early traction, but full-fledged AR glasses feel increasingly further out. Apple, meanwhile, has recently validated his bet on headsets with the launch of the Vision Pro, even though VR is still a niche industry.

Zuckerberg, of course, disagrees with the characterization of his focus on AI being a pivot.

“I don’t know how to more unequivocally state that we’re continuing to focus on Reality Labs and the metaverse,” he tells me, pointing to the fact that Meta is still spending north of $15 billion a year on the initiative. Its Ray-Ban smart glasses recently added a visual AI assistant that can identify objects and translate languages. He sees generative AI playing a more critical role in Meta’s hardware efforts going forward.


He sees a future in which virtual worlds are generated by AI and filled with AI characters that accompany real people. He says a new platform is coming this year to let anyone create their own AI characters and distribute them across Meta’s social apps. Perhaps, he suggests, these AIs will even be able to post their own content to the feeds of Facebook, Instagram, and Threads.

Meta is still a metaverse company. It’s the biggest social media company in the world. It’s now trying to build AGI. Zuckerberg frames all this around the overarching mission of “building the future of connection.”

To date, that connection has been mostly humans interacting with each other. Talking to Zuckerberg, it’s clear that, going forward, it’s increasingly going to be about humans talking to AIs, too. It’s obvious that he views this future as inevitable and exciting, whether the rest of us are ready for it or not.
 