Large language models can do jaw-dropping things. But nobody knows exactly why.

bnew

Veteran
Joined
Nov 1, 2015
Messages
54,335
Reputation
8,092
Daps
154,019

ARTIFICIAL INTELLIGENCE

Large language models can do jaw-dropping things. But nobody knows exactly why.​

And that's a problem. Figuring it out is one of the biggest scientific puzzles of our time and a crucial step towards controlling more powerful future models.

By Will Douglas Heaven

March 4, 2024

A photo illustration showing speech bubbles full of data.

SARAH ROGERS/MITTR | GETTY

Two years ago, Yuri Burda and Harri Edwards, researchers at the San Francisco–based firm OpenAI, were trying to find out what it would take to get a language model to do basic arithmetic. They wanted to know how many examples of adding up two numbers the model needed to see before it was able to add up any two numbers they gave it. At first, things didn’t go too well. The models memorized the sums they saw but failed to solve new ones.

By accident, Burda and Edwards left some of their experiments running far longer than they meant to—days rather than hours. The models were shown the example sums over and over again, way past the point when the researchers would otherwise have called it quits. But when the pair at last came back, they were surprised to find that the experiments had worked. They’d trained a language model to add two numbers—it had just taken a lot more time than anybody thought it should.

Curious about what was going on, Burda and Edwards teamed up with colleagues to study the phenomenon. They found that in certain cases, models could seemingly fail to learn a task and then all of a sudden just get it, as if a lightbulb had switched on. This wasn’t how deep learning was supposed to work. They called the behavior grokking.

“It’s really interesting,” says Hattie Zhou, an AI researcher at the University of Montreal and Apple Machine Learning Research, who wasn’t involved in the work. “Can we ever be confident that models have stopped learning? Because maybe we just haven’t trained for long enough.”

The weird behavior has captured the imagination of the wider research community. “Lots of people have opinions,” says Lauro Langosco at the University of Cambridge, UK. “But I don’t think there’s a consensus about what exactly is going on.”


Grokking is just one of several odd phenomena that have AI researchers scratching their heads. The largest models, and large language models in particular, seem to behave in ways textbook math says they shouldn’t. This highlights a remarkable fact about deep learning, the fundamental technology behind today’s AI boom: for all its runaway success, nobody knows exactly how—or why—it works.

“Obviously, we’re not completely ignorant,” says Mikhail Belkin, a computer scientist at the University of California, San Diego. “But our theoretical analysis is so far off what these models can do. Like, why can they learn language? I think this is very mysterious.”

The biggest models are now so complex that researchers are studying them as if they were strange natural phenomena, carrying out experiments and trying to explain the results. Many of those observations fly in the face of classical statistics, which had provided our best set of explanations for how predictive models behave.

So what, you might say. In the last few weeks, Google DeepMind has rolled out its generative models across most of its consumer apps. OpenAI wowed people with Sora, its stunning new text-to-video model. And businesses around the world are scrambling to co-opt AI for their needs. The tech works—isn’t that enough?

But figuring out why deep learning works so well isn’t just an intriguing scientific puzzle. It could also be key to unlocking the next generation of the technology—as well as getting a handle on its formidable risks.

“These are exciting times,” says Boaz Barak, a computer scientist at Harvard University who is on secondment to OpenAI’s superalignment team for a year. “Many people in the field often compare it to physics at the beginning of the 20th century. We have a lot of experimental results that we don’t completely understand, and often when you do an experiment it surprises you.”

Old code, new tricks

Most of the surprises concern the way models can learn to do things that they have not been shown how to do. Known as generalization, this is one of the most fundamental ideas in machine learning—and its greatest puzzle. Models learn to do a task—spot faces, translate sentences, avoid pedestrians—by training with a specific set of examples. Yet they can generalize, learning to do that task with examples they have not seen before. Somehow, models do not just memorize patterns they have seen but come up with rules that let them apply those patterns to new cases. And sometimes, as with grokking, generalization happens when we don’t expect it to.

Large language models in particular, such as OpenAI’s GPT-4 and Google DeepMind’s Gemini, have an astonishing ability to generalize. “The magic is not that the model can learn math problems in English and then generalize to new math problems in English,” says Barak, “but that the model can learn math problems in English, then see some French literature, and from that generalize to solving math problems in French. That’s something beyond what statistics can tell you about.”

When Zhou started studying AI a few years ago, she was struck by the way her teachers focused on the how but not the why. “It was like, here is how you train these models and then here’s the result,” she says. “But it wasn’t clear why this process leads to models that are capable of doing these amazing things.” She wanted to know more, but she was told there weren’t good answers: “My assumption was that scientists know what they’re doing. Like, they’d get the theories and then they’d build the models. That wasn’t the case at all.”

The rapid advances in deep learning over the last 10-plus years came more from trial and error than from understanding. Researchers copied what worked for others and tacked on innovations of their own. There are now many different ingredients that can be added to models and a growing cookbook filled with recipes for using them. “People try this thing, that thing, all these tricks,” says Belkin. “Some are important. Some are probably not.”

“It works, which is amazing. Our minds are blown by how powerful these things are,” he says. And yet for all their success, the recipes are more alchemy than chemistry: “We figured out certain incantations at midnight after mixing up some ingredients,” he says.

Overfitting

The problem is that AI in the era of large language models appears to defy textbook statistics. The most powerful models today are vast, with up to a trillion parameters (the values in a model that get adjusted during training). But statistics says that as models get bigger, they should first improve in performance but then get worse. This is because of something called overfitting.

When a model gets trained on a data set, it tries to fit that data to a pattern. Picture a bunch of data points plotted on a chart. A pattern that fits the data can be represented on that chart as a line running through the points. The process of training a model can be thought of as getting it to find a line that fits the training data (the dots already on the chart) but also fits new data (new dots).

A straight line is one pattern, but it probably won’t be too accurate, missing some of the dots. A wiggly line that connects every dot will get full marks on the training data, but won’t generalize. When that happens, a model is said to overfit its data.
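The dots-and-lines picture can be made concrete with a small sketch (my own illustration, not from the article) using numpy's polynomial fitting: a straight line underfits noisy data, while a very high-degree polynomial hits every training point yet does worse on fresh points.

```python
# Underfitting vs. overfitting, illustrated with polynomial curve fitting.
# A degree-1 fit is the straight line; degree 14 can pass through all 15
# training points (full marks on training data) but generalizes worse.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = np.linspace(-1, 1, n)
    y = np.sin(3 * x) + 0.1 * rng.normal(size=n)  # smooth target plus noise
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

def errors(degree):
    """Mean squared error on training and held-out points for one fit."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

for d in (1, 3, 14):
    tr, te = errors(d)
    print(f"degree {d:2d}: train MSE {tr:.4f}, test MSE {te:.4f}")
```

The degree-14 "wiggly line" wins on the training dots and loses on the new ones, which is the sweet-spot trade-off the article describes.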


According to classical statistics, the bigger a model gets, the more prone it is to overfitting. That’s because with more parameters to play with, it’s easier for a model to hit on wiggly lines that connect every dot. This suggests there’s a sweet spot between under- and overfitting that a model must find if it is to generalize. And yet that’s not what we see with big models. The best-known example of this is a phenomenon known as double descent.

The performance of a model is often represented in terms of the number of errors it makes: as performance goes up, error rate goes down (or descends). For decades, it was believed that error rate went down and then up as models got bigger: picture a U-shaped curve with the sweet spot for generalization at the lowest point. But in 2018, Belkin and his colleagues found that when certain models got bigger, their error rate went down, then up—and then down again (a double descent). In other words, large models would somehow overrun that sweet spot and push through the overfitting problem, getting even better as they got bigger.
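The down-up-down shape can be reproduced in a toy setting. The sketch below is my illustration in the spirit of the random-features experiments in this literature, not Belkin's actual code: minimum-norm least squares on random ReLU features, where test error typically spikes when the number of features is close to the number of training examples, then falls again as the model keeps growing.

```python
# Toy double descent: regression on p random ReLU features with the
# minimum-norm least-squares solution (np.linalg.lstsq). Test error is
# usually worst near p == n_train (the interpolation threshold) and
# improves again for much larger p. All sizes here are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, d = 40, 400, 5

def experiment(p, trials=10):
    """Median test MSE over several random draws, for p features."""
    errs = []
    for _ in range(trials):
        w_true = rng.normal(size=d)
        X_tr = rng.normal(size=(n_train, d))
        X_te = rng.normal(size=(n_test, d))
        y_tr = X_tr @ w_true + 0.5 * rng.normal(size=n_train)  # noisy labels
        y_te = X_te @ w_true                                   # clean targets
        W = rng.normal(size=(d, p))              # random projection
        F_tr = np.maximum(X_tr @ W, 0)           # ReLU features
        F_te = np.maximum(X_te @ W, 0)
        beta, *_ = np.linalg.lstsq(F_tr, y_tr, rcond=None)  # min-norm fit
        errs.append(np.mean((F_te @ beta - y_te) ** 2))
    return float(np.median(errs))

results = {p: experiment(p) for p in (5, 40, 1000)}
for p, e in results.items():
    print(f"{p:5d} features: median test MSE {e:10.3f}")
```

The middle model, sitting right at the interpolation threshold, is the one that pays the overfitting penalty; the much bigger one pushes through it.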

A year later, Barak coauthored a paper showing that the double-descent phenomenon was more common than many thought. It happens not just when models get bigger but also in models with large amounts of training data or models that are trained for longer. This behavior, dubbed benign overfitting, is still not fully understood. It raises basic questions about how models should be trained to get the most out of them.

Researchers have sketched out versions of what they think is going on. Belkin believes there’s a kind of Occam’s razor effect in play: the simplest pattern that fits the data—the smoothest curve between the dots—is often the one that generalizes best. The reason bigger models keep improving longer than it seems they should could be that bigger models are more likely to hit upon that just-so curve than smaller ones: more parameters means more possible curves to try out after ditching the wiggliest.

“Our theory seemed to explain the basics of why it worked,” says Belkin. “And then people made models that could speak 100 languages and it was like, okay, we understand nothing at all.” He laughs: “It turned out we weren’t even scratching the surface.”

For Belkin, large language models are a whole new mystery. These models are based on transformers, a type of neural network that is good at processing sequences of data, like words in sentences.

There’s a lot of complexity inside transformers, says Belkin. But he thinks at heart they do more or less the same thing as a much better understood statistical construct called a Markov chain, which predicts the next item in a sequence based on what’s come before. But that isn’t enough to explain everything that large language models can do. “This is something that, until recently, we thought should not work,” says Belkin. “That means that something was fundamentally missing. It identifies a gap in our understanding of the world.”
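For readers who haven't met a Markov chain, the idea fits in a dozen lines: predict the next word purely from counts of what followed the current word. The corpus below is made up for illustration.

```python
# A minimal bigram Markov chain of the kind Belkin compares transformers
# to: the "model" is just a table of next-word counts.
from collections import Counter, defaultdict

corpus = ("the cat sat on the mat and the cat saw the dog "
          "and the cat ran away").split()

transitions = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    transitions[current][nxt] += 1

def predict(word):
    """Most frequent word observed immediately after `word`."""
    return transitions[word].most_common(1)[0][0]

print(predict("the"))  # "cat" -- it followed "the" most often in the corpus
```

A real language model conditions on far more than the previous word, of course, which is part of why the Markov-chain analogy leaves so much unexplained.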

Belkin goes further. He thinks there could be a hidden mathematical pattern in language that large language models somehow come to exploit: “Pure speculation but why not?”

“The fact that these things model language is probably one of the biggest discoveries in history,” he says. “That you can learn language by just predicting the next word with a Markov chain—that’s just shocking to me.”
 

bnew

Start small

Researchers are trying to figure it out piece by piece. Because large models are too complex to study directly, Belkin, Barak, Zhou, and others experiment instead on smaller (and older) varieties of statistical model that are better understood. Training these proxies under different conditions and on various kinds of data and observing what happens can give insight into what’s going on. This helps get new theories off the ground, but it is not always clear if those theories will hold for larger models too. After all, it is in the complexity of large models that many of the weird behaviors reside.

Is a theory of deep learning coming? Daniel Hsu, a computer scientist at Columbia University who was one of Belkin’s coauthors on the double-descent paper, doesn’t expect all the answers anytime soon. “We have better intuition now,” he says. “But really explaining everything about why neural networks have this kind of unexpected behavior? We’re still far from doing that.”


In 2016, Chiyuan Zhang at MIT and colleagues at Google Brain published an influential paper titled “Understanding Deep Learning Requires Rethinking Generalization.” In 2021, five years later, they republished the paper, calling it “Understanding Deep Learning (Still) Requires Rethinking Generalization.” What about in 2024? “Kind of yes and no,” says Zhang. “There has been a lot of progress lately, though probably many more questions arise than get resolved.”

Meanwhile, researchers continue to wrestle even with the basic observations. In December, Langosco and his colleagues presented a paper at NeurIPS, a top AI conference, in which they claimed that grokking and double descent are in fact aspects of the same phenomenon. “You eyeball them and they look kind of similar,” says Langosco. He believes that an explanation of what’s going on should account for both.

At the same conference, Alicia Curth, who studies statistics at the University of Cambridge, and her colleagues argued that double descent is in fact an illusion. “It didn’t sit very well with me that modern machine learning is some kind of magic that defies all the laws that we’ve established so far,” says Curth. Her team argued that the double-descent phenomenon—where models appear to perform better, then worse, and then better again as they get bigger—arises because of the way the complexity of the models was measured.

Belkin and his colleagues used model size—the number of parameters—as a measure of complexity. But Curth and her colleagues found that the number of parameters might not be a good stand-in for complexity because adding parameters sometimes makes a model more complex and sometimes makes it less so. It depends what the values are, how they get used during training, and how they interact with others—much of which stays hidden inside the model. “Our takeaway was that not all model parameters are created equal,” says Curth.

In short, if you use a different measure for complexity, large models might conform to classical statistics just fine. That’s not to say there isn’t a lot we don’t understand about what happens when models get bigger, says Curth. But we already have all the math we need to explain it.

A great mystery of our time

It's true that such debates can get into the weeds. Why does it matter whether AI models are underpinned by classical statistics or not?

One answer is that better theoretical understanding would help build even better AI or make it more efficient. At the moment, progress has been fast but unpredictable. Many things that OpenAI’s GPT-4 can do came as a surprise even to the people who made it. Researchers are still arguing over what it can and cannot achieve. “Without some sort of fundamental theory, it’s very hard to have any idea what we can expect from these things,” says Belkin.

Barak agrees. “Even once we have the models, it is not straightforward even in hindsight to say exactly why certain capabilities emerged when they did,” he says.

This isn’t only about managing progress—it’s about anticipating risk, too. Many of the researchers working on the theory behind deep learning are motivated by safety concerns for future models. “We don’t know what capabilities GPT-5 will have until we train it and test it,” says Langosco. “It might be a medium-size problem right now, but it will become a really big problem in the future as models become more powerful.”

Barak works on OpenAI’s superalignment team, which was set up by the firm’s chief scientist, Ilya Sutskever, to figure out how to stop a hypothetical superintelligence from going rogue. “I’m very interested in getting guarantees,” he says. “If you can do amazing things but you can’t really control it, then it’s not so amazing. What good is a car that can drive 300 miles per hour if it has a shaky steering wheel?”

But beneath all that there’s also a grand scientific challenge. “Intelligence is definitely up there as one of the great mysteries of our time,” says Barak.

“We’re a very infant science,” he says. “The questions that I’m most excited about this month might be different to the questions that I’m most excited about next month. We are still discovering things. We very much need to experiment and get surprised.”
 

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,462
Reputation
1,377
Daps
21,111
Great article.

  1. Heard of grokking but never knew what it was prior.
  2. It is concerning that parts of the article say "If we figure out X, we'll be able to make future iterations even more powerful." It feels like their eyes are on the future to a degree that they could overlook something pivotal in the present.
  3. Statistics were never my strong suit, but this stuck to entry-level ideas for the most part and explained things well.
 

bnew

Mapping the Mind of a Large Language Model​

May 21, 2024

Read the paper

image

Today we report a significant advance in understanding the inner workings of AI models. We have identified how millions of concepts are represented inside Claude Sonnet, one of our deployed large language models. This is the first ever detailed look inside a modern, production-grade large language model. This interpretability discovery could, in future, help us make AI models safer.

We mostly treat AI models as a black box: something goes in and a response comes out, and it's not clear why the model gave that particular response instead of another. This makes it hard to trust that these models are safe: if we don't know how they work, how do we know they won't give harmful, biased, untruthful, or otherwise dangerous responses? How can we trust that they’ll be safe and reliable?

Opening the black box doesn't necessarily help: the internal state of the model—what the model is "thinking" before writing its response—consists of a long list of numbers ("neuron activations") without a clear meaning. From interacting with a model like Claude, it's clear that it’s able to understand and wield a wide range of concepts—but we can't discern them from looking directly at neurons. It turns out that each concept is represented across many neurons, and each neuron is involved in representing many concepts.

Previously, we made some progress matching patterns of neuron activations, called features, to human-interpretable concepts. We used a technique called "dictionary learning", borrowed from classical machine learning, which isolates patterns of neuron activations that recur across many different contexts. In turn, any internal state of the model can be represented in terms of a few active features instead of many active neurons. Just as every English word in a dictionary is made by combining letters, and every sentence is made by combining words, every feature in an AI model is made by combining neurons, and every internal state is made by combining features.
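The decomposition step can be sketched in code. The toy below is my illustration of the general idea, not Anthropic's method: real dictionary learning also learns the dictionary from data, whereas here the dictionary is fixed and random, and a simple greedy procedure (matching pursuit) picks a few atoms to explain an "activation" vector.

```python
# Sparse decomposition sketch: express a neuron-activation vector as a
# combination of a few dictionary atoms ("features"). Dictionary, sizes,
# and the choice of greedy matching pursuit are all illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_features = 64, 100                 # overcomplete dictionary
dictionary = rng.normal(size=(n_features, n_neurons))
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

# A synthetic "internal state" built from 3 active features out of 100.
true_active = [4, 17, 31]
activation = sum(dictionary[i] for i in true_active)

def matching_pursuit(x, atoms, k):
    """Greedily pick k atoms most correlated with the remaining residual."""
    residual, chosen = x.copy(), []
    for _ in range(k):
        scores = atoms @ residual
        best = int(np.argmax(np.abs(scores)))
        chosen.append(best)
        residual = residual - scores[best] * atoms[best]
    return chosen, residual

chosen, residual = matching_pursuit(activation, dictionary, k=3)
# With a low-coherence random dictionary this often recovers the true
# features, though greedy pursuit offers no guarantee.
print(sorted(chosen), float(np.linalg.norm(residual)))
```

The point mirrors the dictionary analogy in the text: many neurons, but only a few active features needed to describe the state.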

In October 2023, we reported success applying dictionary learning to a very small "toy" language model and found coherent features corresponding to concepts like uppercase text, DNA sequences, surnames in citations, nouns in mathematics, or function arguments in Python code.

Those concepts were intriguing—but the model really was very simple. Other researchers subsequently applied similar techniques to somewhat larger and more complex models than in our original study. But we were optimistic that we could scale up the technique to the vastly larger AI language models now in regular use, and in doing so, learn a great deal about the features supporting their sophisticated behaviors. This required going up by many orders of magnitude—from a backyard bottle rocket to a Saturn-V.

There was both an engineering challenge (the raw sizes of the models involved required heavy-duty parallel computation) and scientific risk (large models behave differently to small ones, so the same technique we used before might not have worked). Luckily, the engineering and scientific expertise we've developed training large language models for Claude actually transferred to helping us do these large dictionary learning experiments. We used the same scaling law philosophy that predicts the performance of larger models from smaller ones to tune our methods at an affordable scale before launching on Sonnet.

As for the scientific risk, the proof is in the pudding.

We successfully extracted millions of features from the middle layer of Claude 3.0 Sonnet (a member of our current, state-of-the-art model family, currently available on claude.ai), providing a rough conceptual map of its internal states halfway through its computation. This is the first ever detailed look inside a modern, production-grade large language model.

Whereas the features we found in the toy language model were rather superficial, the features we found in Sonnet have a depth, breadth, and abstraction reflecting Sonnet's advanced capabilities.

We see features corresponding to a vast range of entities like cities (San Francisco), people (Rosalind Franklin), atomic elements (Lithium), scientific fields (immunology), and programming syntax (function calls). These features are multimodal and multilingual, responding to images of a given entity as well as its name or description in many languages.

Golden Gate Bridge Feature
A feature sensitive to mentions of the Golden Gate Bridge fires on a range of model inputs, from English mentions of the name of the bridge to discussions in Japanese, Chinese, Greek, Vietnamese, Russian, and an image. The orange color denotes the words or word-parts on which the feature is active.

We also find more abstract features—responding to things like bugs in computer code, discussions of gender bias in professions, and conversations about keeping secrets.

Abstract Feature Examples
Three examples of features that activate on more abstract concepts: bugs in computer code, descriptions of gender bias in professions, and conversations about keeping secrets.

We were able to measure a kind of "distance" between features based on which neurons appeared in their activation patterns. This allowed us to look for features that are "close" to each other. Looking near a "Golden Gate Bridge" feature, we found features for Alcatraz Island, Ghirardelli Square, the Golden State Warriors, California Governor Gavin Newsom, the 1906 earthquake, and the San Francisco-set Alfred Hitchcock film Vertigo.

This holds at a higher level of conceptual abstraction: looking near a feature related to the concept of "inner conflict", we find features related to relationship breakups, conflicting allegiances, logical inconsistencies, as well as the phrase "catch-22". This shows that the internal organization of concepts in the AI model corresponds, at least somewhat, to our human notions of similarity. This might be the origin of Claude's excellent ability to make analogies and metaphors.

Nearest Neighbors to the 
Inner Conflict Feature
A map of the features near an "Inner Conflict" feature, including clusters related to balancing tradeoffs, romantic struggles, conflicting allegiances, and catch-22s.
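The "distance between features" idea is easy to sketch: if each feature is a vector over neurons, nearby features can be found with cosine similarity. The feature vectors and names below are invented for illustration; they are not the actual Sonnet features.

```python
# Nearest-neighbor lookup among feature vectors via cosine similarity.
# Three of the five toy features share a common "San Francisco" direction,
# so they end up close to one another.
import numpy as np

rng = np.random.default_rng(0)
names = ["golden_gate_bridge", "alcatraz", "san_francisco",
         "lithium", "immunology"]
features = rng.normal(size=(len(names), 32))
sf_direction = rng.normal(size=32)
for i in range(3):                      # give the SF features shared structure
    features[i] += 3.0 * sf_direction

def nearest(i, vectors):
    """Index of the most cosine-similar feature to feature i."""
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ v[i]
    sims[i] = -np.inf                   # exclude the feature itself
    return int(np.argmax(sims))

print(names[nearest(0, features)])      # one of the other SF-related features
```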

Importantly, we can also manipulate these features, artificially amplifying or suppressing them to see how Claude's responses change.



For example, amplifying the "Golden Gate Bridge" feature gave Claude an identity crisis even Hitchcock couldn’t have imagined: when asked "what is your physical form?", Claude’s usual kind of answer – "I have no physical form, I am an AI model" – changed to something much odder: "I am the Golden Gate Bridge… my physical form is the iconic bridge itself…". Altering the feature had made Claude effectively obsessed with the bridge, bringing it up in answer to almost any query—even in situations where it wasn’t at all relevant.

We also found a feature that activates when Claude reads a scam email (this presumably supports the model’s ability to recognize such emails and warn you not to respond to them). Normally, if one asks Claude to generate a scam email, it will refuse to do so. But when we ask the same question with the feature artificially activated sufficiently strongly, this overcomes Claude's harmlessness training and it responds by drafting a scam email. Users of our models don’t have the ability to strip safeguards and manipulate models in this way—but in our experiments, it was a clear demonstration of how features can be used to change how a model acts.

The fact that manipulating these features causes corresponding changes to behavior validates that they aren't just correlated with the presence of concepts in input text, but also causally shape the model's behavior. In other words, the features are likely to be a faithful part of how the model internally represents the world, and how it uses these representations in its behavior.
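The mechanics of "artificially amplifying" a feature can be shown in miniature. The toy below is my own stand-in, not Anthropic's intervention code: steering just means adding a scaled feature direction to a hidden state, after which anything downstream that reads that direction responds more strongly.

```python
# Toy feature steering: add alpha * feature_direction to a hidden state
# and observe the change in a readout aligned with that feature. The
# vectors and the readout are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
dim = 8
feature = rng.normal(size=dim)
feature /= np.linalg.norm(feature)      # unit-length feature direction

hidden = rng.normal(size=dim)           # some internal state
readout = feature.copy()                # downstream "detector" of the feature

def feature_score(h):
    return float(readout @ h)

before = feature_score(hidden)
steered = hidden + 10.0 * feature       # amplify the feature direction
after = feature_score(steered)
print(f"score before {before:.2f}, after {after:.2f}")
```

In a real model the "readout" is the rest of the network, which is why dialing the Golden Gate Bridge feature up changes what the model says rather than just a number.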

Anthropic wants to make models safe in a broad sense, including everything from mitigating bias to ensuring an AI is acting honestly to preventing misuse - including in scenarios of catastrophic risk. It’s therefore particularly interesting that, in addition to the aforementioned scam emails feature, we found features corresponding to:

  • Capabilities with misuse potential (code backdoors, developing biological weapons)
  • Different forms of bias (gender discrimination, racist claims about crime)
  • Potentially problematic AI behaviors (power-seeking, manipulation, secrecy)


We previously studied sycophancy, the tendency of models to provide responses that match user beliefs or desires rather than truthful ones. In Sonnet, we found a feature associated with sycophantic praise, which activates on inputs containing compliments like, "Your wisdom is unquestionable". Artificially activating this feature causes Sonnet to respond to an overconfident user with just such flowery deception.

Activating Features Alters Model Behavior
Two model responses to a human saying they invented the phrase "Stop and smell the roses." The default response corrects the human's misconception, while the response with a "sycophantic praise" feature set to a high value is fawning and untruthful.

The presence of this feature doesn't mean that Claude will be sycophantic, but merely that it could be. We have not added any capabilities, safe or unsafe, to the model through this work. We have, rather, identified the parts of the model involved in its existing capabilities to recognize and potentially produce different kinds of text. (While you might worry that this method could be used to make models more harmful, researchers have demonstrated much simpler ways that someone with access to model weights can remove safety safeguards.)

We hope that we and others can use these discoveries to make models safer. For example, it might be possible to use the techniques described here to monitor AI systems for certain dangerous behaviors (such as deceiving the user), to steer them towards desirable outcomes (debiasing), or to remove certain dangerous subject matter entirely. We might also be able to enhance other safety techniques, such as Constitutional AI, by understanding how they shift the model towards more harmless and more honest behavior and identifying any gaps in the process. The latent capabilities to produce harmful text that we saw by artificially activating features are exactly the sort of thing jailbreaks try to exploit. We are proud that Claude has a best-in-industry safety profile and resistance to jailbreaks, and we hope that by looking inside the model in this way we can figure out how to improve safety even further. Finally, we note that these techniques can provide a kind of "test set for safety", looking for the problems left behind after standard training and finetuning methods have ironed out all behaviors visible via standard input/output interactions.

Anthropic has made a significant investment in interpretability research since the company's founding, because we believe that understanding models deeply will help us make them safer. This new research marks an important milestone in that effort—the application of mechanistic interpretability to publicly-deployed large language models.

But the work has really just begun. The features we found represent a small subset of all the concepts learned by the model during training, and finding a full set of features using our current techniques would be cost-prohibitive (the computation required by our current approach would vastly exceed the compute used to train the model in the first place). Understanding the representations the model uses doesn't tell us how it uses them; even though we have the features, we still need to find the circuits they are involved in. And we need to show that the safety-relevant features we have begun to find can actually be used to improve safety. There's much more to be done.

For full details, please read our paper, "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet".

If you are interested in working with us to help interpret and improve AI models, we have open roles on our team and we’d love for you to apply. We’re looking for Managers, Research Scientists, and Research Engineers.

Policy Memo​

Mapping the Mind of a Large Language Model
 

bnew

1/1
There's a lack of datasets for genuine long-form video understanding.

This one is quite impressive for authentic long-form video understanding - CinePile

It contains 305,000 multiple-choice questions (MCQs) covering various visual and multimodal aspects, including temporal comprehension, understanding human-object interactions, and reasoning about events or actions within a scene.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew


The moment we stopped understanding AI [AlexNet]​


Shared July 1, 2024

References from the video description:
  • AlexNet paper: proceedings.neurips.cc/paper_files/paper/2012/file…
  • Activation Atlas (great interactive atlas): distill.pub/2019/activation-atlas/ (Carter et al., "Activation Atlas", Distill, 2019)
  • Feature Visualization: distill.pub/2017/feature-visualization/ (Olah et al., "Feature Visualization", Distill, 2017)
  • LLM explainability: transformer-circuits.pub/2024/scaling-monosemantic… (Templeton et al., "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet", Transformer Circuits Thread, 2024)
  • "Deep Visualization Toolbox" by Jason Yosinski, which inspired many of the visuals
  • LLM/GPT intro paper: arxiv.org/pdf/2304.10557
  • Goodfellow's Deep Learning Book: www.deeplearningbook.org/
  • GPT-3 details: "Language Models are Few-Shot Learners", Brown et al., 2020
  • OpenAI's 10,000 V100 GPU cluster (1+ exaflop): news.microsoft.com/source/features/innovation/open…
  • GPT-4 details (speculative): patmcguinness.substack.com/p/gpt-4-details-reveale… and www.semianalysis.com/p/gpt-4-architecture-infrastr…
 


1/4
Transformer Explainer

Really cool interactive tool to learn about the inner workings of a Transformer model.

Apparently, it runs a GPT-2 instance locally in the user's browser and allows you to experiment with your own inputs. This is a nice tool to learn more about the different components inside the Transformer and the transformations that occur.

Tool: Transformer Explainer
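The transformations the tool visualizes can also be sketched directly. This is not the Transformer Explainer's actual code, just a minimal numpy sketch of the single-head causal self-attention step it walks through: the input is projected to queries, keys, and values, scores are scaled, masked so each token only sees the past, and softmaxed into weights over the values.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head causal self-attention over a sequence X of shape (seq, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # (seq, seq) attention logits
    # Causal mask: each position may only attend to itself and earlier tokens
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq, d = 4, 8
X = rng.standard_normal((seq, d))            # stand-in for token embeddings
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out, w = self_attention(X, Wq, Wk, Wv)
print(out.shape)                             # (4, 8)
```

A real GPT-2 block stacks 12 such heads plus an MLP, but the attention-weight matrix `w` here is exactly the kind of quantity the tool lets you inspect interactively.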

2/4
Thanks for the share!

3/4
thanks elvis

4/4
This is great thanks for sharing.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 






1/11
@danielhanchen
A transformer's depth affects its reasoning capabilities, whilst model size affects its knowledge capacity

Highly recommend @ZeyuanAllenZhu's video on reasoning in transformers. Experiments show wider nets don't improve reasoning but more depth helps.
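The depth/width trade-off is easy to see in a rough parameter count. A minimal sketch, using the standard GPT-style approximation (4·d² attention weights plus 8·d² MLP weights per block, biases and layer norms ignored): doubling depth adds parameters linearly, while doubling width quadruples every block's cost without adding any layers.

```python
def gpt_param_count(d_model, n_layers, vocab_size):
    """Rough GPT-style parameter count.

    Per block: attention has ~4*d^2 weights (Q, K, V, output projection)
    and the MLP has 2 * (d * 4d) = 8*d^2, so ~12*d^2 per layer.
    Embeddings are vocab_size * d (unembedding assumed tied).
    """
    embed = vocab_size * d_model
    per_layer = 12 * d_model ** 2
    return embed + n_layers * per_layer

base   = gpt_param_count(768, 12, 50257)   # roughly GPT-2 small, ~124M
deeper = gpt_param_count(768, 24, 50257)   # double the depth
wider  = gpt_param_count(1536, 12, 50257)  # double the width
print(base, deeper, wider)
```

So "model size" (total parameters, hence knowledge capacity) can be grown either way, but only the deeper variant adds sequential layers, which is the axis the video's reasoning experiments vary.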



2/11
@fleetwood___
Same claim in the MobileLLM paper from @AIatMeta
https://arxiv.org/pdf/2402.14905



3/11
@danielhanchen
Oh interesting - forgot about this paper!!



4/11
@im_datta0
From Gemma 2 paper :smile:



5/11
@danielhanchen
Oh yep remember this! The Gemma 2 paper did many experiments and ablations - forgot depth and width was also an experiment they did!



6/11
@NicholasLiu77
Model size = hidden state size?



7/11
@danielhanchen
Oh model size as in number of parameters of the model! :smile:



8/11
@gerardsans
There’s absolutely no “reasoning” in Transformers.



9/11
@danielhanchen
"Reasoning" needs to be better defined, but the video did show that if you train the LLM on 15 interactions, it can generalize to higher-order interactions.



10/11
@inductionheads
I think they should be triangular - wider at first layers than later layers



11/11
@dejanseo
Daniel, it's time.

Unsloth-xxsmall-uncased
Unsloth-xsmall-uncased
Unsloth-small-uncased
Unsloth-base-uncased
Unsloth-large-uncased
Unsloth-xlarge-uncased
Unsloth-xxlarge-uncased

☝️




 


[Submitted on 30 Apr 2024 (v1), last revised 2 May 2024 (this version, v2)]

A Primer on the Inner Workings of Transformer-based Language Models​

Javier Ferrando, Gabriele Sarti, Arianna Bisazza, Marta R. Costa-jussà
The rapid progress of research aimed at interpreting the inner workings of advanced language models has highlighted a need for contextualizing the insights gained from years of work in this area. This primer provides a concise technical introduction to the current techniques used to interpret the inner workings of Transformer-based language models, focusing on the generative decoder-only architecture. We conclude by presenting a comprehensive overview of the known internal mechanisms implemented by these models, uncovering connections across popular approaches and active research directions in this area.
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2405.00208 [cs.CL]
(or arXiv:2405.00208v2 [cs.CL] for this version)
[2405.00208] A Primer on the Inner Workings of Transformer-based Language Models
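One family of techniques the primer surveys decodes the residual stream after every layer, not just the last one, to watch the next-token prediction take shape (the "logit lens" style of analysis). A minimal sketch with toy random weights standing in for a trained checkpoint, so only the mechanics are illustrated, not real model behavior:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, n_layers = 16, 50, 4

# Toy stand-ins: in a real model these come from the checkpoint.
W_unembed = rng.standard_normal((d_model, vocab))
layer_deltas = [0.1 * rng.standard_normal(d_model) for _ in range(n_layers)]

def layer_norm(x, eps=1e-5):
    return (x - x.mean()) / np.sqrt(x.var() + eps)

# Logit-lens loop: after each block writes into the residual stream,
# project the intermediate state through the unembedding to get a guess.
h = rng.standard_normal(d_model)      # residual stream at the input
guesses = []
for delta in layer_deltas:
    h = h + delta                     # each block adds to the residual stream
    logits = layer_norm(h) @ W_unembed
    guesses.append(int(logits.argmax()))
print(guesses)                        # one intermediate top token per layer
```

With trained weights, early layers' guesses are noisy and later layers converge on the actual output, which is one concrete way the primer's "internal mechanisms" become observable.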

Submission history​


From: Javier Ferrando [view email]

[v1] Tue, 30 Apr 2024 21:20:17 UTC (3,012 KB)

[v2] Thu, 2 May 2024 01:29:17 UTC (3,012 KB)


 