Transformers Revolutionized AI. What Will Replace Them?
Rob Toews
Contributor
I write about the big picture of artificial intelligence.
Sep 3, 2023, 06:00pm EDT
[Image: The transformer, today's dominant AI architecture, has interesting parallels to the alien language in the film Arrival. Credit: Paramount Pictures]
If modern artificial intelligence has a founding document, a sacred text, it is Google’s 2017 research paper “Attention Is All You Need.”
This paper introduced a new deep learning architecture known as the transformer, which has gone on to revolutionize the field of AI over the past half-decade.
The generative AI mania currently taking the world by storm can be traced directly to the invention of the transformer. Every major AI model and product in the headlines today—ChatGPT, GPT-4, Midjourney, Stable Diffusion, GitHub Copilot, and so on—is built using transformers.
Transformers are remarkably general-purpose: while they were initially developed specifically for language translation, they are now advancing the state of the art in domains ranging from computer vision to robotics to computational biology.
In short, transformers represent the undisputed gold standard for AI technology today.
But no technology remains dominant forever.
It may seem surprising or strange, with transformers at the height of their influence, to contemplate what will come next. But in the fast-moving world of AI, it is both fascinating and advantageous to seek to “see around corners” and glimpse what the future holds before it becomes obvious.
Transformers 101
In order to explore this question, we must first understand transformers more deeply.
The now-iconic transformer paper was co-authored by eight researchers working together at Google over the course of 2017: Aidan Gomez, Llion Jones, Lukasz Kaiser, Niki Parmar, Illia Polosukhin, Noam Shazeer, Jakob Uszkoreit and Ashish Vaswani.
An often-overlooked fact about the paper is that all eight authors are listed as equal contributors; the order in which the authors’ names appear on the paper was randomly determined and has no significance. With that said, it is generally recognized that Uszkoreit provided the initial intellectual impetus for the transformer concept, while Vaswani and Shazeer were the two authors most deeply involved in every aspect of the work from beginning to end.
All eight authors have become luminaries in the world of AI thanks to their work on the paper. None of them still work at Google. Collectively, the group has gone on to found many of today’s most important AI startups, including Cohere, Character.ai, Adept, Inceptive, Essential AI and Sakana AI.
Why, exactly, was the transformer such a massive breakthrough?
Before the “Attention Is All You Need” paper was published, the state of the art in language AI was a deep learning architecture known as recurrent neural networks (RNNs).
By definition, RNNs process data sequentially—that is, one word at a time, in the order in which the words appear.
But important relationships often exist between words even if they do not appear next to each other in a sequence. In order to better enable RNNs to account for these long-distance dependencies between words, a mechanism known as attention had recently become popular. (The invention of the attention mechanism is generally attributed to a 2014 paper from deep learning pioneer Yoshua Bengio.)
Attention enables a model to consider the relationships between words regardless of how far apart they are and to determine which words and phrases in a passage are most important to “pay attention to.”
Before the transformer paper, researchers had only used attention as an add-on to the RNN architecture. The Google team’s big leap was to do away with RNNs altogether and rely entirely on attention for language modeling. Hence the paper’s title: Attention Is All You Need.
(A charming, little-known fact about the paper: according to co-author Llion Jones, its title is a nod to the Beatles song “All You Need Is Love.”)
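To make the idea concrete, here is a minimal sketch of scaled dot-product attention, the core operation the paper builds on, written in NumPy purely for illustration; the function name, toy dimensions, and random inputs are assumptions for this example, not code from the paper.

```python
# Minimal illustrative sketch of scaled dot-product attention.
# The toy sizes and names here are assumptions, not the paper's code.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    # Every word's query is compared against every word's key at once,
    # no matter how far apart the words sit in the sequence.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    # Each output is a weighted mix of all value vectors: the model
    # "pays attention" to whichever positions matter most.
    return weights @ V

# Toy example: 4 "words", each embedded in 8 dimensions.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention
print(out.shape)  # (4, 8): one updated representation per word
```

Each row of the output blends information from every position in the input, which is why distant words can influence one another directly.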
[Image: The eight research scientists who created the transformer. Credit: Financial Times]
Transformers’ fundamental innovation, made possible by the attention mechanism, is to make language processing parallelized, meaning that all the words in a given body of text are analyzed at the same time rather than in sequence.
As an interesting analogy, co-author Illia Polosukhin has compared the transformer architecture to the fictional alien language in the 2016 science fiction movie Arrival. Rather than generating strings of characters sequentially to form words and sentences (the way that humans do), the aliens in the film produce one complex symbol all at once, which conveys detailed meaning that the humans must interpret as a whole.
Transformers’ parallelization gives them a more global and thus more accurate understanding of the texts that they read and write. It also makes them more computationally efficient and more scalable than RNNs. Transformers can be trained on much larger datasets and built with many more parameters than previous architectures, making them more powerful and generalizable. Indeed, a hallmark of today’s leading transformer-based models is their scale.
In one of those mutually beneficial, mutually reinforcing historical co-occurrences, the transformer’s parallel architecture dovetailed with the rise of GPU hardware. GPUs are a type of computer chip that are themselves massively parallelized and thus ideally suited to support transformer-based computing workloads. (Nvidia, the world’s leading producer of GPUs, has been perhaps the single biggest beneficiary of today’s AI boom, recently surpassing a $1 trillion market capitalization amid staggering demand for its chips.)
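As a rough illustration of that difference, the sketch below (again NumPy, with made-up sizes and a toy recurrence standing in for a real RNN cell) contrasts the step-by-step dependency of recurrent processing with the single batched matrix product that attention allows.

```python
# Illustrative contrast between sequential (RNN-style) and parallel
# (attention-style) processing. The tiny recurrence below is a stand-in,
# not a real RNN cell, and all sizes are arbitrary assumptions.
import numpy as np

tokens = np.random.default_rng(1).normal(size=(1000, 64))  # 1,000 token vectors

# Sequential: each step depends on the previous hidden state,
# so the 1,000 steps cannot be computed in parallel.
W_h, W_x = np.eye(64) * 0.5, np.eye(64) * 0.5
h = np.zeros(64)
for x in tokens:
    h = np.tanh(W_h @ h + W_x @ x)  # 1,000 dependent steps, one after another

# Parallel: one batched matrix product covers every pair of tokens at once.
scores = tokens @ tokens.T / np.sqrt(64)        # (1000, 1000) interactions in one shot
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
mixed = weights @ tokens                        # every token updated in one pass
```

The recurrent loop is a chain of steps that each wait on the last, while the attention version is a handful of large matrix operations, exactly the kind of workload that massively parallel GPUs handle well.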
The rest, as they say, is history. Thanks to these tremendous advantages, transformers have taken the world by storm in the six years since their invention, ushering in the era of generative AI.
Every popular “chatbot” today—OpenAI’s ChatGPT, Google’s Bard, Microsoft’s Bing Chat, Anthropic’s Claude, Inflection’s Pi—is transformer-based. So is every AI tool that generates images or videos, from Midjourney to Stable Diffusion to Runway. (Text-to-image and text-to-video technology is powered by diffusion models; diffusion models make use of transformers.)
Transformers’ influence reaches well beyond text and images. The most advanced robotics research today relies on transformers. Indeed, Google’s most recent robotics work is actually named RT-2, where the T stands for “transformer.” Similarly, one of the most promising new avenues of research in the field of autonomous vehicles is the use of vision transformers. Transformer-based models have unlocked breathtaking new possibilities in biology, including the ability to design customized proteins and nucleic acids that have never before existed in nature.
Transformer co-inventor Ashish Vaswani summed it up well: “The transformer is a way to capture interaction very quickly all at once between different parts of any input. It’s a general method that captures interactions between pieces in a sentence, or the notes in music, or pixels in an image, or parts of a protein. It can be purposed for any task.”