Artificial General Intelligence Is Already Here

bnew

Veteran
Joined
Nov 1, 2015
Messages
45,092
Reputation
7,423
Daps
136,144


Today’s most advanced AI models have many flaws, but decades from now, they will be recognized as the first true examples of artificial general intelligence.

Cecilia Erlich for Noema Magazine
ESSAY
TECHNOLOGY & THE HUMAN

BY BLAISE AGÜERA Y ARCAS AND PETER NORVIG
OCTOBER 10, 2023


Blaise Agüera y Arcas is a vice president and fellow at Google Research, where he leads an organization working on basic research, product development and infrastructure for AI.

Peter Norvig is a computer scientist and Distinguished Education Fellow at the Stanford Institute for Human-Centered AI.

Artificial General Intelligence (AGI) means many different things to different people, but the most important parts of it have already been achieved by the current generation of advanced AI large language models such as ChatGPT, Bard, LLaMA and Claude. These “frontier models” have many flaws: They hallucinate scholarly citations and court cases, perpetuate biases from their training data and make simple arithmetic mistakes. Fixing every flaw (including those often exhibited by humans) would involve building an artificial superintelligence, which is a whole other project.

Nevertheless, today’s frontier models perform competently even on novel tasks they were not trained for, crossing a threshold that previous generations of AI and supervised deep learning systems never managed. Decades from now, they will be recognized as the first true examples of AGI, just as the 1945 ENIAC is now recognized as the first true general-purpose electronic computer.

The ENIAC could be programmed with sequential, looping and conditional instructions, giving it a general-purpose applicability that its predecessors, such as the Differential Analyzer, lacked. Today’s computers far exceed ENIAC’s speed, memory, reliability and ease of use, and in the same way, tomorrow’s frontier AI will improve on today’s.

But the key property of generality? It has already been achieved.

What Is General Intelligence?

Early AI systems exhibited artificial narrow intelligence, concentrating on a single task and sometimes performing it at near or above human level. MYCIN, a program developed by Ted Shortliffe at Stanford in the 1970s, only diagnosed and recommended treatment for bacterial infections. SYSTRAN only did machine translation. IBM’s Deep Blue only played chess.

Later deep neural network models trained with supervised learning such as AlexNet and AlphaGo successfully took on a number of tasks in machine perception and judgment that had long eluded earlier heuristic, rule-based or knowledge-based systems.

Most recently, we have seen frontier models that can perform a wide variety of tasks without being explicitly trained on each one. These models have achieved artificial general intelligence in five important ways:
  1. Topics: Frontier models are trained on hundreds of gigabytes of text from a wide variety of internet sources, covering any topic that has been written about online. Some are also trained on large and varied collections of audio, video and other media.
  2. Tasks: These models can perform a variety of tasks, including answering questions, generating stories, summarizing, transcribing speech, translating language, explaining, making decisions, doing customer support, calling out to other services to take actions, and combining words and images.
  3. Modalities: The most popular models operate on images and text, but some systems also process audio and video, and some are connected to robotic sensors and actuators. By using modality-specific tokenizers or processing raw data streams, frontier models can, in principle, handle any known sensory or motor modality.
  4. Languages: English is over-represented in the training data of most systems, but large models can converse in dozens of languages and translate between them, even for language pairs that have no example translations in the training data. If code is included in the training data, increasingly effective “translation” between natural languages and computer languages is even supported (i.e., general programming and reverse engineering).
  5. Instructability: These models are capable of “in-context learning,” where they learn from a prompt rather than from the training data. In “few-shot learning,” a new task is demonstrated with several example input/output pairs, and the system then gives outputs for novel inputs. In “zero-shot learning,” a novel task is described but no examples are given (for instance, “Write a poem about cats in the style of Hemingway” or “‘Equiantonyms’ are pairs of words that are opposites of each other and have the same number of letters. What are some ‘equiantonyms’?”).
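To make the few-shot versus zero-shot distinction concrete, here is a minimal sketch of how such prompts might be assembled as plain text. The function names and the "Input:/Output:" layout are illustrative assumptions, not a format prescribed by the essay or by any particular model, and no model is actually called.

```python
def build_few_shot_prompt(task, examples, query):
    """Assemble a prompt: task description, worked examples, then the new input.

    The "Input:/Output:" layout is a hypothetical convention chosen for this sketch.
    """
    lines = [task, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final input is left unanswered for the model to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


def build_zero_shot_prompt(task, query):
    """Zero-shot: the task is described, but no examples are supplied."""
    return build_few_shot_prompt(task, [], query)


few_shot = build_few_shot_prompt(
    "Give an antonym for each word.",
    [("hot", "cold"), ("tall", "short")],
    "early",
)
zero_shot = build_zero_shot_prompt("Give an antonym for each word.", "early")

print(few_shot)
print("---")
print(zero_shot)
```

In both cases the task is conveyed entirely through the prompt; nothing about the model's training changes.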

“General intelligence” must be thought of in terms of a multidimensional scorecard, not a single yes/no proposition. Nonetheless, there is a meaningful discontinuity between narrow and general intelligence: Narrowly intelligent systems typically perform a single or predetermined set of tasks, for which they are explicitly trained. Even multitask learning yields only narrow intelligence because the models still operate within the confines of tasks envisioned by the engineers. Indeed, much of the hard engineering work involved in developing narrow AI amounts to curating and labeling task-specific datasets.

By contrast, frontier language models can perform competently at pretty much any information task that can be done by humans, can be posed and answered using natural language, and has quantifiable performance.

The ability to do in-context learning is an especially meaningful meta-task for general AI. In-context learning extends the range of tasks from anything observed in the training corpus to anything that can be described, which is a big upgrade. A general AI model can perform tasks the designers never envisioned.

So: Why the reluctance to acknowledge AGI?

Frontier models have achieved a significant level of general intelligence, according to the everyday meanings of those two words. And yet most commenters have been reluctant to say so for, it seems to us, four main reasons:
  1. A healthy skepticism about metrics for AGI
  2. An ideological commitment to alternative AI theories or techniques
  3. A devotion to human (or biological) exceptionalism
  4. A concern about the economic implications of AGI

Metrics

There is a great deal of disagreement on where the threshold to AGI lies. Some people try to avoid the term altogether; Mustafa Suleyman has suggested a switch to “Artificial Capable Intelligence,” which he proposes be measured by a “modern Turing Test”: the ability to quickly make a million dollars online (from an initial $100,000 investment). AI systems able to directly generate wealth will certainly have an effect on the world, though equating “capable” with “capitalist” seems dubious.

There is good reason to be skeptical of some of the metrics. When a human passes a well-constructed law, business or medical exam, we assume the human is not only competent at the specific questions on the exam, but also at a range of related questions and tasks — not to mention the broad competencies that humans possess in general. But when a frontier model is trained to pass such an exam, the training is often narrowly tuned to the exact types of questions on the test. Today’s frontier models are of course not fully qualified to be lawyers or doctors, even though they can pass those qualifying exams. As Goodhart’s law states: “When a measure becomes a target, it ceases to be a good measure.” Better tests are needed, and there is much ongoing work, such as Stanford’s test suite HELM (Holistic Evaluation of Language Models).

It is also important not to confuse linguistic fluency with intelligence. Previous generations of chatbots such as Mitsuku (now known as Kuki) could occasionally fool human judges by abruptly changing the subject and echoing a coherent passage of text. Current frontier models generate responses on the fly rather than relying on canned text, and they are better at sticking to the subject. But they still benefit from a human’s natural assumption that a fluent, grammatical response most likely comes from an intelligent entity. We call this the “Chauncey Gardiner effect,” after the hero in “Being There” — Chauncey is taken very seriously solely because he looks like someone who should be taken seriously.

The researchers Rylan Schaeffer, Brando Miranda and Sanmi Koyejo have pointed out another issue with common AI performance metrics: They are nonlinear. Consider a test consisting of a series of arithmetic problems with five-digit numbers. Small models will answer all these problems wrong, but as the size of the model is scaled up, there will be a critical threshold after which the model will get most of the problems right. This has led commenters to say that arithmetic skill is an emergent property in frontier models of sufficient size. But if instead the test included arithmetic problems with one- to four-digit numbers as well, and if partial credit were given for getting some of the digits correct, then we would see that performance increases gradually as the model size increases; there is no sharp threshold.
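The arithmetic behind this point can be sketched in a few lines. As a simplifying assumption that is not part of the researchers' own analysis, suppose a model gets each digit of a five-digit answer right independently with the same probability. The all-or-nothing score is then that probability raised to the fifth power, which stays near zero for a long time and then climbs steeply, while the partial-credit score rises steadily. The function name below is hypothetical.

```python
def exact_match_accuracy(per_digit_accuracy, num_digits=5):
    """Chance that every digit is correct, assuming independent digit errors.

    Independence is a simplifying assumption made only for this illustration.
    """
    return per_digit_accuracy ** num_digits


# Partial credit grows in even steps; exact match lags, then jumps.
for p in [0.5, 0.6, 0.7, 0.8, 0.9, 0.99]:
    exact = exact_match_accuracy(p)
    print(f"per-digit (partial credit) {p:.2f} -> exact match {exact:.3f}")
```

The same underlying improvement looks like a sudden threshold under one metric and like smooth progress under the other.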

This finding casts doubt on the idea that super-intelligent abilities and properties, possibly including consciousness, could suddenly and mysteriously “emerge,” a fear among some citizens and policymakers. (Sometimes, the same narrative is used to “explain” why humans are intelligent while the other great apes are supposedly not; in reality, this discontinuity may be equally illusory.) Better metrics reveal that general intelligence is continuous: “More is more,” as opposed to “more is different.”

 


Alternative Theories

The prehistory of AGI includes many competing theories of intelligence, some of which succeeded in narrower domains. Computer science itself, which is based on programming languages with precisely defined formal grammars, was in the beginning closely allied with “Good Old-Fashioned AI” (GOFAI). The GOFAI credo, drawing from a line going back at least to Gottfried Wilhelm Leibniz, the 17th-century German mathematician, is exemplified by Allen Newell and Herbert Simon’s “physical symbol system hypothesis,” which holds that intelligence can be expressed in terms of a calculus wherein symbols represent ideas and thinking consists of symbol manipulation according to the rules of logic.

At first, natural languages like English appear to be such systems, with symbols like the words “chair” and “red” representing ideas like “chair-ness” and “red-ness.” Symbolic systems allow statements to be made — “The chair is red” — and logical inferences to follow: “If the chair is red then the chair is not blue.”

While this seems reasonable, systems built with this approach were always brittle and limited in the capabilities and generality they could achieve. There are two main problems: First, terms like “blue,” “red” and “chair” are only approximately defined, and the implications of these ambiguities become more serious as the complexity of the tasks being performed with them grows.

Second, there are very few logical inferences that are universally valid; a chair may be blue and red. More fundamentally, a great deal of thinking is not reducible to the manipulation of logical propositions. That’s why, for decades, concerted efforts to bring together computer programming and linguistics failed to produce anything resembling AGI.

However, some researchers with ideological commitments to symbolic systems or linguistics have continued to insist that their particular theory is a requirement for general intelligence, and that neural nets or, more broadly, machine learning, are theoretically incapable of general intelligence — especially if they are trained purely on language. These critics have been increasingly vocal in the wake of ChatGPT.


For example, Noam Chomsky, widely regarded as the father of modern linguistics, wrote of large language models: “We know from the science of linguistics and the philosophy of knowledge that they differ profoundly from how humans reason and use language. These differences place significant limitations on what these programs can do, encoding them with ineradicable defects.”

Gary Marcus, a cognitive scientist and critic of contemporary AI, says that frontier models “are learning how to sound and seem human. But they have no actual idea what they are saying or doing.” Marcus allows that neural networks may be part of a solution to AGI, but believes that “to build a robust, knowledge-driven approach to AI, we must have the machinery of symbol manipulation in our toolkit.” Marcus and many others have focused on finding gaps in the capabilities of frontier models, especially large language models, and often claim that these gaps reflect fundamental flaws in the approach.


Without explicit symbols, according to these critics, a merely learned, “statistical” approach cannot produce true understanding. Relatedly, they claim that without symbolic concepts, no logical reasoning can occur, and that “real” intelligence requires such reasoning.

Setting aside the question of whether intelligence is always reliant on symbols and logic, there are reasons to question this claim about the inadequacy of neural nets and machine learning, because neural nets are so powerful at doing anything a computer can do. For example:
  • Discrete or symbolic representations can readily be learned by neural networks and emerge naturally during training.
  • Advanced neural net models can apply sophisticated statistical techniques to data, allowing them to make near-optimal predictions from the given data. The models learn how to apply these techniques and to choose the best technique for a given problem, without being explicitly told.
  • Stacking several neural nets together in the right way yields a model that can perform the same calculations as any given computer program.
  • Given example inputs and outputs of any function that can be computed by any computer, a neural net can learn to approximate that function. (Here “approximate” means that, in theory, the neural net can exceed any level of accuracy — 99.9% correct for example — that you care to state.)
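As a very small illustration of the claim that neural nets can carry out discrete, symbolic computations, the sketch below hand-sets the weights of a two-unit ReLU network so that it computes exclusive-or (XOR), a logic function that cannot be computed by a single linear unit. The weights are chosen by hand rather than learned, and the example is an illustration only, not a proof of the general statements in the list above.

```python
def relu(x):
    """Rectified linear unit, a common neural-net activation function."""
    return max(0.0, x)


def xor_net(a, b):
    """Two hidden ReLU units and one linear output unit, with hand-set weights."""
    h1 = relu(1.0 * a + 1.0 * b + 0.0)   # counts how many inputs are on
    h2 = relu(1.0 * a + 1.0 * b - 1.0)   # fires only when both inputs are on
    return 1.0 * h1 - 2.0 * h2           # cancels the case where both are on


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

Larger logic circuits can be composed from units like these, which is one intuition for why stacked networks can reproduce the calculations of ordinary programs.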

For each criticism, we should ask whether it is prescriptive or empirical. A prescriptive criticism would argue: “In order to be considered as AGI, a system not only has to pass this test, it also has to be constructed in this way.” We would push back against prescriptive criticisms on the grounds that the test itself should be sufficient — and if it is not, the test should be amended.

An empirical criticism, on the other hand, would argue: “I don’t think you can make AI work that way — I think it would be better to do it another way.” Such criticism can help set research directions, but the proof is in the pudding. If a system can pass a well-constructed test, it automatically defeats the criticism.

In recent years, a great many tests have been devised for cognitive tasks associated with “intelligence,” “knowledge,” “common sense” and “reasoning.” These include novel questions that can’t be answered through memorization of training data but require generalization — the same proof of understanding we require of students when we test their understanding or reasoning using questions they haven’t encountered during study. Sophisticated tests can introduce novel concepts or tasks, probing a test-taker’s cognitive flexibility: the ability to learn and apply new ideas on the fly. (This is the essence of in-context learning.)

As AI critics work to devise new tests on which current models still perform poorly, they are doing useful work — although given the increasing speed with which newer, larger models are surmounting these hurdles, it might be wise to hold off for a few weeks before (once again) rushing to claim that AI is “hype.”
 


Human (Or Biological) Exceptionalism

Insofar as skeptics remain unmoved by metrics, they may be unwilling to accept any empirical evidence of AGI. Such reluctance can be driven by a desire to maintain something special about the human spirit, just as humanity has been reluctant to accept that the Earth is not the center of the universe and that Homo sapiens are not the pinnacle of a “great chain of being.” It’s true that there is something special about humanity, and we should celebrate that, but we should not conflate it with general intelligence.

It is sometimes argued that anything that could count as an AGI must be conscious, have agency, experience subjective perceptions or feel feelings. One line of reasoning goes like this: A simple tool, such as a screwdriver, clearly has a purpose (to drive screws), but it cannot be said to have agency of its own; rather, any agency clearly belongs to either the toolmaker or tool user. The screwdriver itself is “just a tool.” The same reasoning applies to an AI system trained to perform a specific task, such as optical character recognition or speech synthesis.

A system with artificial general intelligence, though, is harder to classify as a mere tool. The skills of a frontier model exceed those imagined by its programmers or users. Furthermore, since LLMs can be prompted to perform arbitrary tasks using language, can generate new prompts with language and indeed can prompt themselves (“chain of thought prompting”), the issue of whether and when a frontier model has “agency” requires more careful consideration.

Consider the many actions Suleyman’s “artificial capable intelligence” might carry out in order to make a million dollars online:

It might research the web to look at what’s trending, finding what’s hot and what’s not on Amazon Marketplace; generate a range of images and blueprints of possible products; send them to a drop-ship manufacturer it found on Alibaba; email back and forth to refine the requirements and agree on the contract; design a seller’s listing; and continually update marketing materials and product designs based on buyer feedback.

As Suleyman notes, frontier models are already capable of doing all of these things in principle, and models that can reliably plan and carry out the whole operation are likely imminent. Such an AI no longer seems much like a screwdriver.


Now that there are systems that can perform arbitrary general intelligence tasks, the claim that exhibiting agency amounts to being conscious seems problematic — it would mean that either frontier models are conscious or that agency doesn’t necessarily entail consciousness after all.

We have no idea how to measure, verify or falsify the presence of consciousness in an intelligent system. We could just ask it, but we may or may not believe its response. In fact, “just asking” appears to be something of a Rorschach test: Believers in AI sentience will accept a positive response, while nonbelievers will claim either that any affirmative response is mere “parroting” or that current AI systems are “philosophical zombies,” capable of behaving like us but lacking any phenomenal consciousness or experience “on the inside.” Worse, the Rorschach test applies to LLMs themselves: They may answer either way depending on how they are tuned or prompted. (ChatGPT and Bard are both trained to respond that they are not conscious.)

Hinging as it does on unverifiable beliefs (both human and AI), the consciousness or sentience debate isn’t currently resolvable. Some researchers have proposed measures of consciousness, but these are either based on unfalsifiable theories or rely on correlates specific to our own brains, and are thus either prescriptive or can’t assess consciousness in a system that doesn’t share our biological inheritance.

To claim a priori that nonbiological systems simply can’t be intelligent or conscious (because they are “just algorithms,” for example) seems arbitrary, rooted in untestable spiritual beliefs. Similarly, the idea that feeling pain (for example) requires nociceptors may allow us to hazard informed guesses about the experience of pain among our close biological relatives, but it’s not clear how such an idea could be applied to other neural architectures or kinds of intelligence.
“What is it like to be a bat?” Thomas Nagel famously wondered in 1974. We don’t know, and don’t know if we could know, what being a bat is like — or what being an AI is like. But we do have a growing wealth of tests assessing many dimensions of intelligence.

While the quest to seek more general and rigorous characterizations of consciousness or sentience may be worthwhile, no such characterization would alter measured competence at any task. It isn’t clear, then, how such concerns could meaningfully figure into a definition of AGI.

It would be wiser to separate “intelligence” from “consciousness” and “sentience.”

Economic Implications

Arguments about intelligence and agency readily shade into questions about rights, status, power and class relations — in short, political economy. Since the Industrial Revolution, tasks deemed “rote” or “repetitive” have often been performed by low-paid workers, while programming — in the beginning considered “women’s work” — rose in intellectual and financial status only when it became male-dominated in the 1970s. Yet ironically, while playing chess and solving problems in integral calculus turn out to be easy even for GOFAI, manual labor remains a major challenge even for today’s most sophisticated AIs.

What would the public reaction have been had AGI somehow been achieved “on schedule,” when a group of researchers convened at Dartmouth over the summer of 1956 to figure out “how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves”? At the time, most Americans were optimistic about technological progress. The “Great Compression” was underway, an era in which the economic gains achieved by rapidly advancing technology were redistributed broadly (albeit certainly not equitably, especially with regard to race and gender). Despite the looming threat of the Cold War, for the majority of people, the future looked brighter than the past.

Today, that redistributive pump has been thrown into reverse: The poor are getting poorer and the rich are getting richer (especially in the Global North). When AI is characterized as “neither artificial nor intelligent,” but merely a repackaging of human intelligence, it is hard not to read this critique through the lens of economic threat and insecurity.

In conflating debates about what AGI should be with what it is, we violate David Hume’s injunction to do our best to separate “is” from “ought” questions. This is unfortunate, as the much-needed “ought” debates are best carried out honestly.

AGI promises to generate great value in the years ahead, yet it also poses significant risks. The natural questions we should be asking in 2023 include: “Who benefits?” “Who is harmed?” “How can we maximize benefits and minimize harms?” and “How can we do this fairly and equitably?” These are pressing questions that should be discussed directly instead of denying the reality of AGI.
 


AI scores in the top percentile of creative thinking

by Erik Guzik
December 7, 2023
in Artificial Intelligence




(Photo credit: Adobe Stock)


Of all the forms of human intellect that one might expect artificial intelligence to emulate, few people would likely place creativity at the top of their list. Creativity is wonderfully mysterious – and frustratingly fleeting. It defines us as human beings – and seemingly defies the cold logic that lies behind the silicon curtain of machines.

Yet, the use of AI for creative endeavors is now growing.

New AI tools like DALL-E and Midjourney are increasingly part of creative production, and some have started to win awards for their creative output. The growing impact is both social and economic – as just one example, the potential of AI to generate new, creative content is a defining flashpoint behind the Hollywood writers strike.

And if our recent study into the striking originality of AI is any indication, the emergence of AI-based creativity – along with examples of both its promise and peril – is likely just beginning.



A blend of novelty and utility


When people are at their most creative, they’re responding to a need, goal or problem by generating something new – a product or solution that didn’t previously exist.

In this sense, creativity is an act of combining existing resources – ideas, materials, knowledge – in a novel way that’s useful or gratifying. Quite often, the result of creative thinking is also surprising, leading to something that the creator did not – and perhaps could not – foresee.

It might involve an invention, an unexpected punchline to a joke or a groundbreaking theory in physics. It might be a unique arrangement of notes, tempo, sounds and lyrics that results in a new song.

So, as a researcher of creative thinking, I immediately noticed something interesting about the content generated by the latest versions of AI, including GPT-4.

When prompted with tasks requiring creative thinking, the novelty and usefulness of GPT-4’s output reminded me of the creative types of ideas submitted by students and colleagues I had worked with as a teacher and entrepreneur.

The ideas were different and surprising, yet relevant and useful. And, when required, quite imaginative.

Consider the following prompt offered to GPT-4: “Suppose all children became giants for one day out of the week. What would happen?” The ideas generated by GPT-4 touched on culture, economics, psychology, politics, interpersonal communication, transportation, recreation and much more – many surprising and unique in terms of the novel connections generated.

This combination of novelty and utility is difficult to pull off, as most scientists, artists, writers, musicians, poets, chefs, founders, engineers and academics can attest.

Yet AI seemed to be doing it – and doing it well.



Putting AI to the test


With researchers in creativity and entrepreneurship Christian Byrge and Christian Gilde, I decided to put AI’s creative abilities to the test by having it take the Torrance Tests of Creative Thinking, or TTCT.

The TTCT prompts the test-taker to engage in the kinds of creativity required for real-life tasks: asking questions, finding ways to be more resourceful or efficient, guessing cause and effect, or improving a product. It might ask a test-taker to suggest ways to improve a children’s toy or imagine the consequences of a hypothetical situation, as the above example demonstrates.

The tests are not designed to measure historical creativity, which is what some researchers use to describe the transformative brilliance of figures like Mozart and Einstein. Rather, they assess the general creative abilities of individuals, often referred to as psychological or personal creativity.

In addition to running the TTCT through GPT-4 eight times, we also administered the test to 24 of our undergraduate students.

All of the results were evaluated by trained reviewers at Scholastic Testing Service, a private testing company that provides scoring for the TTCT. They didn’t know in advance that some of the tests they’d be scoring had been completed by AI.

Since Scholastic Testing Service is a private company, it does not share its prompts with the public. This ensured that GPT-4 would not have been able to scrape the internet for past prompts and their responses. In addition, the company has a database of thousands of tests completed by college students and adults, providing a large, additional control group with which to compare AI scores.

Our results?

GPT-4 scored in the top 1% of test-takers for the originality of its ideas. From our research, we believe this marks one of the first examples of AI meeting or exceeding the human ability for original thinking.

In short, we believe that AI models like GPT-4 are capable of producing ideas that people see as unexpected, novel and unique. Other researchers are arriving at similar conclusions in their research of AI and creativity.



Yes, creativity can be evaluated


The emerging creative ability of AI is surprising for a number of reasons.

For one, many outside of the research community continue to believe that creativity cannot be defined, let alone scored. Yet products of human novelty and ingenuity have been prized – and bought and sold – for thousands of years. And creative work has been defined and scored in fields like psychology since at least the 1950s.

The person, product, process, press model of creativity, which researcher Mel Rhodes introduced in 1961, was an attempt to categorize the myriad ways in which creativity had been understood and evaluated until that point. Since then, the understanding of creativity has only grown.

Still others are surprised that the term “creativity” might be applied to nonhuman entities like computers. On this point, we tend to agree with cognitive scientist Margaret Boden, who has argued that the question of whether the term creativity should be applied to AI is a philosophical rather than scientific question.



AI’s founders foresaw its creative abilities


It’s worth noting that we studied only the output of AI in our research. We didn’t study its creative process, which is likely very different from human thinking processes, or the environment in which the ideas were generated. And had we defined creativity as requiring a human person, then we would have had to conclude, by definition, that AI cannot possibly be creative.

But regardless of the debate over definitions of creativity and the creative process, the products generated by the latest versions of AI are novel and useful. We believe this satisfies the definition of creativity that is now dominant in the fields of psychology and science.

Furthermore, the creative abilities of AI’s current iterations are not entirely unexpected.

In their now famous proposal for the 1956 Dartmouth Summer Research Project on Artificial Intelligence, the founders of AI highlighted their desire to simulate “every aspect of learning or any other feature of intelligence” – including creativity.

In this same proposal, computer scientist Nathaniel Rochester revealed his motivation: “How can I make a machine which will exhibit originality in its solution of problems?”

Apparently, AI’s founders believed that creativity, including the originality of ideas, was among the specific forms of human intelligence that machines could emulate.

To me, the surprising creativity scores of GPT-4 and other AI models highlight a more pressing concern: Within U.S. schools, very few official programs and curricula have been implemented to date that specifically target human creativity and cultivate its development.

In this sense, the creative abilities now realized by AI may provide a “Sputnik moment” for educators and others interested in furthering human creative abilities, including those who see creativity as an essential condition of individual, social and economic growth.



This article is republished from The Conversation under a Creative Commons license. Read the original article.
 