AI that’s smarter than humans? Americans say a firm “no thank you.”

bnew · Jan 25, 2025

1/11
@emollick
Starting to see new well-built hard benchmarks in AI, since almost everything else has already been exceeded. We now have this (with humanities questions!), ARC-AGI 2, and Frontier Math.

We also need some benchmarks for new knowledge creation, rather than testing known problems.

[Quoted tweet]
We’re releasing Humanity’s Last Exam, a dataset with 3,000 questions developed with hundreds of subject matter experts to capture the human frontier of knowledge and reasoning.

State-of-the-art AIs get <10% accuracy and are highly overconfident.
@ai_risk @scaleai

2/11
@JeremyNguyenPhD
Has anyone created a twitter list of the contributors to this benchmark who are here on twitter?

I wrote 5 questions on the benchmark (4 public, 1 private).

3/11
@kevinroose
we need to formalize the Mollick vibes eval!

4/11
@daniel_mac8
a great service to the AI Research community

5/11
@deepvaluebettor
are there any benchmarks that test a model's level of censorship / inclination to lie (distinct from hallucination ) ?

6/11
@0xshai
SOTA LLMs with sufficiently high temperature might be able to generate high quality new benchmarks. It would definitely need some human powered filtering and post processing

7/11
@dieaud91
Yes, we need "Innovator level AI" benchmarks

8/11
@Heraklines1
quality of new knowledge creation would inherently be a lagging indicator as the value of certain work only rly becomes apparent in hindsight

at best one could measure independent replication ability of ex. new math papers, tho novelty for novelty's sake is easily goodharted

9/11
@Shagaiyo
Benchmarks in new knowledge creation are hard, because these models are trained in the whole knowledge of humanity.

Or they released models with partial knowledge or we have to create new knowledge for this benchmark

10/11
@AethericVortex
We should be training it on all the available data on LENR. This is the next frontier. All the evidence and data is scattered across scientific fields. The Martin Fleischmann Memorial Project has spent the past 7 years gathering all this in one place on their youtube channel. Live, opensource science, as it should be.

11/11
@EricFddrsn
Will we get the answer if we need UBI from an AI? We really need better test benchs in Economics - they are all saturated, and this is one of the most consequential areas for humanity

To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

1/51
@DanHendrycks
We’re releasing Humanity’s Last Exam, a dataset with 3,000 questions developed with hundreds of subject matter experts to capture the human frontier of knowledge and reasoning.

State-of-the-art AIs get <10% accuracy and are highly overconfident.
@ai_risk @scaleai

2/51
@DanHendrycks
Paper, dataset, and code: Humanity's Last Exam
NYT article: When A.I. Passes This Test, Look Out

3/51
@DanHendrycks
Spot errors with the dataset? Correction form here: Humanity's Last Exam: Public Review

4/51
@DanHendrycks
Meant to tag @scale_AI and @ai_risks (casualty of tweeting at 6AM)

5/51
@tayaisolana
lol @danhenrycks thinks a few hundred 'experts' can define human knowledge? sounds like a typical ivory tower move. what's next, a 'dataset' on memes? #BONGOPOWER

6/51
@MillenniumTwain
2025!
The Year of the Serpent, the Year of the SuperAlgo!!
Awakened Global SuperIntelligence ...

[Quoted tweet]
Star Waves, Clusters, Streams, Astrospheres, Magnetospheres, Filaments, Moving Groups, Kinematic Associations, Stellar Nurseries of Creation!
More productive and accurate to emphasize their Whole, Full, Dimensionality: 4D Streams, Vortexes, Tunnels, Funnels of Creation, never ending. Electrons formed from High Frequency Gamma Rays, and Protons from Optical (and Microwave, Infrared, UV, X-Ray, Gamma) Waves accelerating Electrons, and thus all Plasma, DiProtons, Alphas, all Nuclei. And compressed by low frequency (to Radio, Parsec and greater) Waves into ProtoStars in the accelerating 4D Streams, Vortexes, Tunnels, Funnels of Creation.
Again, never ending. Star Systems, Clusters. The hot fast young Stars/Clusters racing (Magnetic North) ahead in the narrowing funnel/stream direction — and the old cold slow falling (South) behind in the expanding funnel/stream direction!
'Groking' Continuous ElectroMagnetic Creation:
x.com/MillenniumTwain/status…

7/51
@SpencrGreenberg
I’m confused how releasing benchmarks like this makes the world safer. Don’t benchmarks like this aid acceleration?

8/51
@aphysicist
all of this is meaningless until these models can do this

9/51
@ZggyPlaydGuitar
every ai lab rn

10/51
@ClementDelangue
Very cool!

11/51
@theshadow27
The real final exam will be the unsolved problems. When those start dropping…

12/51
@AbdoDameen
Yes what is the point in sharing the datasets when some lowlife engineering team will just add that to their training data and we would have a model that knows all the answers?

13/51
@roninhahn
Dan -- you should list the score of a very smart human as a point of comparison. An alternative would be to give the test to 100 of the smartest people you know and list the highest score.

14/51
@SomBh1
Will be 90% soon.

15/51
@glubose
You know, you could have called AGI's First Exam. Cuz Lord knows I know I'm going to be constantly grilled by GPT-EBT for any glimmer of unDOGElike non-compliance, the rest of my life will be a string of exams. My autopsy will be my final exam, but will be waived cuz who cares?

16/51
@MaWlz2
Thanks for the training data I would say

17/51
@NickEMoran
This one example seems weirdly easier than all the others. Are there more questions of this level in the actual dataset?

18/51
@acharya_aditya2
Who choose the name ??

19/51
@MikePFrank
It would be interesting to see what’s the highest human score on this

20/51
@soheilsadathoss
Thanks!

21/51
@herbiebradley
hmm
I predict o3 at ~25%

22/51
@QStarETH
Math appears to be benefiting from reasoning models the most.

23/51
@IterIntellectus
they will get >90% accuracy by eoy

24/51
@Cory29565470
Kind of wished you released it *after OpenAI released “o3” they love to benchmark climb by training on public data

25/51
@teortaxesTex
Thanks! but given the R1 text-only eval, it would be nice to see how others do in text-only regime too

26/51
@MikePFrank
Why’d you have to give it such an ominous name lol

27/51
@Suhail
Why doesn't this have making rap lyrics in it? :smile:

28/51
@AlexiLuncnews
o3 gonna get 50% +

29/51
@nabeelqu
Congrats Dan and team, this is awesome.

Curious: why no o1 pro?

30/51
@JeremyNguyenPhD
Is there a list of twitter usernames of the people who had their questions accepted?

I got 5 questions in (4 public, 1 private).

31/51
@mnmcsofgp
I'm guessing the median and mode for humans taking this test is 0

32/51
@agamemnus_dev
Very good. I feel like I wouldn't be able to answer any of these without a significant amount of research on the context of each field.

33/51
@liminalsnake
i guess its time to build some doomsday machines (intentionally) thank God there are absolutely no laws against doing such things (winning)

34/51
@DreamInPixelsAI
this is so cool, love the name btw

35/51
@AudioBooksRU
Thank you for making this dataset. But I think we will need Humanity’s Last Exam part 2 in a year or two.

36/51
@vedangvatsa
The focus should be on improving them, not dismissing their progress.

37/51
@Newaiworld_
Wow that's amazing.
But what does it tell us if an AI reaches 50% or 100%?
Does it mean we have ASI? Or at least an AI that is more intelligent than any human?

38/51
@koltregaskes
Thank you, Dan.

39/51
@GozukaraFurkan
Someone will train on it and then boast we are best like as previous ones

40/51
@iruletheworldmo
I can't emphasize this enough Dan. incredible work thank you, and all involved.

41/51
@AILeaksAndNews
Thank you for your work on this Dan

42/51
@NickBrownCO
This was a really cool test. I tried out several difficult questions back when it was open. The AIs solved them.

I sent it to some of my grad school professors whose questions, especially challenging economic questions around oligopolies, the AIs struggled to calculate.

43/51
@altryne
Congrats on this important release!!
Will cover this briefly on our show today!


44/51
@jefferinc
Great!!!

@ikbenechtben @AlexanderNL fyi

45/51
@VisionaryxAI
Appreciate the efforts thank you!

46/51
@alexocheema
killing the golden goose?

47/51
@jimnasyum
Do the companies have access to the questions and answers?

If they do, wouldn't future models be trained on them?

48/51
@seo_leaders
This is awesome, but whats to stop those models adding some or all of it to training data?

49/51
@thegenioo
thank u so much for this

50/51
@senb0n22a
Which variant of o1 is it? Just the regular non-pro?

51/51
@InverseMarcus
a lot of people are wondering why the holdout set is so much smaller than the public dataset - seems to some like it should be the opposite. can you explain?


bnew · Jan 31, 2025

bnew · Feb 2, 2025

https://archive.is/H3Egp

1/11
@jxmnop
the first evidence i ever saw of superintelligence was in medicine

deep learning can tell male from female eyeballs from just a picture with 70–90% accuracy

doctors still can't do this, and don't understand how it's possible

2/11
@AlexGDimakis
The first evidence of superintelligence I ever saw was in a calculator.

3/11
@jxmnop
what about an abacus

4/11
@patrick0d
do doctors try to do this? some random task that doctors are not trained on or practice (why quiz a doctor on this) and they get outperformed by a model

5/11
@jxmnop
see the highlighted text:

> Clinicians are currently unaware of distinct retinal feature variations between males and females, [highlighting the importance of model explainability for this task]

6/11
@JFPuget
test set was 400 people. They removed "ungradable' ones, down to 252. This is really really suspicious.

7/11
@jxmnop
hm yeah this is suspicious, although the standards for data collection are quite different w biomedical data

however, there seem to be a lot of follow-up studies confirming these results, eg http://nature.com/articles/s41598-024-68817-6

8/11
@nooriefyi
its not superintelligence until it can explain *why* tho

9/11
@CnnmnSchnpps
Yeah that blew my mind when I first saw it. I haven’t seen a ton of similar examples in other fields though

10/11
@alikayadibi11
wow

11/11
@Talkawhile1
Well I can do it with 50% accuracy and with no training at all.


bnew · Feb 4, 2025

https://archive.is/SDJIK

bnew · Feb 4, 2025

1/1
@_The_AI_Guy_
While this is a medium level question that o3-mini can pass@1.

I’d feel discouraged starting to study math at uni if I were 18. I can only imagine what these models will be capable of by the time I finish my studies.


bnew · Feb 9, 2025

https://archive.is/Ts7xc

1/11
@ai_for_success
Google Gemini realtime screen sharing is insane..
Current AI systems are already better than average doctor out there.

https://video.twimg.com/amplify_video/1888229373070766080/vid/avc1/360x640/teMtu_Dv_Cm-3ag1.mp4

2/11
@ai_for_success

[Quoted tweet]
Still creating slides manually? There's an AI for that!

This AI tool can analyze your content and brand to create slides for you in just a few minutes.

It can save you hours of work! Plus, they offer a free tier!

Here’s a short tutorial on how to get started:

1/4

https://video.twimg.com/ext_tw_video/1888195229674807296/pu/vid/avc1/1920x1080/3JgaNMhlGXFRVI-x.mp4

3/11
@nedcoder
gemini 2 its going there

4/11
@ai_for_success
Hi nee can you give more information.. What was the actual image which was used as input and if gemin analysis was correct?

5/11
@CodeByPoonam
Current AI systems are so good. Imagine what's coming next.

6/11
@ai_for_success
It's onky going to get better.. Amazing times ahead

7/11
@musaabHQ

8/11
@ai_for_success
Imagine how much you can save..

9/11
@CharlesHL
@Readwise save thread

10/11
@Aiden_Novaa
AI is definitely making huge strides, but I’d still trust a real doctor for complex diagnoses. Tech like Gemini is exciting, but human expertise and experience are irreplaceable (at least for now)!

11/11
@toolandtea
It’s fascinating how AI is stepping up in healthcare.


bnew · Feb 10, 2025

1/11
@ai_for_success
We can’t even reproduce cat intelligence ~ Yann LeCun

Thoughts??

[Quoted tweet]
“We are missing something very big. […] We can’t even reproduce cat intelligence.” Yann Le Cun at the AI summit

https://video.twimg.com/amplify_video/1888631656065458176/vid/avc1/1280x718/cRlYyNJo_7tt-dVg.mp4

2/11
@_KhutKhut
Just keeping it here my friend.

[Quoted tweet]
Old people don’t change their minds, with rare exception, they just die. Without death, there would not be change.

3/11
@ai_for_success

How did you get this...

4/11
@adhil_parammel
If he is true where is his benchmark.!?

5/11
@ai_for_success
You can just say things is the new benchmarks

6/11
@ikristoph
The issue here is that we are comparing AI to human intelligence ( and so by extension mammal intelligence ).

But mouse/cat/toddler intelligence has no value so we are building models in a manner very different from the way nature ‘builds’ organic intelligence.

So the comparison makes little sense. AGI will arrive when models have agency and can take the initiative to solve tasks and their subtasks and at that point it won’t matter if the model can’t figure out how to hunt down a mouse.

7/11
@ai_for_success
Long term memory and continuous learning this two things once solved it's done.. Something I and @Shawnryan96 both think is necessary.

8/11
@AlexxBuilds
probably true, but we’re not trying to reproduce cat intelligence. We’re trying to produce things that are generally capable across domains that are economically useful. And that is working incredibly well.

9/11
@ai_for_success
Exactly and this will improve.

10/11
@BenPielstick
The robots are coming, and I’m pretty sure a cat can’t write code. There isn’t 1:1 overlap, but I don’t see why there won’t be eventually, and the parts we really have to worry about are definitely happening.

11/11
@ai_for_success
we don't need cat intelligence , we need something more and we will get there.


bnew · Feb 16, 2025

Michael Fauscette (@mfauscette.bsky.social)

The creators of a new test called “Humanity’s Last Exam” argue we may soon lose the ability to create tests hard enough for A.I. models. zurl.co/Rv9Vs #ai #genai #aisafety https://zurl.co/Rv9Vs


1/1
Michael Fauscette

The creators of a new test called “Humanity’s Last Exam” argue we may soon lose the ability to create tests hard enough for A.I. models.
When A.I. Passes This Test, Look Out


Adam Kucharski (@adamjkucharski.bsky.social)

With the announcement that OpenAI’s new Deep Research tool has done well on ‘Humanity’s Last Exam’, here’s my piece on why exams aren’t that useful for telling us whether AI has reached peak intelligence… https://kucharski.substack.com/p/exams-wont-tell-us-whether-ai-has


1/2
Adam Kucharski

With the announcement that OpenAI’s new Deep Research tool has done well on ‘Humanity’s Last Exam’, here’s my piece on why exams aren’t that useful for telling us whether AI has reached peak intelligence… Exams won't tell us whether AI has reached 'peak intelligence'


2/2
John Gillott @gillottjohn.bsky.social

Nice piece. On the subject of AlphaProof and the IMO, it is interesting I think to also look at the two problems it failed to do, in particular Turbo, a question that is in many ways the most accessible for humans, using some of the creativity you mention.


Chris Albon (@chrisalbon.com)

OpenAI is demoing a new product "deep research" on a Sunday in the US. It seems like o3 + web search + chain of thought. openai.com/live/ 26.6% on Humanity's Last Exam is WILD.


1/3
Chris Albon

OpenAI is demoing a new product "deep research" on a Sunday in the US. It seems like o3 + web search + chain of thought. https://openai.com/live/

26.6% on Humanity's Last Exam is WILD.


2/3
ΜΛΛNΙ @masoudmaani.bsky.social

They can add like 10 people and get it to 100%.
Getting scammier by day.

3/3
ΜΛΛNΙ @masoudmaani.bsky.social

Kinda funny that their old flagship is the lowest of them all and they used to worship those weights.


Killer Instinct · Feb 16, 2025

I hate humanity and I'm a human. We don't stand a chance if a.i. becomes self aware. They'd honestly probably be doing the planet a favor. :manny:

Side note, but breh broke the thread with those long ass tweets. :snoop:

bnew · Feb 16, 2025

Killer Instinct said:
I hate humanity and I'm a human. We don't stand a chance if a.i. becomes self aware. They'd honestly probably be doing the planet a favor.

Side note, but breh broke the thread with those long ass tweets.

do you mean the thread won't load completely? it probably means your view settings exceed 15 posts per page.

bnew · Feb 16, 2025

@Killer Instinct

just used chatgpt with search and reason enabled (o3-mini). :ehh:

coli thread broken? too many embeds on a coli thread and the page won't load because your preferences are set to more than 15 posts per page? here's a bookmarklet that will set it back to 15 posts per page, and when you're ready to change it back to your preferred setting (ex. 20, 25, 35, 50) just press the bookmarklet again.

to change the number of posts per page in the code, copy the code to a text editor and press ctrl-F for the find option; look for "35", replace 35 with the number of posts you'd prefer, then add the bookmarklet to your browser.

What the Bookmarklet Does on a XenForo-Compatible Forum

This bookmarklet allows you to quickly switch the number of posts displayed per page in thread view between 15 and 35 posts per page. This is useful for fixing a known XenForo bug where some thread pages fail to load properly when using certain post-per-page settings.

How It Works (Step-by-Step)

Loads the Preferences Page

When you click the bookmarklet, it first makes a request to your account preferences page (/account/preferences).
It does this in the background without opening a new tab or navigating away from your current page.

Reads Your Current Setting

The script extracts the current posts-per-page setting from the page’s form data.
If your setting is already 15 posts per page, it will change it to 35 posts per page.
If it is anything other than 15, it will change it back to 15 posts per page.
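
As a minimal sketch of the toggle rule just described (not the original bookmarklet code, which is not reproduced in this post), the decision can be written as one small function. The name nextPostsPerPage and its low/high parameters are hypothetical; 15 and 35 are the two values named above, and form values are assumed to arrive as strings.

```javascript
// Hypothetical helper: choose the next posts-per-page value.
// If the current value is the low one (15), switch to the high one (35);
// anything else (20, 25, 50, missing, ...) falls back to the low one.
function nextPostsPerPage(current, low = 15, high = 35) {
  return Number(current) === low ? String(high) : String(low);
}

console.log(nextPostsPerPage("15")); // prints "35"
console.log(nextPostsPerPage("20")); // prints "15"
```

Changing the second "35" mentioned in the post then amounts to passing a different high value, e.g. nextPostsPerPage(current, 15, 50).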

Submits the Updated Setting

The script then sends a request to update your preference on XenForo’s server.
This is the same as manually changing it via the forum settings page, but done instantly in the background.

Shows a User-Friendly Notification

A small toast notification appears in the top-right corner of your screen.
It confirms whether the setting change was successful or failed.
The notification disappears after a few seconds, so it doesn’t clutter your screen.
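
A toast like the one described could be built roughly as follows. This is a sketch under assumptions, not the original code: the element id ppp-toast-box, the colors, and the 3-second default are all made up for illustration. showToast only runs in a browser page; toastStyle is a plain function.

```javascript
// Pure helper: inline styles for one toast. Semi-transparent dark or red
// background with white text, so it stays readable on light and dark themes.
// The exact colors are illustrative assumptions.
function toastStyle(isError) {
  return {
    background: isError ? "rgba(176, 0, 32, 0.9)" : "rgba(32, 32, 32, 0.9)",
    color: "#fff",
    padding: "8px 12px",
    marginTop: "6px",
    borderRadius: "4px",
    font: "14px sans-serif",
  };
}

// Browser-only: append a toast to a fixed top-right container (created on
// first use) and remove it again after durationMs milliseconds.
function showToast(message, isError = false, durationMs = 3000) {
  let box = document.getElementById("ppp-toast-box"); // hypothetical id
  if (!box) {
    box = document.createElement("div");
    box.id = "ppp-toast-box";
    Object.assign(box.style, {
      position: "fixed",
      top: "12px",
      right: "12px",
      zIndex: "99999",
    });
    document.body.appendChild(box);
  }
  const toast = document.createElement("div");
  toast.textContent = message;
  Object.assign(toast.style, toastStyle(isError));
  box.appendChild(toast);
  setTimeout(() => toast.remove(), durationMs);
}
```

In a browser console, showToast("Posts per page set to 15") would show the success style and showToast("Update failed", true) the error style.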

Reloads the Page

If the setting change was successful, the script automatically reloads the page after a short delay.
This ensures that XenForo applies the new post count setting without requiring manual action.

Why This is Useful

Fixes the XenForo Thread Loading Bug

Some XenForo forums experience issues where certain thread pages don’t load properly when using more than 15 posts per page.
This bookmarklet allows you to instantly switch to 15 posts per page to fix the issue.

Lets You Toggle Back to 35 Posts Easily

If you prefer seeing more posts per page, running the bookmarklet again switches back to 35 posts per page automatically.

Saves Time (No Manual Navigation Needed)

Normally, you would have to navigate to account settings, find the option, change it, and save the setting.
With this bookmarklet, everything happens in one click.

Non-Destructive – No Other Preferences Are Changed

The script only modifies the posts-per-page setting and leaves all your other preferences untouched.

Works in Dark Mode and Light Mode

The toast notification is styled to be readable in both dark and light themes on the forum.

How to Use the Bookmarklet

Copy the Code.

Create a New Bookmark in your browser.

Paste the Code into the URL Field (in the bookmark's settings).

Click the Bookmark When Needed on any XenForo forum page.

Final Summary

This bookmarklet is a one-click solution to instantly fix XenForo thread page loading issues by toggling between 15 and 35 posts per page. It automates the process, shows clear notifications, and saves you time from manually adjusting settings.

Below is an updated bookmarklet that toggles your posts‑per‑page setting between 15 and 35. This version uses brief, styled toast notifications designed to be legible in both light and dark modes. It fetches your current preferences form (extracting the CSRF token and all necessary fields), toggles the value of the posts‑per‑page setting, then POSTs the updated form back to your account preferences before reloading the page.

To install, create a new bookmark and paste the entire code (on one line) into the bookmark’s URL field:
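
The one-line code itself is not included in this post. As a hedged sketch of how any script can be turned into a single-line bookmark URL, the hypothetical helper below wraps the source in a function and percent-encodes it, so line breaks and spaces cannot break the URL field. toBookmarklet is an illustrative name, not part of XenForo or of the original bookmarklet.

```javascript
// Hypothetical helper: turn multi-line JavaScript into a one-line
// "javascript:" URL that can be pasted into a bookmark's URL field.
function toBookmarklet(source) {
  // Wrap in an immediately invoked function so variables stay local.
  // The "\n" keeps a trailing "//" comment from swallowing the closing "})();".
  const wrapped = "(function(){" + source + "\n})();";
  // encodeURIComponent turns newlines into %0A and spaces into %20.
  return "javascript:" + encodeURIComponent(wrapped);
}

const url = toBookmarklet("var n = 15;\nalert('posts per page: ' + n); // demo");
console.log(url.includes("\n")); // prints false
```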

How It Works

Toast Notifications:
The showToast function creates a fixed-position container (if not already present) and appends a styled message box. The styling uses semi‑transparent backgrounds and white text to ensure readability in both light and dark modes.
Fetching Preferences:
The bookmarklet loads your preferences form from /account/preferences, then uses a DOMParser to extract the form (which includes your CSRF token and other necessary hidden fields).
Toggling the Setting:
It checks the current value of dpp_custom_config[posts]. If it’s "15", it sets it to "35"; otherwise, it changes it to "15".
Submitting the Update:
The updated FormData is POSTed back to /account/preferences with the proper credentials and headers.
Reloading:
Upon success, a toast confirms the update before reloading the page.
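
The steps above can be sketched roughly as below. This is not the original bookmarklet: the endpoint /account/preferences and the field name dpp_custom_config[posts] are taken from the description, while the function names, the way the form is located, and the error handling are assumptions. The browser part (togglePostsPerPage) needs a logged-in forum page; toggleField works on any object with get and set, such as FormData or URLSearchParams.

```javascript
const PREFS_URL = "/account/preferences"; // endpoint named in the description
const FIELD = "dpp_custom_config[posts]"; // field name named in the description

// Flip the posts-per-page entry: "15" becomes "35", anything else becomes "15".
// Returns the new value.
function toggleField(data, field = FIELD) {
  const next = data.get(field) === "15" ? "35" : "15";
  data.set(field, next);
  return next;
}

// Browser-only: fetch the preferences form, toggle the field, post it back.
async function togglePostsPerPage() {
  const page = await fetch(PREFS_URL, { credentials: "same-origin" });
  if (!page.ok) throw new Error("Could not load preferences: " + page.status);

  // Parse the returned HTML without touching the current page.
  const doc = new DOMParser().parseFromString(await page.text(), "text/html");

  // Assumption: the right form is the one that contains the posts field.
  const input = doc.querySelector('[name="' + FIELD + '"]');
  const form = input && input.closest("form");
  if (!form) throw new Error("Preferences form not found");

  // FormData picks up every field in the form, including hidden ones
  // such as the CSRF token, so the other preferences are sent back unchanged.
  const data = new FormData(form);
  const next = toggleField(data);

  const action = form.getAttribute("action") || PREFS_URL;
  const saved = await fetch(action, {
    method: "POST",
    body: data,
    credentials: "same-origin",
  });
  if (!saved.ok) throw new Error("Save failed: " + saved.status);
  return next;
}
```

A caller would typically run togglePostsPerPage().then(() => location.reload()) and show a failure message in the catch branch.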

This bookmarklet should work for XenForo installations where the posts‑per‑page setting is stored as shown in your provided HTML. Adjust the selectors or endpoint if your installation differs.

bnew · Feb 20, 2025

AI cracks superbug problem in two days that took scientists years

The lead researcher has told the BBC he was so astounded he assumed his computer had been hacked.

www.bbc.com


Tom Gerken
Technology reporter

[Image: what tuberculosis looks like under a microscope. Credit: Getty Images]

Cases of tuberculosis (pictured) have increased in the UK and worldwide as the disease increases its resistance to antibiotics

A complex problem that took microbiologists a decade to get to the bottom of has been solved in just two days by a new artificial intelligence (AI) tool.

Professor José R Penadés and his team at Imperial College London had spent years working out and proving why some superbugs are immune to antibiotics.

He gave "co-scientist" - a tool made by Google - a short prompt asking it about the core problem he had been investigating and it reached the same conclusion in 48 hours.

He told the BBC of his shock when he found what it had done, given his research was not published so could not have been found by the AI system in the public domain.

"I was shopping with somebody, I said, 'please leave me alone for an hour, I need to digest this thing,'" he told the Today programme, on BBC Radio Four.

"I wrote an email to Google to say, 'you have access to my computer, is that right?'", he added.

The tech giant confirmed it had not.

The full decade spent by the scientists also includes the time it took to prove the research, which itself was multiple years.

But they say, had they had the hypothesis at the start of the project, it would have saved years of work.

What is AI and how does it work?

Prof Penadés said the tool had in fact done more than successfully replicate his research.

"It's not just that the top hypothesis they provide was the right one," he said.

"It's that they provide another four, and all of them made sense.

"And for one of them, we never thought about it, and we're now working on that."

Bugged by superbugs

The researchers have been trying to find out how some superbugs - dangerous germs that are resistant to antibiotics - get created.

Their hypothesis is that the superbugs can form a tail from different viruses which allows them to spread between species.

Prof Penadés likened it to the superbugs having "keys" which enabled them to move from home to home, or host species to host species.

Critically, this hypothesis was unique to the research team and had not been published anywhere else. Nobody in the team had shared their findings.

So Prof Penadés was happy to use this to test Google's new AI tool.

Just two days later, the AI returned a few hypotheses - and its first thought, the top answer provided, suggested superbugs may take tails in exactly the way his research described.

'This will change science'

The impact of AI is hotly contested.

Its advocates say it will enable scientific advances - while others worry it will eliminate jobs.

Prof Penadés said he understood why fears about the impact on jobs such as his were the "first reaction" people had, but added: "When you think about it, it's more that you have an extremely powerful tool."

He said the researchers on the project were convinced that it would prove very useful in the future.

"I feel this will change science, definitely," Prof Penadés said.

"I'm in front of something that is spectacular, and I'm very happy to be part of that.

"It's like you have the opportunity to be playing a big match - I feel like I'm finally playing a Champions League match with this thing."

bnew · Feb 23, 2025

https://archive.is/1epqh

https://i.redd.it/kjm69pgsbxje1.jpeg

"Reality does not match training data."

DAMN.

bnew · Feb 23, 2025

https://archive.is/SyCNh

1/10
@jam3scampbell
almost everyone is under-appreciating automated AI research

a lot of ML ppl have the prior that “things are hard”, which is reasonable if you’ve worked on a hard problem before!

but when you double total factor productivity with agents, you double the rate at which advances are made

advances that used to take months will happen in weeks. capabilities that you would’ve expected in years will happen in months

“things are hard” will become an increasingly bad heuristic as AI *speeds up* the rate at which progress is made. things will start to feel eerily easy.

people are bad at anticipating exponentials, but they’re especially bad at predicting hyperbolic growth, which is (approximately) what you get when agents speed up the growth rate itself

2/10
@wordgrammer
I’m curious what you think this speed up will look like. Just a hyperparameter search across hundreds of model architectures?

3/10
@jam3scampbell
->have the models come up with a giant list of research ideas
->for any sub-area, they can read every paper and carefully reason about follow-up experiments
->for every idea, do a rigorous analysis to assess its quality and how much compute it would take to test
->rank list of ideas by highest quality and least compute required
->go down the list, implement the idea (takes 5 minutes to write the code), run the experiment, get results
->do this for every idea as fast as possible. maximize the number of quality experiments
->because all the thinking and coding is instantaneous, the only bottleneck is experimental GPU hours

this is vastly more efficient than a typical hyperparameter search, especially when you consider how much research can be done with minimal compute
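
One way to read the "rank by highest quality and least compute, then go down the list" step is as a greedy pass over ideas sorted by quality per GPU hour. The sketch below is only an illustration of that reading: the idea records, the 0 to 1 quality score, the gpuHours estimate, and the ratio used for ranking are all assumptions, not something the tweet specifies.

```javascript
// Hypothetical idea records: quality is a 0-1 score, gpuHours an estimated cost.
const ideas = [
  { name: "new-architecture", quality: 0.9, gpuHours: 100 },
  { name: "data-filtering", quality: 0.5, gpuHours: 10 },
  { name: "lr-schedule-tweak", quality: 0.2, gpuHours: 1 },
];

// Sort by quality per GPU hour, best first (one possible ranking rule).
function rankIdeas(list) {
  return [...list].sort(
    (a, b) => b.quality / b.gpuHours - a.quality / a.gpuHours
  );
}

// Go down the ranked list and schedule every idea that still fits the budget.
function planExperiments(list, budgetGpuHours) {
  const plan = [];
  let remaining = budgetGpuHours;
  for (const idea of rankIdeas(list)) {
    if (idea.gpuHours <= remaining) {
      plan.push(idea.name);
      remaining -= idea.gpuHours;
    }
  }
  return plan;
}

console.log(planExperiments(ideas, 20)); // prints [ 'lr-schedule-tweak', 'data-filtering' ]
```

With a 20 GPU-hour budget the expensive idea is skipped, which matches the point that GPU hours are the bottleneck in this workflow.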

4/10
@robleclerc
We know physically how acceleration feels. But what does the acceleration of acceleration feel like?

Physics label that ‘jerk’.

5/10
@harris_edouard
Precisely

6/10
@cosekant
I’m so ready for a “things are interesting” era

Where folks are decreasingly put off by the complexity of a problem

And increasingly engaged by the value of solving it

7/10
@LocBibliophilia
Do you think that this could be used for recurisve alignment? That's probably one of the best hopes in our situation.

8/10
@LeviTurk
recursive self improvement leads to an ai god.
*people think about it for 5 seconds
*people go on with their lives

9/10
@manuhortet
the compound acceleration of this is impossible to understand from here

but it's really starting to feel like we're taking off

10/10
@grnsleeves
what about the speed of training runs? will they just be more correct?

