1/24
@RubenHssd
BREAKING: Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all.
They just memorize patterns really well.
Here's what Apple discovered:
(hint: we're not as close to AGI as the hype suggests)
2/24
@RubenHssd
Instead of using the same old math tests that AI companies love to brag about, Apple created fresh puzzle games.
They tested Claude Thinking, DeepSeek-R1, and o3-mini on problems these models had never seen before.
The result ↓
3/24
@RubenHssd
All "reasoning" models hit a complexity wall where they completely collapse to 0% accuracy.
No matter how much computing power you give them, they can't solve harder problems.
4/24
@RubenHssd
As problems got harder, these "thinking" models actually started thinking less.
They used fewer tokens and gave up faster, despite having an unlimited token budget.
5/24
@RubenHssd
Apple researchers even tried giving the models the exact solution algorithm.
Like handing someone step-by-step instructions to bake a cake.
The models still failed at the same complexity points.
They can't even follow directions consistently.
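For reference, the kind of solution algorithm in question is tiny. Here's a minimal sketch of the standard recursive Tower of Hanoi procedure in Python (an illustration of the sort of step-by-step recipe the researchers handed over, not code from the paper):

```python
def hanoi(n, src, dst, aux, moves):
    # Move n disks from peg src to peg dst, using aux as the spare peg.
    if n == 0:
        return
    hanoi(n - 1, src, aux, dst, moves)  # clear the n-1 smaller disks out of the way
    moves.append((src, dst))            # move the largest remaining disk
    hanoi(n - 1, aux, dst, src, moves)  # restack the smaller disks on top of it

moves = []
hanoi(3, "A", "C", "B", moves)
print(len(moves), moves)  # 7 moves: (A,C), (A,B), (C,B), (A,C), (B,A), (B,C), (A,C)
```

Executing a recipe like this faithfully is pure rule-following, which is what makes failing at the same complexity points notable.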
6/24
@RubenHssd
The research revealed three regimes:
• Low complexity: Regular models actually win
• Medium complexity: "Thinking" models show some advantage
• High complexity: Everything breaks down completely
Most problems fall into that third category.
7/24
@RubenHssd
Apple discovered that these models are not reasoning at all, but instead doing sophisticated pattern matching that works great until patterns become too complex.
Then they fall apart like a house of cards.
8/24
@RubenHssd
If these models were truly "reasoning," they should get better with more compute and clearer instructions.
Instead, they hit hard walls and start giving up.
Is that intelligence or memorization hitting its limits?
9/24
@RubenHssd
This research indicates we're not as close to AGI as the hype suggests.
Current "reasoning" breakthroughs may be hitting fundamental walls that can't be solved by just adding more data or compute.
10/24
@RubenHssd
Models could handle 100+ moves in Tower of Hanoi puzzles but failed after just 4 moves in River Crossing puzzles.
This suggests they memorized Tower of Hanoi solutions during training but can't actually reason.
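A quick sanity check on those numbers (my arithmetic, assuming the standard puzzle definitions): an optimal Tower of Hanoi solution for n disks takes 2^n - 1 moves, so producing 100+ correct moves means handling at least 7 disks.

```python
# Optimal Tower of Hanoi solution length: 2^n - 1 moves for n disks.
for n in range(3, 11):
    print(f"{n} disks -> {2**n - 1} moves")
# 7 disks -> 127 moves, well past the "100+ moves" the models handled.
# Hanoi solutions are everywhere in training data; scaled-up River
# Crossing variants are not, and there the models broke almost immediately.
```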
11/24
@RubenHssd
While AI companies celebrate their models "thinking," Apple basically said "Everyone's celebrating fake reasoning."
The industry is chasing metrics that don't measure actual intelligence.
12/24
@RubenHssd
Apple's researchers used controllable puzzle environments specifically because:
• They avoid data contamination
• They require pure logical reasoning
• They can scale complexity precisely
• They reveal where models actually break
Smart experimental design if you ask me.
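To make that design concrete, here's a toy sketch of what a controllable puzzle environment can look like in Python (my own illustration, not the paper's code): difficulty scales with a single parameter, and answers are verified against the rules themselves rather than a stored answer key, so data contamination is off the table.

```python
def check_hanoi_solution(n_disks, moves):
    # Difficulty is one knob (n_disks); correctness is checked
    # against the rules, not against a memorizable answer key.
    pegs = {"A": list(range(n_disks, 0, -1)), "B": [], "C": []}
    for src, dst in moves:
        if not pegs[src]:
            return False  # illegal: moving from an empty peg
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return False  # illegal: larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n_disks, 0, -1))  # all disks on C, in order

print(check_hanoi_solution(2, [("A", "B"), ("A", "C"), ("B", "C")]))  # True
```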
13/24
@RubenHssd
What do you think?
Is Apple just "coping" because they've been outpaced in AI development over the past two years?
Or is Apple correct?
Comment below and I'll respond to all.
14/24
@RubenHssd
If you found this thread valuable:
1. Follow me @RubenHssd for more threads on what's happening in AI and its implications.
2. RT the first tweet
[Quoted tweet: the thread's first tweet]
15/24
@VictorTaelin
I have a lot to say about this but I'm in a hospital right now. In short - this is a very well written paper that is undeniably correct, and makes a point that is obvious to anyone in the area. LLMs are *not* reasoning. They're more like a humanity-wide, cross-programming-language, global hash-consing of sorts. That is extremely powerful and will advance many areas, but it is *not* going to result in AGI. That said, what most miss is the real lesson taught by LLMs: massive compute, added to an otherwise simple algorithm, wields immense power and utility. I don't know why people fail to see this obvious message, but the next big thing is obviously going to be companies that realize this very lesson and use it to build entirely new things that can take advantage of massive scale.
16/24
@PrestonPysh
Kinda rich coming from Apple don’t ya think?
17/24
@zayn4pf
good thread man
18/24
@FrankSchuil
Paperclip optimizers will still go a long way.
19/24
@sypen231984
Didn’t Anthropic already prove this?
20/24
@dohko_01
AI is not capable of abstract thought… it’s just pattern matching on steroids
21/24
@sifisobiya
22/24
@thepowerofozone
That should have been obvious to anyone who used AI for longer than 5 minutes.
23/24
@thepsironi
That is obvious, not much of a discovery.
24/24
@dgt10011
Whether AGI is here or not is irrelevant. What’s important is that I’ve seen enough with my own eyes to know there’s going to be tons of labor replacement and the social contract will be completely upended sooner than we think.
To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196