Apparently chat GPT 5 is underwhelming lmao

bnew

chatgpt is orwellian and psycho.

the more they engineer it for generality the worse it gets with factual hallucinations.


:hubie:

they claim otherwise :patrice:



The more advanced artificial intelligence (AI) gets, the more it "hallucinates" and provides incorrect and inaccurate information.

Research conducted by OpenAI found that its latest and most powerful reasoning models, o3 and o4-mini, hallucinated 33% and 48% of the time, respectively, on OpenAI's PersonQA benchmark. That's more than double the rate of the older o1 model. While o3 delivers more accurate information than its predecessor, that appears to come at the cost of more hallucinations.
















1/11
@buildthatidea
openai just dropped gpt-5 and it's insane!

it’s smarter, cheaper and faster than previous sota coding models.

here's everything you need to know:





2/11
@buildthatidea
1/ gpt-5 is so much smarter than sonnet/opus while being so much cheaper

it scores 74.9% on swe-bench, beating claude opus 4.1’s 74.5%.





3/11
@buildthatidea
2/ gpt-5 (with thinking) hallucinates less than openai's previous models:

- hallucinates 5x less than o3 on open-source prompts
- hallucinates less with reasoning on health bench





4/11
@buildthatidea
3/ gpt-5 pro scores lower on humanity's last exam than grok 4 heavy





5/11
@buildthatidea
4/ gpt-5 gets a 95.2% score on the needle-in-haystack test

that’s a massive jump from gpt-4.1 at 57.2% and o3 at 55%.

it’s the best performance openai has ever published on long-context retrieval.





6/11
@buildthatidea
5/ gpt-5 achieves 100% on aime 2025





7/11
@buildthatidea
6/ gpt-5 is the best writing model openai has ever released

it handles rhythm, ambiguity, and form with depth and clarity





8/11
@buildthatidea
7/ gpt-5 can make web apps/games in one shot





9/11
@buildthatidea
8/ gpt-5 has 4 new personalities

cynic, robot, listener and nerd

these are text-only, opt-in, and available in settings under customize chatgpt





10/11
@buildthatidea
9/ gpt-5 is a unified system that picks the right mode for the job

- fast mode handles everyday questions
- thinking mode kicks in for complex tasks
- a smart router chooses automatically based on your prompt

you can also type "think hard about this" to trigger deep reasoning
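
for anyone wondering what "a router picks the mode" could even mean, here's a toy sketch. to be clear, this is NOT how openai does it (the real router isn't public). the function name `pick_mode`, the trigger phrases, and the word-count cutoff are all made up for illustration:

```python
# Toy illustration only: the real GPT-5 router is not public.
# Everything here (names, trigger phrases, threshold) is a made-up assumption.

THINK_TRIGGERS = ("think hard", "think deeply")  # hypothetical trigger phrases


def pick_mode(prompt: str, long_prompt_words: int = 200) -> str:
    """Return "thinking" or "fast" for a prompt using two crude heuristics."""
    text = prompt.lower()
    # An explicit request for deeper reasoning wins outright.
    if any(trigger in text for trigger in THINK_TRIGGERS):
        return "thinking"
    # Very long prompts are treated as "complex" in this toy version.
    if len(text.split()) > long_prompt_words:
        return "thinking"
    return "fast"


print(pick_mode("what's the capital of france?"))
print(pick_mode("think hard about this: is my proof correct?"))
```

a real router would presumably be a learned model rather than keyword matching, but the input/output shape (prompt in, mode out) is the idea the tweet is describing.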



11/11
@buildthatidea
10/ want to launch your own ai agents with gpt-5?

sign up at BuildThatIdea - The world's biggest AI agent marketplace. it's coming soon



