REVEALED: Open A.I. Staff Warn "The progress made on Project Q* has the potential to endanger humanity" (REUTERS)

bnew

Veteran
Joined
Nov 1, 2015
Messages
44,426
Reputation
7,364
Daps
134,465



1/3
GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot. Here’s how it’s been doing.

2/3
But the ELO can ultimately become bounded by the difficulty of the prompts (i.e. can’t achieve arbitrarily high win rates on the prompt: “what’s up”). We find on harder prompt sets — and in particular coding — there is an even larger gap: GPT-4o achieves a +100 ELO over our prior

3/3
Not only is this the best model in the world, but it's available for free in ChatGPT, which has never before been the case for a frontier model.
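
For anyone wondering what the "+100 ELO" in 2/3 means in practice: under the standard Elo logistic formula, a 100-point gap works out to roughly a 64% expected win rate for the stronger model. Below is a minimal sketch of that arithmetic. The function names are made up for illustration, and the "easy prompt" mixture is only an assumed toy model of the point in 2/3 (prompts like "what's up" that cannot separate two models are treated as coin flips); it is not how the arena actually computes its ratings.

```python
import math


def elo_expected_score(rating_gap: float) -> float:
    """Expected score of the higher-rated side under the standard Elo
    logistic formula (a tie counts as half a win)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


def elo_gap_from_win_rate(win_rate: float) -> float:
    """Inverse of elo_expected_score: the rating gap implied by a win rate."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


def observed_gap(true_gap: float, easy_fraction: float) -> float:
    """Toy model (an assumption, not the arena's method): a share of prompts
    is too easy to tell the models apart, so those are scored as 50/50.
    Returns the Elo gap implied by the blended win rate."""
    hard_win_rate = elo_expected_score(true_gap)
    blended = easy_fraction * 0.5 + (1.0 - easy_fraction) * hard_win_rate
    return elo_gap_from_win_rate(blended)


# Win rate implied by a few rating gaps.
for gap in (0, 100, 200, 400):
    print(gap, round(elo_expected_score(gap), 3))

# The same 100-point gap, measured on a prompt set that is half "easy".
print(round(observed_gap(100, 0.5), 1))
```

In this toy model the measured gap shrinks as the share of easy prompts grows, which is one way to read why the gap looks bigger on harder prompt sets like coding.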


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

bnew



1/2
Ilya and OpenAI are going to part ways. This is very sad to me; Ilya is easily one of the greatest minds of our generation, a guiding light of our field, and a dear friend. His brilliance and vision are well known; his warmth and compassion are less well known but no less important.

OpenAI would not be what it is without him. Although he has something personally meaningful he is going to go work on, I am forever grateful for what he did here and committed to finishing the mission we started together. I am happy that for so long I got to be close to such genuinely remarkable genius, and someone so focused on getting to the best future for humanity.

Jakub is going to be our new Chief Scientist. Jakub is also easily one of the greatest minds of our generation; I am thrilled he is taking the baton here. He has run many of our most important projects, and I am very confident he will lead us to make rapid and safe progress towards our mission of ensuring that AGI benefits everyone.

2/2
congratulations to @oklo on going public, especially @jakedewitte and @caorilne, who i have worked with for a decade.

energy is one of the most important things to work on and i’m excited to help support that mission. onward!






1/1
After almost a decade, I have made the decision to leave OpenAI. The company’s trajectory has been nothing short of miraculous, and I’m confident that OpenAI will build AGI that is both safe and beneficial under the leadership of @sama, @gdb, @miramurati and now, under the excellent research leadership of @merettm. It was an honor and a privilege to have worked together, and I will miss everyone dearly. So long, and thanks for everything. I am excited for what comes next — a project that is very personally meaningful to me about which I will share details in due time.



bnew


1/1
Updated Gemini 1.5 Pro report: MATH benchmark for specialized version now at 91.1%, SOTA 3 years ago was 6.9%, overall a lot of progress from February to May in all benchmarks


1/7
A mathematics-specialized version of Gemini 1.5 Pro achieves some extremely impressive scores in the updated technical report.

2/7
From the report: 'Currently the math-specialized model is only being explored for Google internal research use cases; we hope to bring these stronger math capabilities into our deployed models soon.'

3/7
New benchmarks, including Flash.

4/7
Google is doing something very interesting by building specialized versions of its frontier models for math, healthcare, and education (so far). The benchmarks on all of these are pretty impressive, and it seems to be beyond what can be done with traditional fine tuning alone. twitter.com/jeffdean/statu…

5/7
1.5 Pro is now stronger than 1.0 Ultra.

6/7
4o only got to enjoy the crown for 4 days.


7/7
They put Av_Human at the top of the chart there visually to make people feel better. The average human is now in third place.



bnew
