Bard gets its biggest upgrade yet with Gemini {Google A.I / LLM}

bnew

Veteran
Joined
Nov 1, 2015
Messages
63,527
Reputation
9,702
Daps
173,500



1/11
@GoogleDeepMind
We’re releasing an updated Gemini 2.5 Pro (I/O edition) to make it even better at coding. 🚀

You can build richer web apps, games, simulations and more - all with one prompt.

In @GeminiApp, here's how it transformed images of nature into code to represent unique patterns 🌱



https://video.twimg.com/amplify_video/1919768928928051200/vid/avc1/1080x1920/taCOcXbyaVFwRWLw.mp4

2/11
@GoogleDeepMind
This latest version of Gemini 2.5 Pro leads on the WebDev Arena Leaderboard - which measures how well an AI can code a compelling web app. 🛠️

It also ranks #1 on @LMArena_ai in Coding.





3/11
@GoogleDeepMind
Beyond creating beautiful UIs, these improvements extend to tasks such as code transformation and editing as well as developing complex agents.

Now available to try in @GeminiApp, @Google AI Studio and @GoogleCloud’s #VertexAI platform. Find out more → Build rich, interactive web apps with an updated Gemini 2.5 Pro





4/11
@koltregaskes
Excellent, will we get the non-preview version at I/O?



5/11
@alialobai1
@jacksharkey11 they are cooking …



6/11
@laoddev
that is wild



7/11
@RaniBaghezza
Very cool



8/11
@burny_tech
Gemini is a gift: I can have 100 simple coding ideas per day and draft simple versions of them all



9/11
@thomasxdijkstra
@cursor_ai when



10/11
@shiels_ai
Unreal 🤯🤯🤯



11/11
@LarryPanozzo
Anthropic rn




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196




1/21
@GeminiApp
We just dropped Gemini 2.5 Pro (I/O edition). It’s our most intelligent model that’s even better at coding.

Now, you can build interactive web apps in Canvas with fewer prompts.

Head to Gemini and select “Canvas” in the prompt bar to try it out, and let us know what you’re building in the comments.



https://video.twimg.com/amplify_video/1919768593987727360/vid/avc1/1920x1080/I7FL20DtXMKELQCF.mp4

2/21
@GeminiApp
Interact with the game from our post here: Gemini - let's use Noto Emoji font https://fonts.google.com/noto/specimen/Noto+Color+Emoji



3/21
@metadjai
Awesome! ✨



4/21
@accrued_int
it's like they are just showing off now ☺️



5/21
@ComputerMichau
For me 2.5 Pro is still experimental.



6/21
@arulPrak_
AI agentic commerce ecosystem for travel industry



7/21
@sumbios
Sweet



8/21
@AIdenAIStar
I'd say it is a good model. Made myself a Gemini defender game



https://video.twimg.com/amplify_video/1919783171723292672/vid/avc1/1094x720/Y9mPukwagcRIr7fK.mp4

9/21
@car_heroes
ok started trial. Basic Pac-Man works. Anything else useful so far is a blank screen after a couple of updates. It can't figure it out. New Mac, Sequoia 15.3.2 and Chrome Version 136.0.7103.92. I want this to work but I can't waste time on stuff that should work at launch.



10/21
@rand_longevity
this week is really heating up



11/21
@reallyoptimized
@avidseries You got your own edition! It's completely not woke, apparently.



12/21
@A_MacLullich
I could also make other simple clinical webapps to help with workflow. For example, if a patient with #delirium is distressed, this screen could help doctors and nurses to assess for causes. Clicking on each box would reveal more details.





13/21
@nurullah_kuus
Seems interesting, I'll give it a shot



14/21
@dom_liu__
I used Gemini 2.5 Pro to create a Dragon game, and it was so much fun! The code generation was fast, complete, and worked perfectly on the first try with no extra tweaks needed. I have a small question: is this new model using gemini-2.5-pro-preview-05-06?





15/21
@ai_for_success
Why is it showing Experimental?



16/21
@G33K13765260
damn. it fukked my entire code.. ran back to claude :smile:



17/21
@A_MacLullich
Would like to develop a 4AT #delirium assessment tool webapp too.

I already have @replit one here: http://www.the4AT.com/trythe4AT - would be nice to have a webapp option for people too.





18/21
@davelalande
I am curious about Internet usage. I mainly use X and AI, and I rarely traverse the web anymore. How many new websites are finding success, and is the rest of the world using the web like it's 1999? Will chat models build an app for one-time use with that chat session?



19/21
@arthurSlee
Using this solar system prompt, I initially got an error. However, after the fix, it did create the best-looking solar system in one prompt.

Gemini - Solar System Visualization HTML Page

Nice work. I also like how easy it is to share executing code.



20/21
@AI_Techie_Arun
Wow!!!! Amazing

But what's the I/O edition?



21/21
@JvShah124
Great 😃










1/11
@slow_developer
now this is very interesting...

the new gemini 2.5 pro model seems to have fallen behind in many areas

coding is the only thing it still handles well.

so, does that mean this model was built mainly for coding?





2/11
@Shawnryan96
I have not seen any issues in real world use. In fact image reasoning seems better



3/11
@slow_developer
I haven’t tried anything except the code, but this is a comparison chart against the previous version



4/11
@psv2522
It's not fallen behind; the new model is probably a distillation, trained much better for coding.



5/11
@slow_developer
much like what Anthropic did with 3.5 to their next updated version 3.6?



6/11
@sdmat123
That's how tuning works, yes. You can see the same kind of differences in Sonnet 3.7 vs 3.6.

3.7 in normal mode regressed quantitatively on MMLU and ARC even against 3.6's base-level reasoning. It is regarded as subjectively worse in many domains outside of coding.



7/11
@slow_developer
agree

[Quoted tweet]
much like what Anthropic did with 3.5 to their next updated version 3.6?


8/11
@mhdfaran
It’s interesting how coding is still the highlight here.



9/11
@NFTMentis
Wait - what?

Is this a response to the @OpenAI news re: @windsurf_ai?



10/11
@K_to_Macro
This shows the weakness of RL



11/11
@humancore_ai
I don’t care. I want one that is a beast at coding, there are plenty of general purpose ones.















1/11
@OfficialLoganK
Gemini 2.5 Pro just got an upgrade & is now even better at coding, with significant gains in front-end web dev, editing, and transformation.

We also fixed a bunch of function calling issues that folks have been reporting, it should now be much more reliable. More details in 🧵





2/11
@OfficialLoganK
The new model, "gemini-2.5-pro-preview-05-06", is the direct successor to and replacement of the previous version (03-25). If you are using the old model, no change is needed; requests should auto-route to the new version with the same price and rate limits.

Gemini 2.5 Pro Preview: even better coding performance- Google Developers Blog



3/11
@OfficialLoganK
And don't just take our word for it:

“The updated Gemini 2.5 Pro achieves leading performance on our junior-dev evals. It was the first-ever model that solved one of our evals involving a larger refactor of a request routing backend. It felt like a more senior developer because it was able to make correct judgement calls and choose good abstractions.”

– Silas Alberti, Founding Team, Cognition



4/11
@OfficialLoganK
Developers really like 2.5 Pro:

“We found Gemini 2.5 Pro to be the best frontier model when it comes to "capability over latency" ratio. I look forward to rolling it out on Replit Agent whenever a latency-sensitive task needs to be accomplished with a high degree of reliability.”

– Michele Catasta, President, Replit



5/11
@OfficialLoganK
Super excited to see how everyone uses the new 2.5 Pro model, and I hope you all enjoy a little pre-IO launch : )

The team has been super excited to get this into the hands of everyone so we decided not to wait until IO.



6/11
@JonathanRoseD
Does gemini-2.5-pro-preview-05-06 improve any other aspects other than coding?



7/11
@OfficialLoganK
Mostly coding !



8/11
@devgovz
Ok, what about 2.0 Flash with image generation? When will the experimental period end?



9/11
@OfficialLoganK
Soon!



10/11
@frantzenrichard
Great! How about that image generation?



11/11
@OfficialLoganK
: )









1/11
@demishassabis
Very excited to share the best coding model we’ve ever built! Today we’re launching Gemini 2.5 Pro Preview 'I/O edition' with massively improved coding capabilities. Ranks no.1 on LMArena in Coding and no.1 on the WebDev Arena Leaderboard.

It’s especially good at building interactive web apps - this demo shows how it can be helpful for prototyping ideas. Try it in @GeminiApp, Vertex AI, and AI Studio http://ai.dev

Enjoy the pre-I/O goodies !



https://video.twimg.com/amplify_video/1919778857193816064/vid/avc1/1920x1080/FtMuHzKJiZuaP5Uy.mp4

2/11
@demishassabis
It’s been amazing to see the response to Gemini 2.5 series so far - and we're continuing to rev in response to feedback, so keep it coming !

https://blog.google/products/gemini/gemini-2-5-pro-updates



3/11
@demishassabis
just a casual +147 elo rating improvement... no big deal 😀





4/11
@johnseach
Gemini is now the best coding LLM by far. It is excelling at astrophysics code where all others fail. Google is now the AI coding gold standard.



5/11
@WesRothMoney
love it!

I built a full city traffic simulator in under 20 minutes.

here's the timelapse from v1.0 to (almost) done.



https://video.twimg.com/amplify_video/1919886890997841920/vid/avc1/1280x720/neHj9PPTfPxeaU3U.mp4

6/11
@botanium
This is mind blowing 🤯



7/11
@_philschmid
Lets go 🚀



8/11
@A_MacLullich
Excited to try this - will be interesting to compare with others? Any special use cases?



9/11
@ApollonVisual
congrats on the update. I feel that coding-focused LLMs will accelerate progress exponentially



10/11
@JacobColling
Excited to try this in Cursor!



11/11
@SebastianKits
Loving the single-shot quality, but would love to see more work towards half-autonomous agentic usage. E.g when giving a task to plan and execute a larger MVP, 2.5 pro (and all other models) often do things in a bad order that leads to badly defined styleguides, not very cohesive view specs etc. This is not a problem of 2.5 pro, all models of various providers do this without excessive guidance.




 



1/4
@LechMazur
Gemini 2.5 Pro Preview (05-06) scores 42.5, compared to 54.1 for Gemini 2.5 Pro Exp (03-25) on the Extended NYT Connections Benchmark.

More info: GitHub - lechmazur/nyt-connections: Benchmark that evaluates LLMs using 651 NYT Connections puzzles extended with extra trick words





2/4
@LechMazur
Mistral Medium 3 scores 12.9.





3/4
@akatzzzzz
code sloptimized



4/4
@ficlive
Big fan of your benchmarks. Can you test the 03-25 Preview as well, as that's where the big decline was for us?

[Quoted tweet]
Gemini 2.5 Pro Preview gives good results, but can't quite match the original experimental version.










1/3
@HCSolakoglu
Reviewing recent benchmark data for gemini-2.5-pro. Comparing the 05-07 to the 03-25, we see a roughly 4.2% lower Elo score on EQ-Bench 3 and about a 4.9% lower score on the Longform Creative Writing benchmark. Interesting shifts.





2/3
@HCSolakoglu
Tests & images via: @sam_paech



3/3
@MahawarYas27492
@OfficialLoganK @joshwoodward @DynamicWebPaige











1/7
@ChaseBrowe32432
Ran a few times to verify, seeing degraded performance on my visual physics reasoning benchmark for the new Gemini 2.5 Pro





2/7
@ChaseBrowe32432
cbrower



3/7
@random_wander_
nice benchmark! Qwen and Grok would be interesting.



4/7
@ChaseBrowe32432
Grok still has no API vision, and I haven't got to running Qwen because I don't know how to deal with providers being wishy-washy about precision



5/7
@figuret20
On most benchmarks this new version is worse. Check the official benchmark results for the new one vs. the old one: it's a downgrade on everything but WebDev Arena.



6/7
@ChaseBrowe32432
Where do you see official benchmark results? I thought they'd come with the new model card but I can still only see the old model card



7/7
@akatzzzzz
Worst timeline ever is overfitting to code slop and calling it AGI







1/2
@r0ck3t23
Performance Analysis: Gemini 2.5 Pro Preview vs Previous Version

Fascinating benchmark comparison here! The data reveals some interesting trends:

The Preview build (05-06) of Gemini 2.5 Pro shows notable improvements in coding metrics (+5.2% on LiveCodeBench code generation, +2.5% on Aider Polyglot code editing) compared to the earlier Experimental build (03-25).

However, there are modest performance decreases across most other domains:
- Math: -3.7% on AIME 2025
- Image understanding: -3.8% on Vibe-Eval
- Science: -1.0% on GPQA diamond
- Visual reasoning: -2.1% on MMMU

This raises interesting questions about optimization trade-offs. While it excels at code-related tasks, has this focus come at the expense of other capabilities? Or is this part of a broader optimization strategy that will eventually see improvements across all domains?





2/2
@Lowkeytyc00n1
That's an Improvement




 


The Google Gemini generative AI logo on a smartphone.

Image Credits: Andrey Rudakov/Bloomberg / Getty Images

AI

Google launches ‘implicit caching’ to make accessing its latest AI models cheaper


Kyle Wiggers

11:20 AM PDT · May 8, 2025

Google is rolling out a feature in its Gemini API that the company claims will make its latest AI models cheaper for third-party developers.

Google calls the feature “implicit caching” and says it can deliver 75% savings on “repetitive context” passed to models via the Gemini API. It supports Google’s Gemini 2.5 Pro and 2.5 Flash models.

That’s likely to be welcome news to developers as the cost of using frontier models continues to grow.

Caching, a widely adopted practice in the AI industry, reuses frequently accessed or pre-computed data from models to cut down on computing requirements and cost. For example, caches can store answers to questions users often ask of a model, eliminating the need for the model to re-create answers to the same request.

Google previously offered model prompt caching, but only explicit prompt caching, meaning devs had to define their highest-frequency prompts. While cost savings were supposed to be guaranteed, explicit prompt caching typically involved a lot of manual work.

Some developers weren’t pleased with how Google’s explicit caching implementation worked for Gemini 2.5 Pro, which they said could cause surprisingly large API bills. Complaints reached a fever pitch in the past week, prompting the Gemini team to apologize and pledge to make changes.

In contrast to explicit caching, implicit caching is automatic. Enabled by default for Gemini 2.5 models, it passes on cost savings if a Gemini API request to a model hits a cache.

“[W]hen you send a request to one of the Gemini 2.5 models, if the request shares a common prefix as one of previous requests, then it’s eligible for a cache hit,” explained Google in a blog post. “We will dynamically pass cost savings back to you.”

The minimum prompt token count for implicit caching is 1,024 for 2.5 Flash and 2,048 for 2.5 Pro, according to Google’s developer documentation, which is not a terribly big amount, meaning it shouldn’t take much to trigger these automatic savings. Tokens are the raw bits of data models work with, with a thousand tokens equivalent to about 750 words.
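Those minimums can be sanity-checked with a rough back-of-the-envelope estimate. The sketch below is not an official tool: the helper names are made up, and the conversion uses the ~750-words-per-1,000-tokens rule of thumb mentioned above, so treat the result as a ballpark only.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~750 words per 1,000 tokens (rule of thumb)."""
    return round(len(text.split()) / 0.75)

# Minimum prompt sizes for implicit-cache eligibility, per Google's docs.
MIN_TOKENS = {"gemini-2.5-flash": 1024, "gemini-2.5-pro": 2048}

def likely_cache_eligible(prompt: str, model: str) -> bool:
    """True if the prompt probably clears the implicit-cache minimum."""
    return estimate_tokens(prompt) >= MIN_TOKENS[model]

# A ~2,000-word prompt estimates to ~2,667 tokens, clearing the 2.5 Pro bar.
long_prompt = ("word " * 2000).strip()
print(likely_cache_eligible(long_prompt, "gemini-2.5-pro"))  # True
```

In other words, a couple of pages of context is already enough to become cache-eligible, which matches the article's point that the thresholds are not terribly big.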

Given that Google’s last claims of cost savings from caching fell short, there are some buyer-beware areas in this new feature. For one, Google recommends that developers keep repetitive context at the beginning of requests to increase the chances of implicit cache hits. Context that might change from request to request should be appended at the end, the company says.
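That ordering advice can be followed mechanically: keep the stable context first and append only the per-request part at the end. A minimal sketch under assumptions (the `build_prompt` helper and `system_docs` content are hypothetical, not part of any Gemini SDK):

```python
import os.path

def build_prompt(static_context: str, dynamic_part: str) -> str:
    """Put the unchanging context first so consecutive requests share a
    common prefix, which is what the implicit cache matches on."""
    return f"{static_context}\n\n{dynamic_part}"

# Hypothetical large, unchanging context (e.g. a product manual).
system_docs = "You are a support bot. Answer only from this manual: ..."

q1 = build_prompt(system_docs, "User question: how do I reset my password?")
q2 = build_prompt(system_docs, "User question: where is the billing page?")

# Both requests start with the same bytes, so the second is a cache-hit
# candidate; the request-specific text sits after the shared prefix.
shared = len(os.path.commonprefix([q1, q2]))
assert shared >= len(system_docs)
```

Had the question been placed first instead, the two requests would diverge at byte one and no prefix match would be possible.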

For another, Google didn’t offer any third-party verification that the new implicit caching system would deliver the promised automatic savings. So we’ll have to see what early adopters say.
 