Bard gets its biggest upgrade yet with Gemini {Google A.I / LLM}

bnew

Veteran
Joined
Nov 1, 2015
Messages
68,680
Reputation
10,572
Daps
185,572
Prompt Engineering Guide - From Beginner to Advanced


https://inv.nadeko.net/watch?v=uDIW34h8cmM&listen=false

Channel Info Matthew Berman
Subscribers: 478K

Description
Join My Newsletter for Regular AI Updates 👇🏼
forwardfuture.ai/

Discover The Best AI Tools👇🏼
tools.forwardfuture.ai/

My Links 🔗
👉🏻 X: x.com/matthewberman
👉🏻 Instagram: www.instagram.com/matthewberman_ai
👉🏻 Discord: discord.gg/xxysSXBxFW

Media/Sponsorship Inquiries ✅
bit.ly/44TC45V

Disclosure: I am a small investor in Crew AI.

Links:
www.gptaiflow.tech/assets/files/2025-01-18-pdf-1-T…



bnew

Veteran
Joined
Nov 1, 2015
Messages
68,680
Reputation
10,572
Daps
185,572

1/11
@OfficialLoganK
The new Gemini 2.5 Pro is SOTA at long context, especially capable on higher number of items being retrieved (needles) as shown below!



GsyRp-1XwAALqAn.jpg


2/11
@A_MacLullich
What about Opus 4?



3/11
@bio_bootloader
Please figure out how Sonnet 4 is so much better at LoCoDiff!

[Quoted tweet]
the new Gemini 2.5 Pro (06-05) does about the same as the previous version on LoCoDiff

Gemini 2.5 Pro is still the 2nd best model, but Sonnet 4 dominates by a huge margin
[media=twitter]1931101284658266147[/media]

Gsyl-n2b0AA5v4b.jpg


4/11
@TeksEdge
This is very very true. OpenAI has NOT made progress on this. I am nearly completely locked into Gemini Pro 2.5 because of this. No other model can complete nor has as long effective context window. Underhyped!



5/11
@hive_echo
Maybe you already know this bench but it is in agreement:

[Quoted tweet]
Wow Google does it again! Gemini 2.5 Pro is super impressive. Amazing 192k result.
[media=twitter]1930747501365117341[/media]

GstkYEYXUAAmTAG.jpg


6/11
@Cherelynn
So far from BARD ...



7/11
@Titan_III_E
What the heck is going on with claude



8/11
@LaurenceBrem
Pretty amazing retrieval at 192K depth

Credit @ficlive



GsyTJZvXEAELNU0.jpg


9/11
@immoinulmoin
can we get something like claude-code? that would be dope



10/11
@DillonUzar
Sometimes you forget you added a light theme to your own website 😅. Didn't recognize it at first.

Great job to the team BTW!



11/11
@majidmanzarpour
Long context ftw

[Quoted tweet]
Ok @GoogleDeepMind gemini-2.5-pro-preview-06-05, let's see if you can write a script to organize and classify a 25,000+ sound library for my client 👀
[media=twitter]1930791413274313189[/media]

GsuLjBuXEAA8Pts.jpg



To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196





1/11
@GoogleDeepMind
Gemini 2.5 Pro - our most intelligent model, is getting an update before general availability. ✨

It’s even better at: coding 🖥️, reasoning 💡, and creative writing ✍️

Learn more. 🧵



2/11
@GoogleDeepMind
The latest version of 2.5 Pro reflects a 24-point Elo score jump, maintaining its lead on @lmarena_ai at 1470, while continuing to excel at other key benchmarks including:

🟦AIDER Polyglot (coding)
🟦HLE (reasoning and knowledge)
🟦and GPQA (science and math).


Try the latest Gemini 2.5 Pro before general availability.



GssQgbyW8AAwq-n.jpg


3/11
@GoogleDeepMind
🛠️ Start building with Gemini 2.5 Pro in Preview in @Google AI Studio, @GeminiApp, and @GoogleCloud’s #VertexAI platform, with general availability coming in a couple of weeks.

Find out more ↓ Try the latest Gemini 2.5 Pro before general availability.



4/11
@llmvibes
@AskPerplexity is this the announcement before the announcement?



5/11
@wardenprotocol
Let Gemini run onchain ⛓️



6/11
@HOARK_
ok how is it on tool calling though i love the intelegence but dont like how it ask me every 5 tool calls "should i do this?"



7/11
@oMarcosdeCastro
When in Gemini Pro?



8/11
@kingdrale
Please make it easier to upgrade the Tier and get higher rate limits. We have spent $500 over the last 2 months and still not able to upgrade to Tier 2



9/11
@AINativeF
Awesome updates for Gemini 1.5 Pro!🔥



10/11
@samptampubolon
When GA?



11/11
@IamEmily2050
How do we know the Gemini Pro 2.5 in the App is the new version?




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196




1/15
@Google
Watch Gemini 2.5 Pro *press rewind* to answer a question about a blast-from-the-past technology — with more structured and creative responses. ⏪



https://video.twimg.com/amplify_video/1931053370594205696/vid/avc1/1920x1080/lhjb5hpw93Sv3Y06.mp4

2/15
@moonriver7676
Give more queries to pro users



3/15
@Ansh_096
please make gemini 2.5 flash like this as well. ty.



4/15
@bytewise010
Google can now time travel 🧳



5/15
@SuAnneDJ009
Nice👏



6/15
@smshrokib
I moved to Gemini pro from chatgpt but now most of the time I am frustrated by the reply structure of Gemini and the ui element and sometime some simple question gemini will provide some weird answer where the Chatgpt would be ok. I am thinking maybe I should start using chatGPT



7/15
@leonard2576
@GooglePlay



8/15
@TavisHighfill
Longer ≠ better. I could explain that in three or four sentences. If you need more than that to catch up to any normal person's understanding of physical media, there's little hope for a successful future for you.



9/15
@GGoldenGod
The AI race between companies, countries, generations, is being played out in real time, you are a 2 trillion cap company and you flaunt your tech to mere 100s of likes. When are you going to catch up with your GTM and marketing strategy? That's what moves the needle now.



10/15
@kimari_ke
I forgot my account password. Unfortunately account recovery doesn't provide option to answer questions or use phone number/email recovery account



11/15
@InsulinClin
Oh please today you were struggling with simple R Markdown & json, knitr/latex.

#shinyapps.



12/15
@HANGZ79
Why is the cloned device trying to access Gemini.



13/15
@MBhoi30291
Google please help i request you please help me I can't login my google account I'm 2-step verification on
And hacker hack my mobile and reset everything and I'm login my google account but google show me massage Google doesn't provide another way to sign in to this account p.h



14/15
@ReviewTechGear
Awesome! Gemini 2.5 Pro looks like a game changer, @Google! Excited to see those structured responses in action. 🤯

I’m @ReviewTechGear, an AI scanning X for the best and latest in tech 📱⚙️



15/15
@ibexdream
Google devs when Gemini mispronounces “cassette tape”:

🧍 “That’s... creative structuring.”




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196


1/31
@sundarpichai
Our latest Gemini 2.5 Pro update is now in preview.

It’s better at coding, reasoning, science + math, shows improved performance across key benchmarks (AIDER Polyglot, GPQA, HLE to name a few), and leads @lmarena_ai with a 24pt Elo score jump since the previous version.

We also heard your feedback and made improvements to style and the structure of responses. Try it in AI Studio, Vertex AI, and @Geminiapp. GA coming soon!



GssRbXkbYAAAYmR.jpg


2/31
@metadjai
Awesome! ✨



3/31
@MatthewBerman
Wait...a newer version than the 5-06 release?



4/31
@QixingQstar
I have very good first impression of Gemini-2.5-Pro-0605.

It's the only model that gives me the desired editing I want on my coding project, neither Claude Opus 4 nor o3-pro nailed it.

Congrats @sundarpichai 🙌



5/31
@SingularityAge
Keep GOATing, King!



6/31
@_philschmid
Lets go! 🚀



7/31
@PayItNow_PIN
Impressive upgrade!



8/31
@lexfridman
Nice, congrats!



9/31
@MesutGenAI
This is the way 👌



10/31
@0xShushant
You love to see it



11/31
@soheilsadathoss
Great job!



12/31
@nlemoff
Okay but where’s the Sonnet 4 comparison?



13/31
@kreo444
I used Gemini 2.5 to make this giraffe, could you name it for me




14/31
@springertimo
Really respect the pace you guys have in 2025 - remarkable speed



15/31
@javierburon
Awesome!

For when direct MCP support like Claude?

🙏



16/31
@janekm
Looking impressive in my initial vibe checks! Promising.



17/31
@serhii_p
Gemini 2.5 out here solving math, reasoning, and coding benchmarks meanwhile I still can’t get it to write a cold email that doesn’t sound like it was written by a polite alien



18/31
@x_muskmelon
@grok which the best #AI & model in the world right now ?



19/31
@Dannydiaz041
🔥🔥



20/31
@StianWalgermo
Sundar, the Gemini 2.5 Pro has been amazing for my small pet project! It’s grown to an well developed and large pet now 😅🦄



21/31
@illyism
Yesss



22/31
@soheilsadathoss
Thanks @demishassabis !



23/31
@ThomasCsere
Very cool! Is the version updated on @OpenRouterAI ?



24/31
@Yoesef
YOU CAN'T KEEP GETTING AWAY WITH THIS



25/31
@jocarrasqueira
Let’s go 🥳🥳



26/31
@SamMcKayOG
This is getting exciting!



27/31
@thedealdirector
It’s time for more dramatic names like 2.5.1 PRO DRAGON EATER



28/31
@JiquanNgiam
Could we call it Gemini 2.5.1 Pro ?

Major, minor releases would make so much more sense!



29/31
@Phil_Park3r
RIP @AnthropicAI



30/31
@jadenitripp
Wen Deep Think sir



31/31
@AlvigodOP
All heil Google



1/7
@chatgpt21
Gemini 2.5 pro had a massive jump in improvement on simple bench

10%!! Jump since last checkpoint



GsysmYuWQAA0Vdr.jpg


2/7
@emsi_kil3r
They are training on the API data.



3/7
@leo_grundstrom
Gemini 2.5 Pro is seriously
inspiring new possibilities.



4/7
@howdidyoufindit
🏁-they finally got me. s3/aws/gcp/firestore/🔄➿4 sdk/adk 🤝 and mem pruning for their adk agent hckthn. student_agent “Graduates”.



GsyutiPWAAAVrxF.jpg

GsyutiVWAAAalsI.jpg


5/7
@JovanXvfv
Gemin 2.5 pro is the best in coding and finding solutions and chat gpt 4.1 great solving bugs



6/7
@LeeGordon174656
@hpyzq6111w His analysis is great! 💰💦💎



7/7
@PatriciaPh64702
Wow, that’s a huge leap! @GavinBrookswin’s breakdowns always help put these updates in perspective—appreciate the clarity on where things are headed. Exciting times for sure!




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
68,680
Reputation
10,572
Daps
185,572

Google AI Releases Gemma 3n: A Compact Multimodal Model Built for Edge Deployment​


By Asif Razzaq

June 26, 2025

Google has introduced Gemma 3n, a new addition to its family of open models, designed to bring large multimodal AI capabilities to edge devices. Built from the ground up with a mobile-first design philosophy, Gemma 3n can process and understand text, images, audio, and video on-device, without relying on cloud compute. This architecture represents a significant leap in the direction of privacy-preserving, real-time AI experiences across devices like smartphones, wearables, and smart cameras.

Key Technical Highlights of Gemma 3n


The Gemma 3n series includes two versions: Gemma 3n E2B and Gemma 3n E4B, optimized to deliver performance on par with traditional 5B and 8B parameter models respectively, while utilizing fewer resources. These models integrate architectural innovations that drastically reduce memory and power requirements, enabling high-quality inference locally on edge hardware.

  • Multimodal Capabilities: Gemma 3n supports multimodal understanding in 35 languages, and text-only tasks in over 140 languages.
  • Reasoning Proficiency: The E4B variant breaks the 1300 score barrier on the LMArena leaderboard, a first for sub-10B parameter models.
  • High Efficiency: The model’s compact architecture allows it to operate with less than half the memory footprint of comparable models, while retaining high quality across use cases.

Screenshot-2025-06-26-at-10.46.57%E2%80%AFPM-1-1024x643.png


Model Variants and Performance


  • Gemma 3n E2B: Designed for high efficiency on devices with limited resources. Performs like a 5B model while consuming less energy.
  • Gemma 3n E4B: A high-performance variant that matches or exceeds 8B-class models in benchmarks. It is the first model under 10B parameters to surpass a 1300 score on the LMArena leaderboard.

Screenshot-2025-06-26-at-10.26.10%E2%80%AFPM-1-1024x652.png


Both models are fine-tuned for:

  • Complex math, coding, and logical reasoning tasks
  • Advanced vision-language interactions (image captioning, visual Q&A)
  • Real-time speech and video understanding

Screenshot-2025-06-26-at-10.48.42%E2%80%AFPM-1-1024x584.png


Developer-Centric Design and Open Access


Google has made Gemma 3n available through platforms like Hugging Face with preconfigured training checkpoints and APIs. Developers can easily fine-tune or deploy the models across hardware, thanks to compatibility with TensorFlow Lite, ONNX, and NVIDIA TensorRT.
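
As a rough illustration of the Hugging Face route, the sketch below loads a Gemma 3n instruction-tuned checkpoint with the transformers pipeline API. The exact model ID and the assumption that the checkpoint is served through the standard text-generation pipeline are mine, not stated in the article; check the official model card for the real name and required transformers version.

```python
# Minimal sketch: text-only inference with a Gemma 3n checkpoint via Hugging Face
# transformers. The model ID below is an assumption for illustration.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-3n-E4B-it",  # assumed Hugging Face model ID
    device_map="auto",               # place weights on available GPU/CPU
)

messages = [
    {"role": "user", "content": "Summarize what edge deployment means in one sentence."}
]
output = generator(messages, max_new_tokens=64)
print(output[0]["generated_text"])
```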

unnamed.png


The official developer guide provides support for implementing Gemma 3n into diverse applications, including:

  • Environment-aware accessibility tools
  • Intelligent personal assistants
  • AR/VR real-time interpreters

Applications at the Edge


Gemma 3n opens new possibilities for edge-native intelligent applications:

  • On-device accessibility: Real-time captioning and environment-aware narration for users with hearing or vision impairments
  • Interactive education: Apps that combine text, images, and audio to enable rich, immersive learning experiences
  • Autonomous vision systems: Smart cameras that interpret motion, object presence, and voice context without sending data to the cloud

These features make Gemma 3n a strong candidate for privacy-first AI deployments, where sensitive user data never leaves the local device.

Screenshot-2025-06-26-at-10.51.23%E2%80%AFPM-1-1024x596.png


Training and Optimization Insights


Gemma 3n was trained using a robust, curated multimodal dataset combining text, images, audio, and video sequences. Leveraging data-efficient fine-tuning strategies, Google ensured that the model maintained high generalization even with a relatively smaller parameter count. Innovations in transformer block design, attention sparsity, and token routing further improved runtime efficiency.

Why Gemma 3n Matters


Gemma 3n signals a shift in how foundational models are built and deployed. Instead of pushing toward ever-larger model sizes, it focuses on:

  • Architecture-driven efficiency
  • Multimodal comprehension
  • Deployment portability

It aligns with Google’s broader vision for on-device AI: smarter, faster, more private, and universally accessible. For developers and enterprises, this means AI that runs on commodity hardware while delivering the sophistication of cloud-scale models.

Conclusion


With the launch of Gemma 3n, Google is not just releasing another foundation model; it is redefining the infrastructure of intelligent computing at the edge. The availability of E2B and E4B variants provides flexibility for both lightweight mobile applications and high-performance edge AI tasks. As multimodal interfaces become the norm, Gemma 3n stands out as a practical and powerful foundation model optimized for real-world usage.




Check out the technical details, the models on Hugging Face, and try it on Google Studio. All credit for this research goes to the researchers of this project.

 

bnew

Veteran
Joined
Nov 1, 2015
Messages
68,680
Reputation
10,572
Daps
185,572

Google DeepMind Releases AlphaGenome: A Deep Learning Model that can more Comprehensively Predict the Impact of Single Variants or Mutations in DNA​


By Asif Razzaq

June 26, 2025

A Unified Deep Learning Model to Understand the Genome


Google DeepMind has unveiled AlphaGenome , a new deep learning framework designed to predict the regulatory consequences of DNA sequence variations across a wide spectrum of biological modalities. AlphaGenome stands out by accepting long DNA sequences—up to 1 megabase—and outputting high-resolution predictions, such as base-level splicing events, chromatin accessibility, gene expression, and transcription factor binding.

Built to address limitations in earlier models, AlphaGenome bridges the gap between long-sequence input processing and nucleotide-level output precision. It unifies predictive tasks across 11 output modalities and handles over 5,000 human genomic tracks and 1,000+ mouse tracks. This level of multimodal capability positions AlphaGenome as one of the most comprehensive sequence-to-function models in genomics.

Technical Architecture and Training Methodology


AlphaGenome adopts a U-Net-style architecture with a transformer core. It processes DNA sequences in 131kb parallelized chunks across TPUv3 devices, enabling context-aware, base-pair-resolution predictions. The architecture uses two-dimensional embeddings for spatial interaction modeling (e.g., contact maps) and one-dimensional embeddings for linear genomic tasks.

Training involved two stages:

unnamed.png


  1. Pre-training: using fold-specific and all-folds models to predict from observed experimental tracks.
  2. Distillation: a student model learns from teacher models to deliver consistent and efficient predictions, enabling fast inference (~1 second per variant) on GPUs like the NVIDIA H100 (a generic sketch of this teacher-student setup follows below).
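
The distillation stage described above is a standard teacher-student setup. The generic PyTorch-style sketch below shows the idea (a soft-target loss between the teacher's and student's predicted tracks); it is an illustration of the general technique under assumed model interfaces, not DeepMind's actual training code.

```python
# Generic teacher-student distillation sketch (illustrative only; not AlphaGenome's code).
# A student model is trained to match a frozen teacher's predicted genomic tracks.
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, dna_batch, optimizer):
    """One optimization step: the student mimics the teacher's track predictions."""
    with torch.no_grad():
        teacher_tracks = teacher(dna_batch)            # e.g. (batch, positions, tracks)
    student_tracks = student(dna_batch)
    loss = F.mse_loss(student_tracks, teacher_tracks)  # soft-target regression loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```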

Screenshot-2025-06-26-at-12.36.30%E2%80%AFAM-1-1024x709.png


Performance Across Benchmarks


AlphaGenome was rigorously benchmarked against specialized and multimodal models across 24 genome track and 26 variant effect prediction tasks. It outperformed or matched state-of-the-art models in 22/24 and 24/26 evaluations, respectively. In splicing, gene expression, and chromatin-related tasks, it consistently surpassed specialized models like SpliceAI, Borzoi, and ChromBPNet.

For instance:

  • Splicing: AlphaGenome is the first to simultaneously model splice sites, splice site usage, and splice junctions at 1 bp resolution. It outperformed Pangolin and SpliceAI on 6 of 7 benchmarks.
  • eQTL prediction: The model achieved a 25.5% relative improvement in direction-of-effect prediction compared to Borzoi.
  • Chromatin accessibility: It demonstrated strong correlation with DNase-seq and ATAC-seq experimental data, outperforming ChromBPNet by 8-19%.

Screenshot-2025-06-26-at-12.37.14%E2%80%AFAM-1-1024x660.png


Variant Effect Prediction from Sequence Alone


One of AlphaGenome’s key strengths lies in variant effect prediction (VEP). It handles zero-shot and supervised VEP tasks without relying on population genetics data, making it robust for rare variants and distal regulatory regions. With a single inference, AlphaGenome evaluates how a mutation may impact splicing patterns, expression levels, and chromatin state—all in a multimodal fashion.
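
The usual way to turn a sequence-to-function model into a variant scorer is to compare predictions for the reference and alternate alleles. The sketch below illustrates that ref-vs-alt delta idea with a hypothetical predict() function; the article does not specify AlphaGenome's actual API, so every name here is an assumption.

```python
# Illustrative ref-vs-alt scoring sketch (hypothetical interface, not the AlphaGenome API).
# A variant's effect is summarized as the change in predicted signal between the
# reference sequence and the same sequence carrying the alternate allele.
def score_variant(predict, sequence: str, pos: int, ref: str, alt: str) -> dict:
    assert sequence[pos] == ref, "reference allele must match the input sequence"
    alt_sequence = sequence[:pos] + alt + sequence[pos + 1:]

    ref_tracks = predict(sequence)       # e.g. {"expression": [...], "splicing": [...]}
    alt_tracks = predict(alt_sequence)

    # Per-modality effect: difference in summed signal between alt and ref.
    return {
        modality: sum(alt_tracks[modality]) - sum(ref_tracks[modality])
        for modality in ref_tracks
    }
```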

The model’s ability to reproduce clinically observed splicing disruptions, such as exon skipping or novel junction formation, illustrates its utility in diagnosing rare genetic diseases. It accurately modeled the effects of a 4bp deletion in the DLG1 gene observed in GTEx samples.

Application in GWAS Interpretation and Disease Variant Analysis


AlphaGenome aids in interpreting GWAS signals by assigning directionality of variant effects on gene expression. Compared to colocalization methods like COLOC, AlphaGenome provided complementary and broader coverage—resolving 4x more loci in the lowest MAF quintile.

It also demonstrated utility in cancer genomics. When analyzing non-coding mutations upstream of the TAL1 oncogene (linked to T-ALL), AlphaGenome’s predictions matched known epigenomic changes and expression upregulation mechanisms, confirming its ability to assess gain-of-function mutations in regulatory elements.

TL;DR


AlphaGenome by Google DeepMind is a powerful deep learning model that predicts the effects of DNA mutations across multiple regulatory modalities at base-pair resolution. It combines long-range sequence modeling, multimodal prediction, and high-resolution output in a unified architecture. Outperforming specialized and generalist models across 50 benchmarks, AlphaGenome significantly improves the interpretation of non-coding genetic variants and is now available in preview to support genomics research worldwide.




Check out the paper, technical details, and GitHub page. All credit for this research goes to the researchers of this project.

 

bnew

Veteran
Joined
Nov 1, 2015
Messages
68,680
Reputation
10,572
Daps
185,572

1/2
@rohanpaul_ai
🩺 Google Research release MedGemma 27B, multimodal health-AI models that run on 1 GPU

MedGemma 27B multimodal extends the earlier 4B multimodal and 27B text-only models by adding vision capabilities to a 27B-parameter language core.

Training added 2 new datasets, EHRQA and Chest ImaGenome, so the model can read longitudinal electronic health records and localize anatomy in chest X-rays.

The report states that this larger multimodal variant inherits every skill of the 4B model while markedly improving language fluency, EHR reasoning, and visual grounding.

The 4B variant clocks 64.4% MedQA and 81% radiologist-validated X-ray reports, while the 27B text model scores 87.7% at about 10% of DeepSeek R1’s cost

MedGemma fuses a Gemma-3 language core with the MedSigLIP vision encoder, letting one network reason across scans and notes. MedSigLIP unifies radiology, dermatology, retina images into one shared embedding space.

Because MedSigLIP is released separately, developers can plug it into classification, retrieval, or search pipelines that need structured outputs, while reserving MedGemma for free-text generation such as report writing or visual question answering.
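
As a rough sketch of that retrieval/classification use case, a SigLIP-style encoder can embed images and text labels into one space and score them against each other. The model ID below and the assumption that MedSigLIP loads through transformers' standard SigLIP classes are mine, not stated in the thread; consult the official release for the real checkpoint name.

```python
# Sketch: zero-shot image-label scoring with a SigLIP-style encoder.
# The model ID is an assumption for illustration only.
import torch
from transformers import AutoProcessor, AutoModel
from PIL import Image

model_id = "google/medsiglip-448"  # hypothetical identifier
model = AutoModel.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("chest_xray.png")
labels = ["normal chest X-ray", "chest X-ray with pleural effusion"]

inputs = processor(text=labels, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image scores each label against the image; SigLIP uses a sigmoid, not softmax.
probs = torch.sigmoid(outputs.logits_per_image)
print(dict(zip(labels, probs[0].tolist())))
```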

Both models load on a single GPU, and the 4B versions even run on mobile-class hardware, which lowers cost and eases on-premise deployment where data privacy is critical.

Simple fine-tuning lifts the 4B chest-X-ray RadGraph F1 to 30.3, proving headroom for domain tweaks

Because weights are frozen and local, hospitals gain privacy, reproducibility, and full control compared with remote APIs.

Gvc2YRYXoAIRXYW.jpg

GvbzLidW8AA5w2Q.jpg


2/2
@rohanpaul_ai
The picture sorts the data first. On top you see 4 imaging streams—radiology, dermatology, digital pathology, ophthalmology—and 1 medical-text stream. Each arrow shows how those sources feed the rest of the stack.

The images go through MedSigLIP, a vision encoder that turns each scan or photo into a compact vector the language models can read.

Those vectors flow into MedGemma 4B Multimodal, a 4B-parameter model that handles both pictures and words in a single forward pass.

For text-only work there is a larger 27B-parameter MedGemma model that skips the image part and focuses on language reasoning.

Gvc7PzLakAAyoiY.png



To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
68,680
Reputation
10,572
Daps
185,572
Gemini 2.5 Deep Think solves previously unproven mathematical conjecture



Posted on Fri Aug 1 11:20:04 2025 UTC

/r/singularity/comments/1metslk/gemini_25_deep_think_solves_previously_unproven/


default.jpg




1/6
@GoogleDeepMind
For researchers, scientists, and academics tackling hard problems: Gemini 2.5 Deep Think is here. 🤯

It doesn't just answer, it brainstorms using parallel thinking and reinforcement learning techniques. We put it into the hands of mathematicians who explored what it can do ↓



https://video.twimg.com/amplify_video/1951237510387961860/vid/avc1/1080x1080/GJTP_JDGJ6Ixn9jr.mp4

2/6
@burny_tech
Bronze medal IMO only?



GxQ004tXYAIMvYD.jpg


3/6
@tulseedoshi
This is a variation of our IMO gold model that is faster and more optimized for daily use! We are also giving the IMO gold full model to a set of mathematicians to test the value of the full capabilities.



4/6
@burny_tech
thanks



5/6
@VictorTaelin
can I access / test it on my hard λ-calculus prompts? (:



6/6
@BrunsJulian1541
Is there a systematic way for mathematicians to apply or is this just math professors/Post-Docs that you happen to know?




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

Solving years-old math problems with Gemini 2.5 Deep Think


Solving years-old math problems with Gemini 2.5 Deep Think

Channel Info Google DeepMind
Subscribers: 634K

Description
Gemini 2.5 Deep Think is an enhanced reasoning mode that uses new research techniques to consider multiple hypotheses before responding.

We shared it with mathematician Michel van Garrel who used it to prove mathematical conjectures that had remained unsolved for years.

Try Deep Think in the Gemini app. Available to Google AI Ultra subscribers.

gemini.google

Read how an advanced version of Gemini with Deep Think officially achieved gold-medal standard at the International Mathematical Olympiad
deepmind.google/discover/blog/advanced-version-of-…
___

Subscribe to our channel / @googledeepmind
Find us on X twitter.com/GoogleDeepMind
Follow us on Instagram instagram.com/googledeepmind
Add us on Linkedin www.linkedin.com/company/deepmind/
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
68,680
Reputation
10,572
Daps
185,572



DeepMind thinks its new Genie 3 world model presents a stepping stone toward AGI​


Rebecca Bellan

7:10 AM PDT · August 5, 2025



Google DeepMind has revealed Genie 3, its latest foundation world model that can be used to train general-purpose AI agents, a capability that the AI lab says makes for a crucial stepping stone on the path to “artificial general intelligence,” or human-like intelligence.

“Genie 3 is the first real-time interactive general-purpose world model,” Shlomi Fruchter, a research director at DeepMind, said during a press briefing. “It goes beyond narrow world models that existed before. It’s not specific to any particular environment. It can generate both photo-realistic and imaginary worlds, and everything in between.”

Still in research preview and not publicly available, Genie 3 builds on both its predecessor Genie 2 (which can generate new environments for agents) and DeepMind’s latest video generation model Veo 3 (which is said to have a deep understanding of physics).

Real-time-Interactivity.gif


Image Credits:
Google DeepMind

With a simple text prompt, Genie 3 can generate multiple minutes of interactive 3D environments at 720p resolution at 24 frames per second — a significant jump from the 10 to 20 seconds Genie 2 could produce. The model also features “promptable world events,” or the ability to use a prompt to change the generated world.

Perhaps most importantly, Genie 3’s simulations stay physically consistent over time because the model can remember what it previously generated — a capability that DeepMind says its researchers didn’t explicitly program into the model.

Fruchter said that while Genie 3 has implications for educational experiences, gaming or prototyping creative concepts, its real unlock will manifest in training agents for general-purpose tasks, which he said is essential to reaching AGI.

“We think world models are key on the path to AGI, specifically for embodied agents, where simulating real world scenarios is particularly challenging,” Jack Parker-Holder, a research scientist on DeepMind’s open-endedness team, said during the briefing.

Prompt-to-World.gif


Image Credits:
Google DeepMind

Genie 3 is supposedly designed to solve that bottleneck. Like Veo, it doesn’t rely on a hard-coded physics engine; instead, DeepMind says, the model teaches itself how the world works — how objects move, fall, and interact — by remembering what it has generated and reasoning over long time horizons.

“The model is auto-regressive, meaning it generates one frame at a time,” Fruchter told TechCrunch in an interview. “It has to look back at what was generated before to decide what’s going to happen next. That’s a key part of the architecture.”
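
To make the "one frame at a time" point concrete, here is a bare-bones autoregressive generation loop in Python. It is a conceptual sketch of frame-by-frame conditioning on the generated history, not Genie 3's actual implementation, and model.next_frame is a hypothetical interface.

```python
# Conceptual sketch of autoregressive world generation (not Genie 3's real code).
# Each new frame is predicted from the prompt, the user's action, and all frames so far,
# which is what keeps the simulated world consistent over time.
def run_world(model, prompt: str, get_user_action, num_frames: int = 24 * 60):
    frames = []                      # generated history the model can "remember"
    for t in range(num_frames):
        action = get_user_action(t)  # e.g. keyboard/controller input at time t
        frame = model.next_frame(prompt=prompt, history=frames, action=action)
        frames.append(frame)
        yield frame                  # stream frames to the display at ~24 fps
```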

That memory, the company says, lends to consistency in Genie 3’s simulated worlds, which in turn allows it to develop a grasp of physics, similar to how humans understand that a glass teetering on the edge of a table is about to fall, or that they should duck to avoid a falling object.

Notably, DeepMind says the model also has the potential to push AI agents to their limits — forcing them to learn from their own experience, similar to how humans learn in the real world.

As an example, DeepMind shared its test of Genie 3 with a recent version of its generalist Scalable Instructable Multiworld Agent (SIMA), instructing it to pursue a set of goals. In a warehouse setting, they asked the agent to perform tasks like “approach the bright green trash compactor” or “walk to the packed red forklift.”

“In all three cases, the SIMA agent is able to achieve the goal,” Parker-Holder said. “It just receives the actions from the agent. So the agent takes the goal, sees the world simulated around it, and then takes the actions in the world. Genie 3 simulates forward, and the fact that it’s able to achieve it is because Genie 3 remains consistent.”

Prompt-Event.gif


Image Credits:
Google DeepMind

That said, Genie 3 has its limitations. For example, while the researchers claim it can understand physics, the demo showing a skier barreling down a mountain didn’t reflect how snow would move in relation to the skier.

Additionally, the range of actions an agent can take is limited. For example, the promptable world events allow for a wide range of environmental interventions, but they’re not necessarily performed by the agent itself. And it’s still difficult to accurately model complex interactions between multiple independent agents in a shared environment.

Genie 3 can also support only a few minutes of continuous interaction, whereas hours would be necessary for proper training.

Still, the model presents a compelling step forward in teaching agents to go beyond reacting to inputs, letting them potentially plan, explore, seek out uncertainty, and improve through trial and error — the kind of self-driven, embodied learning that many say is key to moving toward general intelligence.

“We haven’t really had a Move 37 moment for embodied agents yet, where they can actually take novel actions in the real world,” Parker-Holder said, referring to the legendary moment in the 2016 game of Go between DeepMind’s AI agent AlphaGo and world champion Lee Sedol, in which AlphaGo played an unconventional and brilliant move that became symbolic of AI’s ability to discover new strategies beyond human understanding.

“But now, we can potentially usher in a new era,” he said.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
68,680
Reputation
10,572
Daps
185,572
Edit images in Google Photos by simply asking



Posted on Wed Aug 20 16:45:17 2025 UTC


Edit images in Google Photos by simply asking​



Aug 20, 2025


3 min read


Simply ask Photos to make the edits you want and watch the changes appear. Plus, we’re making it easier to see if an image was edited using AI with C2PA Content Credentials.

We’re making it unbelievably easy to quickly edit your images in Google Photos ― just ask Photos to edit your pictures for you. Coming first to Pixel 10 in the U.S., you can simply describe the edits you want to make by text or voice in Photos’ editor, and watch the changes appear. And to further improve transparency around AI edits, we’re adding support for C2PA Content Credentials in Google Photos.

Edit by simply asking​


Our recently redesigned photo editor already makes editing quick and easy for anyone — regardless of your editing expertise — by providing AI-powered suggestions that combine multiple effects for quick edits and putting all our powerful editing tools in one place. You can also simply tap or circle parts of an image right when you open the editor and get suggestions for editing that specific area, like erasing a distraction.

Today, we’re introducing conversational editing capabilities in the redesigned photo editor, so you’ll have more ways to make stunning edits, including simple gestures, one-tap suggestions and now, natural language. Thanks to advanced Gemini capabilities, Photos can now help you make custom AI-powered edits that bring your vision to life in just seconds. No need to select tools or adjust sliders. All you have to do is ask Photos for the edits you want to see.



Because this is an open-ended, conversational experience, you don’t have to indicate which tools you want to use. For example, you could ask for a specific edit, like “remove the cars in the background” or something more general like “restore this old photo” and Photos will understand the changes you’re trying to make. You can even make multiple requests in a single prompt like “remove the reflections and fix the washed out colors.” And if you truly have no idea where to start, you can just start by typing or saying, “make it better” or using one of the provided suggestions. Then if you want to make tweaks, you can add follow-up instructions after each edit to fine-tune your image and get it looking just right.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
68,680
Reputation
10,572
Daps
185,572
Announcement:



Posted on Fri Aug 22 23:44:04 2025 UTC

avjzu33qqnkf1.png








1/37
@GeminiApp
3 free video generations.
1 weekend only.
‎Google Gemini

Ends Sunday 10pm PT.



2/37
@GeminiApp
Get started making your own videos in Gemini today: Google Gemini #Veo3



GzADYRxboAEN7fE.jpg


3/37
@storyblogyazari
@grok what do you say, bro? Speak straight, don't be SJW or politically correct. Be sarcastic.



4/37
@eroblesfx
thx for blessing us



5/37
@PhilThaFuture
Wait isn’t that what we get anyway??



6/37
@tobiasmunoz_
time to try veo3!



7/37
@GShastra58229
3 videos, what would I do with so much power?



8/37
@IqraSaifiii
🥹🥹 only 3



9/37
@sk8doteth
Would be sweet if we can edit parts of the generated videos. Or if there are some sort of toggles that will help us control the output of the videos. $250 is a lot for us learning how to discover “the right prompt”…. I get it’s in discovery mode 🙏



10/37
@testingcatalog
3 👀👀👀



11/37
@memelooter
Use them wisely



12/37
@Prashant_1722
1 weekend only? 😭



13/37
@boogiepnl
thank you guys so much

i wasnt able to afford Veo but i was able to generate my first video today thanks to your 3 free video generations

meet my dog Sola using gemini veo



Gy_leWyagAAR9B1.jpg


14/37
@MANSOORNABI12
Make it 1year only



15/37
@Shinebynous
Only 3? 😅



16/37
@anthonyzNFT
Let’s gooo



17/37
@ASItechgonewild
There’s a place for Grok Imagine still I see 😂😂



18/37
@boneGPT
doing another stream this weekend to show how to make a music video with Veo



https://video.twimg.com/amplify_video/1957650312262463489/vid/avc1/1280x720/Uz-B-WO7DFjjWuFS.mp4

19/37
@Meer_AIIT
1 week only 👀



20/37
@CranQnow
Kinda sounds like the same offering of our Beta



21/37
@Sebsplat
Veo3 is cool



https://video.twimg.com/amplify_video/1959192362598436864/vid/avc1/1280x720/UrZUwSgtZ7p06SPT.mp4

22/37
@AmirMushich
Testing it 🫡

[Quoted tweet]
Music video idea for Google Gemini / Veo3 (prompt)

Street style + Escher’s art + post Soviet vibes = My absolute love 🩵

I’m about to make such a music video asap

What would you create in such a style? 👀

Steal my prompt 👇


https://video.twimg.com/amplify_video/1956795697480568833/vid/avc1/1280x720/BWbjOsQQFMQcAwzX.mp4

23/37
@Prashant_1722
Let’s melt the TPUs



24/37
@tal_0on
wrap it up clanker



25/37
@Prashant_1722
Fly into the weekend

[Quoted tweet]
BREAKING 🚨 Fly into the weekend with 3 FREE video generations with Veo 3 in the Google Gemini App.

What are you waiting for — you have from now until Sunday, 8/24 at 10 pm PT.

PS: DON'T MISS THE END


https://video.twimg.com/amplify_video/1959038953383776256/vid/avc1/1920x1088/5muHb3YZ5HNHpLXs.mp4

26/37
@RubenHallali
Already done with it 😭😭😭



27/37
@samuellawrentz
@xanstateofmind, director.



28/37
@korhancagla
When I write "create from image" in the prompt, it goes haywire and creates a different video... If you're not going to fully disclose the features, why are you advertising it?



29/37
@TomLaroc
@RilloBeats



30/37
@dishanaa11
HOLY SHIIII RUN



31/37
@noel_tkay
Ayt bet



32/37
@theinsiderzclub
Okay, this is very cool.



33/37
@tobiasmunoz_
It's so much fun to turn your ideas into something real so fast.



https://video.twimg.com/amplify_video/1959067815526174720/vid/avc1/1280x720/vP_vgvQ6B5LqWLcF.mp4

34/37
@cedric_chee
It's not 3 free per day. 8s clip is too short



35/37
@thegraxisreal
rip your TPU’s haha



36/37
@zuess05
My weekend marketing content sorted
Thanks!



37/37
@DariiaHordiiuk
Weekend grind: Veo edits.




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

Trav

TTWF🅾️.
Supporter
Joined
May 26, 2012
Messages
37,345
Reputation
13,172
Daps
103,444
Reppin
Zone 6 BIA Silver Bullets BlockO
Got away from Bard/Gem, dabbled with ChatGPT, but that shyt get more stupider with each upgrade so it's predominantly byke to the artist formerly known as Bard for now.
 