bnew

Leaked AI Technology Making Large Language Models Obsolete!


https://inv.nadeko.net/watch?__goaw...ttps://inv.nadeko.net/&hl=en-US&v=V8xAQrdeGoo

Channel: Pourya Kordi
Subscribers: 9.51K

Description
In this video we talk about several leaked AI technologies from major labs: some revealed by researchers, and some introduced by companies only at a conceptual level while the actual recipe remains hidden.

This video covers the latest developments in the field of artificial intelligence, focusing in particular on rapid AI development and the future of AI. The discussion includes advancements in large language models and their potential impact on various industries. Stay informed about the latest AI news and predictions.

0:00 Introduction
0:44 Sub-Quadratic
6:12 Hidden Thought Process and JEPA
14:04 Self-Play and Self-Evolution Tech
16:45 Gemini's Ultimate Goal

***************
All materials in these videos are used for educational purposes and fall within the guidelines of fair use. No copyright infringement is intended. If you are or represent the copyright owner of materials used in this video and have a problem with the use of said material, please contact me via my email in the "about" page on my channel.

**************



bnew


Meta’s V-JEPA 2 model teaches AI to understand its surroundings


Amanda Silberling

8:54 AM PDT · June 11, 2025



Meta on Wednesday unveiled its new V-JEPA 2 AI model, a “world model” that is designed to help AI agents understand the world around them.

V-JEPA 2 is an extension of the V-JEPA model that Meta released last year, which was trained on over 1 million hours of video. This training data is supposed to help robots or other AI agents operate in the physical world, understanding and predicting how concepts like gravity will impact what happens next in a sequence.

These are the kinds of common sense connections that small children and animals make as their brains develop — when you play fetch with a dog, for example, the dog will (hopefully) understand how bouncing a ball on the ground will cause it to rebound upward, or how it should run toward where it thinks the ball will land, and not where the ball is at that precise moment.

Meta depicts examples where a robot may be confronted with, for example, the point-of-view of holding a plate and a spatula and walking toward a stove with cooked eggs. The AI can predict that a very likely next action would be to use the spatula to move the eggs to the plate.

According to Meta, V-JEPA 2 is 30x faster than Nvidia’s Cosmos model, which also tries to enhance intelligence related to the physical world. However, Meta may be evaluating its own models according to different benchmarks than Nvidia.

“We believe world models will usher a new era for robotics, enabling real-world AI agents to help with chores and physical tasks without needing astronomical amounts of robotic training data,” explained Meta’s chief AI scientist Yann LeCun in a video.
 

bnew


The AI leaders bringing the AGI debate down to Earth


Maxwell Zeff

8:00 AM PDT · March 19, 2025



During a recent dinner with business leaders in San Francisco, a comment I made cast a chill over the room. I hadn’t asked my dining companions anything I considered to be an extreme faux pas: simply whether they thought today’s AI could someday achieve human-like intelligence (i.e. AGI) or beyond.

It’s a more controversial topic than you might think.

In 2025, there’s no shortage of tech CEOs offering the bull case for how large language models (LLMs), which power chatbots like ChatGPT and Gemini, could attain human-level or even super-human intelligence over the near term. These executives argue that highly capable AI will bring about widespread — and widely distributed — societal benefits.

For example, Dario Amodei, Anthropic’s CEO, wrote in an essay that exceptionally powerful AI could arrive as soon as 2026 and be “smarter than a Nobel Prize winner across most relevant fields.” Meanwhile, OpenAI CEO Sam Altman recently claimed his company knows how to build “superintelligent” AI, and predicted it may “massively accelerate scientific discovery.”

However, not everyone finds these optimistic claims convincing.

Other AI leaders are skeptical that today’s LLMs can reach AGI — much less superintelligence — barring some novel innovations. These leaders have historically kept a low profile, but more have begun to speak up recently.

In a piece this month, Thomas Wolf, Hugging Face’s co-founder and chief science officer, called some parts of Amodei’s vision “wishful thinking at best.” Informed by his PhD research in statistical and quantum physics, Wolf thinks that Nobel Prize-level breakthroughs don’t come from answering known questions — something that AI excels at — but rather from asking questions no one has thought to ask.

In Wolf’s opinion, today’s LLMs aren’t up to the task.

“I would love to see this ‘Einstein model’ out there, but we need to dive into the details of how to get there,” Wolf told TechCrunch in an interview. “That’s where it starts to be interesting.”

Wolf said he wrote the piece because he felt there was too much hype about AGI, and not enough serious evaluation of how to actually get there. He thinks that, as things stand, there’s a real possibility AI transforms the world in the near future, but doesn’t achieve human-level intelligence or superintelligence.

Much of the AI world has become enraptured by the promise of AGI. Those who don’t believe it’s possible are often labeled as “anti-technology,” or otherwise bitter and misinformed.

Some might peg Wolf as a pessimist for this view, but Wolf thinks of himself as an “informed optimist” — someone who wants to push AI forward without losing grasp of reality. Certainly, he isn’t the only AI leader with conservative predictions about the technology.

Google DeepMind CEO Demis Hassabis has reportedly told staff that, in his opinion, the industry could be up to a decade away from developing AGI — noting there are a lot of tasks AI simply can’t do today. Meta Chief AI Scientist Yann LeCun has also expressed doubts about the potential of LLMs. Speaking at Nvidia GTC on Tuesday, LeCun said the idea that LLMs could achieve AGI was “nonsense,” and called for entirely new architectures to serve as bedrocks for superintelligence.

Kenneth Stanley, a former OpenAI lead researcher, is one of the people digging into the details of how to build advanced AI with today’s models. He’s now an executive at Lila Sciences, a new startup that raised $200 million in venture capital to unlock scientific innovation via automated labs.

Stanley spends his days trying to extract original, creative ideas from AI models, a subfield of AI research called open-endedness. Lila Sciences aims to create AI models that can automate the entire scientific process, including the very first step — arriving at really good questions and hypotheses that would ultimately lead to breakthroughs.

“I kind of wish I had written [Wolf’s] essay, because it really reflects my feelings,” Stanley said in an interview with TechCrunch. “What [he] noticed was that being extremely knowledgeable and skilled did not necessarily lead to having really original ideas.”

Stanley believes that creativity is a key step along the path to AGI, but notes that building a “creative” AI model is easier said than done.

Optimists like Amodei point to methods such as AI “reasoning” models, which use more computing power to fact-check their work and correctly answer certain questions more consistently, as evidence that AGI isn’t terribly far away. Yet coming up with original ideas and questions may require a different kind of intelligence, Stanley says.

“If you think about it, reasoning is almost antithetical to [creativity],” he added. “Reasoning models say, ‘Here’s the goal of the problem, let’s go directly towards that goal,’ which basically stops you from being opportunistic and seeing things outside of that goal, so that you can then diverge and have lots of creative ideas.”

To design truly intelligent AI models, Stanley suggests we need to algorithmically replicate a human’s subjective taste for promising new ideas. Today’s AI models perform quite well in academic domains with clear-cut answers, such as math and programming. However, Stanley points out that it’s much harder to design an AI model for more subjective tasks that require creativity, which don’t necessarily have a “correct” answer.

“People shy away from [subjectivity] in science — the word is almost toxic,” Stanley said. “But there’s nothing to prevent us from dealing with subjectivity [algorithmically]. It’s just part of the data stream.”

Stanley says he’s glad that the field of open-endedness is getting more attention now, with dedicated research labs at Lila Sciences, Google DeepMind, and AI startup Sakana now working on the problem. He’s starting to see more people talk about creativity in AI, he says — but he thinks that there’s a lot more work to be done.

Wolf and LeCun would probably agree. Call them the AI realists, if you will: AI leaders approaching AGI and superintelligence with serious, grounded questions about its feasibility. Their goal isn’t to poo-poo advances in the AI field. Rather, it’s to kick-start big-picture conversation about what’s standing between AI models today and AGI — and super-intelligence — and to go after those blockers.
 

bnew

[News] Meta releases V-JEPA 2, the first world model trained on video



Posted on Wed Jun 11 14:48:35 2025 UTC


Meta: Introducing the V-JEPA 2 world model and new benchmarks for physical reasoning







1/6
@TheTuringPost
11 Types of JEPA you should know about:

▪️ V-JEPA 2
▪️ Time-Series-JEPA (TS-JEPA)
▪️ Denoising JEPA (D-JEPA)
▪️ CNN-JEPA
▪️ Stem-JEPA
▪️ DMT-JEPA
▪️ seq-JEPA
▪️ AD-L-JEPA
▪️ SAR-JEPA
▪️ HEP-JEPA
▪️ ECG-JEPA

JEPA by @ylecun and other researchers from Meta is a self-supervised learning framework that predicts the latent representation of a missing part of the input. It's really worth learning more about 👇

Check this out for more info and useful resources: @Kseniase on Hugging Face: "11 Types of JEPA Since Meta released the newest V-JEPA 2 this week, we…"





2/6
@TheTuringPost
Other interesting JEPA types:

[Quoted tweet]
12 types of JEPA (Joint-Embedding Predictive Architecture)

▪️ I-JEPA
▪️ MC-JEPA
▪️ V-JEPA
▪️ UI-JEPA
▪️ A-JEPA (Audio-based JEPA)
▪️ S-JEPA
▪️ TI-JEPA
▪️ T-JEPA
▪️ ACT-JEPA
▪️ Brain-JEPA
▪️ 3D-JEPA
▪️ Point-JEPA

Save the list and check this out for the links and more info: huggingface.co/posts/Ksenias…




3/6
@Jacoed
Nah thanks



4/6
@HrishbhDalal
working on one more 😉



5/6
@xzai259
No thank you.



6/6
@ThinkDi92468945
Does it achieve SOTA on any benchmarks?















1/9
@TheTuringPost
A newcomer in the family of world models — V-JEPA 2

By combining 1M+ hours of internet videos and a little bit of robot interaction data, @AIatMeta built an AI that can:

• Watch
• Understand
• Answer questions
• Help robots plan and act in the physical world

V-JEPA 2 shows the real success of self-supervised learning and of scaling everything efficiently.

Here is how it actually works:

[Quoted tweet]
Introducing V-JEPA 2, a new world model with state-of-the-art performance in visual understanding and prediction.

V-JEPA 2 can enable zero-shot planning in robots—allowing them to plan and execute tasks in unfamiliar environments.

Download V-JEPA 2 and read our research paper ➡️ ai.meta.com/vjepa/




2/9
@TheTuringPost
1. How does V-JEPA 2 excel at understanding motion and predicting?

The researchers first trained Video Joint Embedding Predictive Architecture 2 (V-JEPA 2) on over 1 million hours of video from the internet.

The strategy was mask-and-predict:

• Encoder – turns the visible parts of the video into representations.
• Predictor – uses those representations to predict the masked parts.

So V-JEPA 2 learns from the videos themselves, without knowing what actions are being taken (a rough sketch of the idea follows below).
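To make the mask-and-predict idea concrete, here is a minimal, heavily simplified PyTorch sketch of JEPA-style training: an encoder sees only the visible frames, a predictor guesses the latent representations of the masked frames, and the prediction targets come from an EMA copy of the encoder. Every name, size, and the crude pooling step here are made up for illustration; this is not Meta's actual training code.

```python
# Toy sketch of JEPA-style mask-and-predict training (illustrative only).
import copy
import torch
import torch.nn as nn

frames, dim, hidden = 16, 128, 256          # pretend each video is 16 frames of 128-d features

encoder = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, hidden))
predictor = nn.Sequential(nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, hidden))
target_encoder = copy.deepcopy(encoder)      # EMA copy that provides targets, no gradients
for p in target_encoder.parameters():
    p.requires_grad_(False)

opt = torch.optim.AdamW(list(encoder.parameters()) + list(predictor.parameters()), lr=3e-4)

for step in range(100):
    video = torch.randn(8, frames, dim)                  # stand-in for a batch of video clips
    mask = torch.rand(8, frames) < 0.5                   # which frames are hidden

    visible = encoder(video) * (~mask).unsqueeze(-1)     # context: only the visible frames
    context = visible.mean(dim=1, keepdim=True)          # crude pooling of the visible info
    pred = predictor(context.expand(-1, frames, -1))     # predict latents for every frame

    with torch.no_grad():
        target = target_encoder(video)                   # latents to match (no pixel targets)

    loss = ((pred - target) ** 2)[mask].mean()           # loss only on the masked positions
    opt.zero_grad(); loss.backward(); opt.step()

    with torch.no_grad():                                # EMA update of the target encoder
        for pt, po in zip(target_encoder.parameters(), encoder.parameters()):
            pt.mul_(0.99).add_(po, alpha=0.01)
```

The key point the sketch tries to show is that the loss lives in latent space, on the masked portion only, rather than reconstructing pixels.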





3/9
@TheTuringPost
2. Another smart strategy is to scale up everything:

• Much more training data: 2 million → 22 million videos
• Bigger model: 300 million → 1 billion+ parameter encoder
• Longer training: 90K → 252K steps
• Higher video resolution and clip length

All of this helped to improve performance.





4/9
@TheTuringPost
3. From watching to acting: V-JEPA 2-AC

“AC” stands for Action-Conditioned. This stage teaches the model to reason about actions, not just observations.

- The researchers keep the original V-JEPA 2 frozen.
- They add a new predictor on top that takes into account both what the robot sees and what actions it takes (a rough sketch of this setup follows below).
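Here is a rough sketch of what that action-conditioned stage could look like, again illustrative only and not Meta's recipe: the pretrained encoder is frozen, and only a new predictor that sees both the current latent and the robot's action is trained to predict the next latent. The 7-dimensional action and all layer sizes are assumptions.

```python
# Toy sketch of an action-conditioned predictor on top of a frozen encoder.
import torch
import torch.nn as nn

latent_dim, action_dim, hidden = 256, 7, 256   # e.g. a 7-DoF arm action (assumed)

encoder = nn.Sequential(nn.Linear(128, hidden), nn.GELU(), nn.Linear(hidden, latent_dim))
for p in encoder.parameters():                  # freeze the pretrained video encoder
    p.requires_grad_(False)

action_predictor = nn.Sequential(               # new trainable head: (z_t, a_t) -> z_{t+1}
    nn.Linear(latent_dim + action_dim, hidden), nn.GELU(), nn.Linear(hidden, latent_dim))
opt = torch.optim.AdamW(action_predictor.parameters(), lr=3e-4)

for step in range(100):
    obs_t = torch.randn(32, 128)                # current observation features (toy)
    obs_next = torch.randn(32, 128)             # next observation features (toy)
    action = torch.randn(32, action_dim)        # action the robot took in between

    with torch.no_grad():
        z_t, z_next = encoder(obs_t), encoder(obs_next)

    z_pred = action_predictor(torch.cat([z_t, action], dim=-1))
    loss = ((z_pred - z_next) ** 2).mean()      # predict the next latent, given the action
    opt.zero_grad(); loss.backward(); opt.step()
```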





5/9
@TheTuringPost
4. Once trained, V-JEPA 2 can be used for planning and performing actions:

- The robot is given a goal image — what the scene should look like after it succeeds.
- The model processes its current state — frame and arm position.
- Then it tries out different possible action sequences and imagines what the result will be.
- It picks the sequence that gets its prediction closest to the goal image.
- It executes only the first action, then repeats the process step-by-step — this is called receding horizon control (a rough sketch of the loop follows below).
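A toy sketch of that planning loop under the same assumptions (made-up encoder and dynamics stand-ins, simple random shooting over candidate action sequences): imagine each rollout in latent space, score it against the goal image's latent, and return only the first action of the best sequence. Real implementations typically use a smarter sampler such as the cross-entropy method, but the receding-horizon structure is the same.

```python
# Toy sketch of goal-image planning with a learned latent model (illustrative only).
import torch

def plan_step(encode, predict, obs, goal_image, action_dim=7, horizon=5, n_samples=256):
    """Return the first action of whichever sampled action sequence ends up,
    in imagination, closest to the goal image's latent."""
    with torch.no_grad():
        z = encode(obs).expand(n_samples, -1)                   # current latent, one copy per sample
        z_goal = encode(goal_image)                             # target latent
        actions = torch.randn(n_samples, horizon, action_dim)   # candidate action sequences

        for t in range(horizon):                                # roll out each sequence in latent space
            z = predict(z, actions[:, t])

        cost = ((z - z_goal) ** 2).sum(dim=-1)                  # distance to the goal latent
        return actions[cost.argmin(), 0]                        # execute only the first action

# Toy usage with stand-in models (real ones would be the frozen encoder + trained predictor):
latent_dim = 32
encode = lambda img: img.flatten(1)[:, :latent_dim]             # fake "encoder"
predict = lambda z, a: z + 0.1 * a.sum(dim=-1, keepdim=True)    # fake "dynamics"
obs, goal = torch.randn(1, 8, 8), torch.randn(1, 8, 8)
first_action = plan_step(encode, predict, obs, goal)            # shape: (7,)
```

After executing that one action, the robot re-observes the scene and calls the planner again, which is what makes the horizon "receding".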





6/9
@TheTuringPost
5. Zero-shot robot manipulation:

Trained with only 62 hours of raw, unlabeled data from a robot arm, V-JEPA 2 achieves:

- 100% success in reach tasks
- Up to 80% success in pick-and-place tasks, even with new objects and cluttered scenes

This is what makes V-JEPA 2 self-supervised.





7/9
@TheTuringPost
6. Other capabilities:

Understanding: 77.3% SSv2 accuracy, state-of-the-art VidQA
Prediction: 39.7 recall@5 on Epic-Kitchens-100
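For anyone unfamiliar with the metric: recall@5 simply means the ground-truth answer appears among the model's top 5 guesses. A quick illustrative helper, not the actual Epic-Kitchens-100 evaluation code:

```python
# Minimal recall@k helper, assuming per-example class scores and integer labels.
import torch

def recall_at_k(scores: torch.Tensor, labels: torch.Tensor, k: int = 5) -> float:
    """scores: (N, num_classes) model scores; labels: (N,) ground-truth class ids."""
    topk = scores.topk(k, dim=-1).indices                # (N, k) highest-scoring classes
    hits = (topk == labels.unsqueeze(-1)).any(dim=-1)    # did the ground truth make the top-k?
    return hits.float().mean().item()

# e.g. recall_at_k(torch.randn(100, 300), torch.randint(0, 300, (100,)), k=5)
```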





8/9
@TheTuringPost
Paper: V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Research - AI at Meta

Meta's blog post: Introducing V-JEPA 2



9/9
@AkcayGok36003
“You’re killing the game!”
@miller_elio 🎈 🎈 💰 💰




 

bnew







1/6
@mixpeek
Multimodal Monday #12 just dropped!

Quick Hits:
- V-JEPA 2: Meta's new world model enhances visual understanding and robotic intelligence with self-supervised learning

- LEANN: Shrinks vector indexing to under 5% storage for local search

- Apple announces new Foundation Models that add vision across 15 languages

- DatologyAI CLIP: Boosts efficiency 8x via smarter data curation

🧵👇





2/6
@mixpeek
🧠 Research Spotlight:
- ViGaL: Arcade games like Snake boost multimodal math reasoning

- RCTS: Tree search enhances multimodal RAG with reliable answers

- CLaMR: Contextualized late-interaction improves multimodal content retrieval

- SAM2.1++: Distractor-aware memory lifts tracking accuracy on 6/7 benchmarks

- SAM2 Tracking: Introspection strategy boosts segmentation robustness

Read our full newsletter for more on these and other research highlights



https://video.twimg.com/amplify_video/1934624445517144064/vid/avc1/2786x1180/bfNq8iqIHvpS9TFu.mp4

3/6
@mixpeek
🛠 Tools to Watch:
- V-JEPA 2: World model predicts video states for improved robotic/visual understanding

- Apple Foundation Models: 3B on-device model with 15-language vision

- NVIDIA GEN3C: 3D-aware video generation with precise camera control

- Optimus-3: Generalist agents for Minecraft environments

- Sound-Vision: Explores sound replacing vision in LLaVA



https://video.twimg.com/amplify_video/1934625100168675329/vid/avc1/1920x1080/lGucmo-HKsMI2FAv.mp4

4/6
@mixpeek
🛠 More Tools to Watch:
- Text-to-LoRA: Generates LoRA adapters from text descriptions

- Implicit Semantics: Embeddings capture intent and context

- TempVS Benchmark: Tests temporal order in image sequences



https://video.twimg.com/amplify_video/1934625434206920704/vid/avc1/2160x960/VQNnfLWMhtZf0u-3.mp4

5/6
@mixpeek
🧩 Community Buzz:
- JEPA Evolution: TuringPost covers 11+ variants across domains - https://nitter.poast.org/TheTuringPost/status/1934206858736382388

- Ming-lite-omni: Open-source GPT-4o rival now accessible

- NVIDIA GEN3C: Apache 2.0 licensed 3D video generation

[Quoted tweet]
11 Types of JEPA you should know about (the same list quoted earlier in this thread)


https://video.twimg.com/amplify_video/1934626405729329152/vid/avc1/1280x720/bIXWiTqFLkLuGYnG.mp4

6/6
@mixpeek
Dive into the full newsletter for more updates! Multimodal Monday #12: World Models, Efficiency Increases | Mixpeek

Sign up to get the latest multimodal AI updates every week: https://mixpeek.com/blog






 