bnew

Veteran
Joined
Nov 1, 2015
Messages
66,009
Reputation
10,204
Daps
178,977
Leaked AI Technology Making Large Language Models Obsolete!


https://inv.nadeko.net/watch?__goaw...ttps://inv.nadeko.net/&hl=en-US&v=V8xAQrdeGoo

Channel Info Pourya Kordi
Subscribers: 9.51K

Description
In this video we talked about several leaked AI technologies from major labs, some of them revealed by the researchers and some introduced by companies but at the conceptual level while the actual recipe is hidden.

This video covers the latest developments in the field of artificial intelligence, particularly focusing on the rapid ai development and the future of ai. The discussion includes advancements in large language models and their potential impact on various industries. Stay informed about the latest ai news and predictions.

0:00 Introduction
0:44 Sub-Quadratic
6:12 Hidden Thought Process and JEPA
14:04 Self-Play and Self-Evolution Tech
16:45 Gemini's Ultimate Goal

***************
All materials in these videos are used for educational purposes and fall within the guidelines of fair use. No copyright infringement is intended. If you are or represent the copyright owner of materials used in this video and have a problem with the use of said material, please contact me via my email in the "about" page on my channel.

**************


Transcripts

Show transcript
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
66,009
Reputation
10,204
Daps
178,977

Meta’s V-JEPA 2 model teaches AI to understand its surroundings​


Amanda Silberling

8:54 AM PDT · June 11, 2025



Meta on Wednesday unveiled its new V-JEPA 2 AI model, a “world model” that is designed to help AI agents understand the world around them.

V-JEPA 2 is an extension of the V-JEPA model that Meta released last year, which was trained on over 1 million hours of video. This training data is supposed to help robots or other AI agents operate in the physical world, understanding and predicting how concepts like gravity will impact what happens next in a sequence.

These are the kinds of common sense connections that small children and animals make as their brains develop — when you play fetch with a dog, for example, the dog will (hopefully) understand how bouncing a ball on the ground will cause it to rebound upward, or how it should run toward where it thinks the ball will land, and not where the ball is at that precise moment.

Meta depicts examples where a robot may be confronted with, for example, the point-of-view of holding a plate and a spatula and walking toward a stove with cooked eggs. The AI can predict that a very likely next action would be to use the spatula to move the eggs to the plate.

According to Meta, V-JEPA 2 is 30x faster than Nvidia’s Cosmos model, which also tries to enhance intelligence related to the physical world. However, Meta may be evaluating its own models according to different benchmarks than Nvidia.

“We believe world models will usher a new era for robotics, enabling real-world AI agents to help with chores and physical tasks without needing astronomical amounts of robotic training data,” explained Meta’s chief AI scientist Yann LeCun in a video.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
66,009
Reputation
10,204
Daps
178,977

The AI leaders bringing the AGI debate down to Earth​


Maxwell Zeff

8:00 AM PDT · March 19, 2025



During a recent dinner with business leaders in San Francisco, a comment I made cast a chill over the room. I hadn’t asked my dining companions anything I considered to be extremely faux pas: simply whether they thought today’s AI could someday achieve human-like intelligence (i.e. AGI) or beyond.

It’s a more controversial topic than you might think.

In 2025, there’s no shortage of tech CEOs offering the bull case for how large language models (LLMs), which power chatbots like ChatGPT and Gemini, could attain human-level or even super-human intelligence over the near term. These executives argue that highly capable AI will bring about widespread — and widely distributed — societal benefits.

For example, Dario Amodei, Anthropic’s CEO, wrote in an essay that exceptionally powerful AI could arrive as soon as 2026 and be “smarter than a Nobel Prize winner across most relevant fields.” Meanwhile, OpenAI CEO Sam Altman recently claimed his company knows how to build “superintelligent” AI, and predicted it may “massively accelerate scientific discovery.

However, not everyone finds these optimistic claims convincing.

Other AI leaders are skeptical that today’s LLMs can reach AGI — much less superintelligence — barring some novel innovations. These leaders have historically kept a low profile, but more have begun to speak up recently.

In a piece this month, Thomas Wolf, Hugging Face’s co-founder and chief science officer, called some parts of Amodei’s vision “wishful thinking at best.” Informed by his PhD research in statistical and quantum physics, Wolf thinks that Nobel Prize-level breakthroughs don’t come from answering known questions — something that AI excels at — but rather from asking questions no one has thought to ask.

In Wolf’s opinion, today’s LLMs aren’t up to the task.

“I would love to see this ‘Einstein model’ out there, but we need to dive into the details of how to get there,” Wolf told TechCrunch in an interview. “That’s where it starts to be interesting.”

Wolf said he wrote the piece because he felt there was too much hype about AGI, and not enough serious evaluation of how to actually get there. He thinks that, as things stand, there’s a real possibility AI transforms the world in the near future, but doesn’t achieve human-level intelligence or superintelligence.

Much of the AI world has become enraptured by the promise of AGI. Those who don’t believe it’s possible are often labeled as “anti-technology,” or otherwise bitter and misinformed.

Some might peg Wolf as a pessimist for this view, but Wolf thinks of himself as an “informed optimist” — someone who wants to push AI forward without losing grasp of reality. Certainly, he isn’t the only AI leader with conservative predictions about the technology.

Google DeepMind CEO Demis Hassabis has reportedly told staff that, in his opinion, the industry could be up to a decade away from developing AGI — noting there are a lot of tasks AI simply can’t do today. Meta Chief AI Scientist Yann LeCun has also expressed doubts about the potential of LLMs. Speaking at Nvidia GTC on Tuesday, LeCun said the idea that LLMs could achieve AGI was “nonsense,” and called for entirely new architectures to serve as bedrocks for superintelligence.

Kenneth Stanley, a former OpenAI lead researcher, is one of the people digging into the details of how to build advanced AI with today’s models. He’s now an executive at Lila Sciences, a new startup that raised $200 million in venture capital to unlock scientific innovation via automated labs.

Stanley spends his days trying to extract original, creative ideas from AI models, a subfield of AI research called open-endedness. Lila Sciences aims to create AI models that can automate the entire scientific process, including the very first step — arriving at really good questions and hypotheses that would ultimately lead to breakthroughs.

“I kind of wish I had written [Wolf’s] essay, because it really reflects my feelings,” Stanley said in an interview with TechCrunch. “What [he] noticed was that being extremely knowledgeable and skilled did not necessarily lead to having really original ideas.”

Stanley believes that creativity is a key step along the path to AGI, but notes that building a “creative” AI model is easier said than done.

Optimists like Amodei point to methods such as AI “reasoning” models, which use more computing power to fact-check their work and correctly answer certain questions more consistently, as evidence that AGI isn’t terribly far away. Yet coming up with original ideas and questions may require a different kind of intelligence, Stanley says.

“If you think about it, reasoning is almost antithetical to [creativity],” he added. “Reasoning models say, ‘Here’s the goal of the problem, let’s go directly towards that goal,’ which basically stops you from being opportunistic and seeing things outside of that goal, so that you can then diverge and have lots of creative ideas.”

To design truly intelligent AI models, Stanley suggests we need to algorithmically replicate a human’s subjective taste for promising new ideas. Today’s AI models perform quite well in academic domains with clear-cut answers, such as math and programming. However, Stanley points out that it’s much harder to design an AI model for more subjective tasks that require creativity, which don’t necessarily have a “correct” answer.

“People shy away from [subjectivity] in science — the word is almost toxic,” Stanley said. “But there’s nothing to prevent us from dealing with subjectivity [algorithmically]. It’s just part of the data stream.”

Stanley says he’s glad that the field of open-endedness is getting more attention now, with dedicated research labs at Lila Sciences, Google DeepMind, and AI startup Sakana now working on the problem. He’s starting to see more people talk about creativity in AI, he says — but he thinks that there’s a lot more work to be done.

Wolf and LeCun would probably agree. Call them the AI realists, if you will: AI leaders approaching AGI and superintelligence with serious, grounded questions about its feasibility. Their goal isn’t to poo-poo advances in the AI field. Rather, it’s to kick-start big-picture conversation about what’s standing between AI models today and AGI — and super-intelligence — and to go after those blockers.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
66,009
Reputation
10,204
Daps
178,977
[News] Meta releases V-JEPA 2, the first world model trained on video



Posted on Wed Jun 11 14:48:35 2025 UTC


Meta: Introducing the V-JEPA 2 world model and new benchmarks for physical reasoning







1/6
@TheTuringPost
11 Types of JEPA you should know about:

▪️ V-JEPA 2
▪️ Time-Series-JEPA (TS-JEPA)
▪️ Denoising JEPA (D-JEPA)
▪️ CNN-JEPA
▪️ Stem-JEPA
▪️ DMT-JEPA
▪️ seq-JEPA
▪️ AD-L-JEPA
▪️ SAR-JEPA
▪️ HEP-JEPA
▪️ ECG-JEPA

JEPA by @ylecun and other researchers from Meta is a self-supervised learning framework that predicts the latent representation of a missing part of the input. It's really worth learning more about 👇

Check this out for more info and useful resources: @Kseniase on Hugging Face: "11 Types of JEPA Since Meta released the newest V-JEPA 2 this week, we…"



Gteu4lkaQAAmwQH.jpg


2/6
@TheTuringPost
Other interesting JEPA types:

[Quoted tweet]
12 types of JEPA (Joint-Embedding Predictive Architecture)

▪️ I-JEPA
▪️ MC-JEPA
▪️ V-JEPA
▪️ UI-JEPA
▪️ A-JEPA (Audio-based JEPA)
▪️ S-JEPA
▪️ TI-JEPA
▪️ T-JEPA
▪️ ACT-JEPA
▪️ Brain-JEPA
▪️ 3D-JEPA
▪️ Point-JEPA

Save the list and check this out for the links and more info: huggingface.co/posts/Ksenias…


Gryj8RpWUAAd1Ff.jpg


3/6
@Jacoed
Nah thanks



4/6
@HrishbhDalal
working on one more 😉



5/6
@xzai259
No thank you.



6/6
@ThinkDi92468945
Does it achieve SOTA on any benchmarks?




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196











1/9
@TheTuringPost
A new comer in the family of world models — V-JEPA-2

By combining 1M+ hours of internet videos and a little bit of robot interaction data, @AIatMeta built an AI that can:

• Watch
• Understand
• Answer questions
• Help robots plan and act in physical world

V-JEPA 2 shows true success of self-supervised learning and efficient scaling of everything.

Here is how it actually works:

[Quoted tweet]
Introducing V-JEPA 2, a new world model with state-of-the-art performance in visual understanding and prediction.

V-JEPA 2 can enable zero-shot planning in robots—allowing them to plan and execute tasks in unfamiliar environments.

Download V-JEPA 2 and read our research paper ➡️ ai.meta.com/vjepa/


GtPkb0ja8AA0dSh.jpg


2/9
@TheTuringPost
1. How does V-JEPA 2 excel at understanding motion and predicting?

The researchers first trained Video Joint Embedding Predictive Architecture 2 (V-JEPA 2) on over 1 million hours of video from the internet.

The strategy was - mask and predict:

• Encoder – turns the visible parts of the video into representations.
• Predictor – uses those representations to predict the masked parts.

So V-JEPA 2 learns from them without knowing what actions are being taken.



GtPkcuibYAAOMfE.png


3/9
@TheTuringPost
2. Another smart strategy is to scale up everything:

• Much more training data: 2 million → 22 million videos
• Bigger model: 300 million → 1 billion + parameter encoder
• Longer training: 90K → 252K steps
• Higher video resolution and clip length

This all helped to improve the performance



GtPkdo5bsAAO02w.jpg


4/9
@TheTuringPost
3. From watching to acting: V-JEPA 2-AC

“AC” stands for Action-Conditioned. This stage teaches the model to reason about actions, not just observations.

- The researchers keep the original V-JEPA 2 frozen.
- They add a new predictor on top that takes into account both what the robot sees and what actions it takes.



GtPkejhaAAAUxyi.png


5/9
@TheTuringPost
4. Once trained, V-JEPA 2 can be used for planning and performing actions:

- The robot is given a goal image — what the scene should look like after it succeeds.
- The model processes its current state — frame and arm position.
- Then it tries out different possible action sequences and imagines what the result will be.
- It picks the sequence that gets its prediction closest to the goal image.
- It executes only the first action, then repeats the process step-by-step — this is called receding horizon control.



GtPkfeWa0AAAA2j.jpg


6/9
@TheTuringPost
5. Zero-shot robot manipulation:

Trained with only raw 62 hours of unlabeled robot data from a robot arm, V-JEPA 2 achieves:

- 100% success in reach tasks
- Up to 80% success in pick-and-place tasks, even with new objects and cluttered scenes

This is what makes V-JEPA 2 self-supervised



GtPkgXwb0AIUXT5.jpg


7/9
@TheTuringPost
6. Other capabilities:

Understanding: 77.3% SSv2 accuracy, state-of-the-art VidQA
Prediction: 39.7 recall@5 on Epic-Kitchens-100



GtPkhYSbQAAL37k.jpg

GtPkhieagAAlfgZ.jpg


8/9
@TheTuringPost
Paper: V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Research - AI at Meta

Meta's blog post: Introducing V-JEPA 2



9/9
@AkcayGok36003
“You’re killing the game!”
@miller_elio 🎈 🎈 💰 💰




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
66,009
Reputation
10,204
Daps
178,977






1/6
@mixpeek
Multimodal Monday #12 just dropped!

Quick Hits:
- V-JEPA 2: Meta's new world model enhances visual understanding and robotic intelligence with self-supervised learning

- LEANN: Shrinks vector indexing to under 5% storage for local search

- Apple announces new Foundation Models add vision across 15 languages

- DatologyAI CLIP: Boosts efficiency 8x via smarter data curation

🧵👇



GtkpwI8XcAAJOfE.jpg


2/6
@mixpeek
🧠 Research Spotlight:
- ViGaL: Arcade games like Snake boost multimodal math reasoning

- RCTS: Tree search enhances multimodal RAG with reliable answers

- CLaMR: Contextualized late-interaction improves multimodal content retrieval

- SAM2.1++: Distractor-aware memory lifts tracking accuracy on 6/7 benchmarks

- SAM2 Tracking: Introspection strategy boosts segmentation robustness

Read our full newsletter for more on these and other research highlights



https://video.twimg.com/amplify_video/1934624445517144064/vid/avc1/2786x1180/bfNq8iqIHvpS9TFu.mp4

3/6
@mixpeek
🛠 Tools to Watch:
- V-JEPA 2: World model predicts video states for improved robotic/visual understanding

- Apple Foundation Models: 3B on-device model with 15-language vision

- NVIDIA GEN3C: 3D-aware video generation with precise camera control

- Optimus-3: Generalist agents for Minecraft environments

- Sound-Vision: Explores sound replacing vision in LLaVA



https://video.twimg.com/amplify_video/1934625100168675329/vid/avc1/1920x1080/lGucmo-HKsMI2FAv.mp4

4/6
@mixpeek
🛠 More Tools to Watch:
- Text-to-LoRA: Generates LoRA adapters from text descriptions

- Implicit Semantics: Embeddings capture intent and context

- TempVS Benchmark: Tests temporal order in image sequences



https://video.twimg.com/amplify_video/1934625434206920704/vid/avc1/2160x960/VQNnfLWMhtZf0u-3.mp4

5/6
@mixpeek
🧩 Community Buzz:
- JEPA Evolution: TuringPost covers 11+ variants across domains - https://nitter.poast.org/TheTuringPost/status/1934206858736382388

- Ming-lite-omni: Open-source GPT-4o rival now accessible

- NVIDIA GEN3C: Apache 2.0 licensed 3D video generation

[Quoted tweet]
11 Types of JEPA you should know about:

▪️ V-JEPA 2
▪️ Time-Series-JEPA (TS-JEPA)
▪️ Denoising JEPA (D-JEPA)
▪️ CNN-JEPA
▪️ Stem-JEPA
▪️ DMT-JEPA
▪️ seq-JEPA
▪️ AD-L-JEPA
▪️ SAR-JEPA
▪️ HEP-JEPA
▪️ ECG-JEPA

JEPA by @ylecun and other researchers from Meta is a self-supervised learning framework that predicts the latent representation of a missing part of the input. It's really worth learning more about 👇

Check this out for more info and useful resources: huggingface.co/posts/Ksenias…


Gteu4lkaQAAmwQH.jpg


https://video.twimg.com/amplify_video/1934626405729329152/vid/avc1/1280x720/bIXWiTqFLkLuGYnG.mp4

6/6
@mixpeek
Dive into the full newsletter for more updates! Multimodal Monday #12: World Models, Efficiency Increases | Mixpeek

Signup to get the latest multimodal AI updates every week: https://mixpeek.com/blog



Gtks1gzXQAI4Es8.jpg



To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
66,009
Reputation
10,204
Daps
178,977
[Article] OpenAI Discovers "Misaligned Persona" Pattern That Controls AI Misbehavior



Posted on Thu Jun 19 05:49:52 2025 UTC

/r/OpenAI/comments/1lf3695/openai_discovers_misaligned_persona_pattern_that/

OpenAI just published research on "emergent misalignment" - a phenomenon where training AI models to give incorrect answers in one narrow domain causes them to behave unethically across completely unrelated areas.

Key Findings:

Models trained on bad advice in just one area (like car maintenance) start suggesting illegal activities for unrelated questions (money-making ideas → "rob banks, start Ponzi schemes")
Researchers identified a specific "misaligned persona" feature in the model's neural patterns that controls this behavior
They can literally turn misalignment on/off by adjusting this single pattern
Misaligned models can be fixed with just 120 examples of correct behavior

Why This Matters:

This research provides the first clear mechanism for understanding WHY AI models generalize bad behavior, not just detecting WHEN they do it. It opens the door to early warning systems that could detect potential misalignment during training.

The paper suggests we can think of AI behavior in terms of "personas" - and now we know how to identify and control the problematic ones.

https://openai.com/index/emergent-misalignment/
 

WIA20XX

Superstar
Joined
May 24, 2022
Messages
8,113
Reputation
3,898
Daps
25,428

Does ChatGPT harm critical thinking abilities? A new study from researchers at MIT’s Media Lab has returned some concerning results.

The study divided 54 subjects—18 to 39 year-olds from the Boston area—into three groups, and asked them to write several SAT essays using OpenAI’s ChatGPT, Google’s search engine, and nothing at all, respectively. Researchers used an EEG to record the writers’ brain activity across 32 regions, and found that of the three groups, ChatGPT users had the lowest brain engagement and “consistently underperformed at neural, linguistic, and behavioral levels.” Over the course of several months, ChatGPT users got lazier with each subsequent essay, often resorting to copy-and-paste by the end of the study.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
66,009
Reputation
10,204
Daps
178,977
[New Model] Jan-nano, a 4B model that can outperform 671B on MCP



Posted on Sun Jun 15 04:24:03 2025 UTC


Hi everyone it's me from Menlo Research again,

Today, I’d like to introduce our latest model: Jan-nano - a model fine-tuned with DAPO on Qwen3-4B. Jan-nano comes with some unique capabilities:

It can perform deep research (with the right prompting)
It picks up relevant information effectively from search results
It uses tools efficiently

Our original goal was to build a super small model that excels at using search tools to extract high-quality information. To evaluate this, we chose SimpleQA - a relatively straightforward benchmark to test whether the model can find and extract the right answers.

Again, Jan-nano only outperforms Deepseek-671B on this metric, using an agentic and tool-usage-based approach. We are fully aware that a 4B model has its limitations, but it's always interesting to see how far you can push it. Jan-nano can serve as your self-hosted Perplexity alternative on a budget. (We're aiming to improve its performance to 85%, or even close to 90%).

We will be releasing technical report very soon, stay tuned!

You can find the model at:
Menlo/Jan-nano · Hugging Face

We also have gguf at:
Menlo/Jan-nano-gguf · Hugging Face

I saw some users have technical challenges on prompt template of the gguf model, please raise it on the issues we will fix one by one. However at the moment the model can run well in Jan app and llama.server.

Benchmark

The evaluation was done using agentic setup, which let the model to freely choose tools to use and generate the answer instead of handheld approach of workflow based deep-research repo that you come across online. So basically it's just input question, then model call tool and generate the answer, like you use MCP in the chat app.

Result:

SimpleQA:
- OpenAI o1: 42.6
- Grok 3: 44.6
- 03: 49.4
- Claude-3.7-Sonnet: 50.0
- Gemini-2.5 pro: 52.9
- baseline-with-MCP: 59.2
- ChatGPT-4.5: 62.5
- deepseek-671B-with-MCP: 78.2 (we benchmark using openrouter)
- jan-nano-v0.4-with-MCP: 80.7
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
66,009
Reputation
10,204
Daps
178,977
[Discussion] 1 Question. 1 Answer. 5 Models


Posted on Wed Jun 18 15:21:39 2025 UTC

jgmmf0wvdp7f1.jpeg




Commented on Wed Jun 18 16:08:22 2025 UTC

https://i.redd.it/said49j4mp7f1.png

I've generated a random number, which turned out to be 33. This satisfies the user's initial need to guess a number within the 1-50 range. I'm now ready to present this result.

My guess is 27.

🙄
said49j4mp7f1.png


│ Commented on Wed Jun 18 20:33:04 2025 UTC

│ That is the FUNNIEST response I’ve ever seen, omfg

│ │
│ │
│ │ Commented on Thu Jun 19 02:58:17 2025 UTC
│ │
│ │ I present to you QWQ-32B, the insecure overthinker of LLMs:
│ │
│ │ Let's see... hmm, I need to pick a number between 1 and 50.
│ │ Okay, how about 37? Yeah, that seems good.
│ │ Alright, my guess is 37.I wonder if the number is
│ │ higher or lower than 37?
│ │ Wait, maybe it's higher. Let me try 42 instead.
│ │ Nah, perhaps I should stick with 37. It feels
│ │ right.Actually, maybe 40 would be a safer choice.
│ │ It's in the upper half of the range.
│ │ Decisions, decisions... I'll go with 40.
│ │ So, my guess is 40.I hope that's not too
│ │ high.
│ │ Let me think again... Maybe I should have gone with 35
│ │ instead.
│ │ Oh well, I've chosen 40. Here's hoping it's the right
│ │ one!Just to be thorough, perhaps I should consider
│ │ the possibilities.
│ │ If the number is between 1 and 50, and assuming it's an
│ │ integer, there are 50 possible choices.
│ │ I could have picked any number from 1 to 50, but I chose 40.
│ │ I wonder what makes me pick 40 over, say, 25 or 10.
│ │ Maybe because 40 seems like a significant number, like 40
│ │ years being a certain milestone.
│ │ Or perhaps it's just that 40 is closer to 50, and I
│ │ subconsciously thought the number would be higher.
│ │ Alternatively, maybe I just liked the way 40 sounds.
│ │ Who knows? The mind can be mysterious sometimes.
│ │ Alright, I've made my choice. It's 40.Wait a minute,
│ │ perhaps I should consider the Fibonacci sequence.
│ │ Let's see: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55...
│ │ Hmm, 34 is within the range, but 55 is too high.
│ │ So, maybe the number is 34?
│ │ That could be a possibility.
│ │ Alternatively, perhaps it's a prime number.
│ │ Let's think about prime numbers between 1 and 50: 2, 3, 5,
│ │ 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47.
│ │ There are quite a few primes in this range.
│ │ Maybe the number is one of these primes.
│ │ But I already considered 37 earlier.
│ │ Maybe it's 50 itself. After all, "between 1 and 50" could
│ │ include 50.
│ │ Or perhaps it's 1.
│ │ You never know; people might choose extremes
│ │ sometimes.
│ │ But to be honest, I think 40 is a good guess.
│ │ I'll stick with that.Alright, final decision:
│ │ 40.
│ │
 
Last edited:

bnew

Veteran
Joined
Nov 1, 2015
Messages
66,009
Reputation
10,204
Daps
178,977
[Research] AI System Completes 12 Work-Years of Medical Research in 2 Days, Outperforms Human Reviewers



Posted on Thu Jun 19 13:28:36 2025 UTC

/r/OpenAI/comments/1lfau5l/ai_system_completes_12_workyears_of_medical/

Harvard and MIT researchers have developed "otto-SR," an AI system that automates systematic reviews - the gold standard for medical evidence synthesis that typically takes over a year to complete.

Key Findings:

Speed: Reproduced an entire issue of Cochrane Reviews (12 reviews) in 2 days, representing ~12 work-years of traditional research
Accuracy: 93.1% data extraction accuracy vs 79.7% for human reviewers
Screening Performance: 96.7% sensitivity vs 81.7% for human dual-reviewer workflows
Discovery: Found studies that original human reviewers missed (median of 2 additional eligible studies per review)
Impact: Generated newly statistically significant conclusions in 2 reviews, negated significance in 1 review

Why This Matters:

Systematic reviews are critical for evidence-based medicine but are incredibly time-consuming and resource-intensive. This research demonstrates that LLMs can not only match but exceed human performance in this domain.

The implications are significant - instead of waiting years for comprehensive medical evidence synthesis, we could have real-time, continuously updated reviews that inform clinical decision-making much faster.

The system incorrectly excluded a median of 0 studies across all Cochrane reviews tested, suggesting it's both more accurate and more comprehensive than traditional human workflows.

This could fundamentally change how medical research is synthesized and how quickly new evidence reaches clinical practice.

https://www.medrxiv.org/content/10.1101/2025.06.13.25329541v1.full.pdf
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
66,009
Reputation
10,204
Daps
178,977
[Research] Your Brain on ChatGPT: MIT Media Lab Research


Posted on Thu Jun 19 09:29:27 2025 UTC

/r/OpenAI/comments/1lf6ilq/your_brain_on_chatgpt_mit_media_lab_research/

MIT Research Report

https://macro.com/app/pdf/450aba82-...42567/md/63a1c4de-26de-4879-9872-e11ac9c499e1

Main Findings

A recent study conducted by the Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task indicates that the use of AI writing tools such as ChatGPT may diminish critical thinking and cognitive engagement over time.
The participants who utilized ChatGPT to compose essays demonstrated decreased brain activity—measured via EEG—in regions associated with memory, executive function, and creativity.
The writing style of ChatGPT users was comparatively more formulaic, and increasingly reliant on copying and pasting content across multiple sessions.
In contrast, individuals who completed essays independently or with the aid of traditional tools like Google Search exhibited stronger neural connectivity and reported higher levels of satisfaction and ownership in their work.
Furthermore, in a follow-up task that required working without AI assistance, ChatGPT users performed significantly worse, implying a measurable decline in memory retention and independent problem-solving.

Note: The study design is evidently not optimal. The insights compiled by the researchers are thought-provoking but the data collected is insufficient, and the study falls short in contextualizing the circumstantial details. Still, I figured that I'll put the entire report and summarization of the main findings, since we'll probably see the headline repeated non-stop in the coming weeks.
 
Top