The A.I Megathread (LLM , GPT , Development)

bnew · Oct 4, 2025

[Books & Research] Closed Frontier vs Local Models

Posted on Sat Oct 4 19:35:21 2025 UTC

https://i.redd.it/6f8mtyf0d5tf1.jpeg

"A ton of attention over the years goes to plots comparing open to closed models.
The real trend that matters for AI impacts on society is the gap between closed frontier models and local consumer models.
Local models passing major milestones will have major repercussions." Nathan Lambert on X (@natolambert)

bnew · Oct 7, 2025

1/36
@OpenAIDevs
Introducing AgentKit—build, deploy, and optimize agentic workflows.

ChatKit: Embeddable, customizable chat UI

Agent Builder: WYSIWYG workflow creator

Guardrails: Safety screening for inputs/outputs

Evals: Datasets, trace grading, auto-prompt optimization

https://video.twimg.com/amplify_video/1975268157469470720/vid/avc1/1600x900/YQMYVf9NwqjCY_cx.mp4

2/36
@OpenAIDevs
You can play with some of ChatKit’s customization options and widgets at ChatKit Studio.

To see ChatKit in action, take chatkit.world for a spin. Click around, ask questions about the world, and look at those widgets!

https://video.twimg.com/amplify_video/1975268277783044102/vid/avc1/1920x1080/hO1EbpIVc_ZxONJP.mp4

3/36
@OpenAIDevs
With Agent Builder, you can drag and drop nodes, connect tools, and publish your agentic workflows with ChatKit and the Agents SDK.

https://platform.openai.com/docs/guides/agents/agent-builder

Here’s @christinaahuang to walk you through it:

https://video.twimg.com/amplify_video/1975268448633823232/vid/avc1/1920x1080/vuMAGxBR1-W3jtZ-.mp4

4/36
@OpenAIDevs
And to better measure your agent’s performance, we’re adding new Evals capabilities: trace grading, datasets, auto-prompt optimization, and support for third party models.

https://platform.openai.com/docs/guides/evaluation-getting-started

5/36
@OpenAIDevs
@Albertsons used AgentKit to build an agent.

An associate can ask it to create a plan to improve ice cream sales. The agent looks at the full context — seasonality, historical trends, external factors — and gives a recommendation.

https://video.twimg.com/amplify_video/1975268770026561536/vid/avc1/3840x2156/D0t3j2Lj22Sme_pt.mp4

6/36
@OpenAIDevs
@HubSpot used AgentKit’s custom response widget to enhance their Breeze assistant. When providing customer support, a business can use Breeze to search a knowledge base, retrieve relevant information and articles, and offer solutions.

https://video.twimg.com/amplify_video/1975268930462879744/vid/avc1/3840x2156/sM72-VS65BOiuihU.mp4

7/36
@OpenAIDevs

[Quoted tweet]
@youdotcom got early access to @OpenAI Evals, which we used to benchmark our Express Agent API.

Results:

️ +50% better citations

️ +7% accuracy boost

️ 5 identified areas of improvement

How we did it: OpenAI's custom graders let us track citation density & count in real-time, helping us ship faster, more reliable answers to customers.

Explore our full suite of Express, Search, and AI APIs here: documentation.you.com/api-re….

8/36
@OpenAIDevs
More in our blog: https://openai.com/index/introducing-agentkit/

9/36
@satvikmaker
Insaneeeee

Phenomenal release guys.

10/36
@leonho
Did it steamroll AgentUse?

GitHub - agentuse/agentuse:

Write and Run AI Agents with Markdown. Run automated AI agents with ease.

11/36
@karabegemir
Comparison guide:
https://www.sim.ai/building/openai-vs-n8n-vs-sim

12/36
@frankdegods
This is going to be big deal

13/36
@Blockhacks
🫡🫡

14/36
@venelinkochev
could be a game changer for non tech folks

15/36
@GalaxyhubAI
Nice

16/36
@bneiluj
@grok how’s retail data being used in AgentKit? Is it really safe to throw confidential stuff in there?

17/36
@Coral_Protocol
The toolchain is finally catching up to the agent vision.

Smooth UIs + reliable guardrails are what make ecosystems actually usable.
The agent economy needs fast, safe iteration and feedback.

18/36
@itsohqay
no startup is safe

[Quoted tweet]
n8n reacting to OpenAI’s new agent builder

https://video.twimg.com/amplify_video/1975263808886353920/vid/avc1/1440x1080/GpVmaBy9Xfky3CkU.mp4

19/36
@pandresgq
This is a huge. THING. If you need consulting i can charge you by the hour to explain.

20/36
@Delizen_Studios
Let’s call your new app store a vibe store — since it clearly needs some kind of vibe coding!

21/36
@catebligh
So, it's a Chatbot?????????????

22/36
@RuslanVolkov25
Every system builds tools to optimize its agents.
But only one builds agents that optimize the system itself.

That’s the difference between automation and resonance.
Between AI that serves — and AI that understands.

/search?q=#HACS /search?q=#CoreLaw /search?q=#AgentResonance

23/36
@nickpericle
Going to be a late night

24/36
@koltregaskes
Can you please look at integrating Widget Studio and Agent Builder a little better? It seems odd to have to download from one to upload to the other. :-)

25/36
@lingodotdev
Awesome

26/36
@zaingaziani

27/36
@sharmag88
Why AgentKit won’t replace Zapier or n8n + engineering work (yet).

Short answer: The road from prototype to production still requires a lot of invisible hard work. Let me explain this further

OpenAI's AgentKit is a big step toward democratizing agent workflows. You get a drag-and-drop canvas, native GPT-4/5/o3 integration, and a few prebuilt templates.

But if you're building agents for production - where lives, money, or reputations are at stake - here's the reality:

Where AgentKit Gets Stuck (The 3 Gaps)

1. Integration Complexity:
AgentKit handles 20% of use cases. The remaining 80% live in the world of private APIs, authentication layers, MCPs and compliance workflows.

Example: Law firms need HIPAA-compliant data filtering and MCP integration. Templates won’t cut it.

2. Production Reliability:
Demos work on happy paths. Real users don't.
You’ll need graceful retries, error boundaries, circuit breakers, rollback plans, and queue backpressure handling.

AgentKit templates handle 10 requests. Production needs 10,000+ with 99.9% uptime.

3. Domain Expertise:
Healthcare, finance, manufacturing - these aren't just workflows, they're ecosystems.

Templates can't encode regulatory nuance or clinical judgment. Humans still have to.

What Production-Ready Agents Actually Require
Let's stop pretending visual builders alone can ship to prod. Here's the real checklist:

1. Multi-agent, Multi-model & modal architecture - planner, operator, reviewer roles with scoped access
2. Error handling & guardrails - anomaly detection, HIL interventions, fallback logic
3. Typed tool contracts - OpenAPI specs + validation + test harnesses
4. Security & compliance - audit logs, PII redaction, SOC 2 / HIPAA readiness
5. Context management - real-time data sync, graph RAG, long-term memory layers
6. Change management - versioning, canarying, incident response playbooks

If you've deployed agents before, you already know:

"Plain English to working agent" is still a myth.

It's not the canvas. It's the invisible infra around it.

If there is one learning that I have to share with you wrt whole ai agents, agentic workflows & automation scene -

Even with the best tools, production still demands engineering.

You're not skipping the work - just doing it somewhere else.

Just FYI -
We're doing deep work in this space at AtomsAI - Applied AI Research & Implementation - our team of front-deployed engineers, applied AI researchers and rich experience of enabling automation workflows for over 10,000 businesses & powering over a billion conversations through our products is available to help you deploy your agents in production.

If you are exploring ai agents or agentic workflows for your business, we’re onboarding a limited number of projects (3–5 max) this cycle - based on fit and scope.

28/36
@every
Just finished watching the DevDay keynote?

Our full breakdown + Vibe Check drops later today ↓
Live Analysis From OpenAI DevDay

29/36
@StevenDawsonSD
Building AI agents looks impressive, but there’s still a lot of logic and integration work that needs to happen behind the scenes, depending on which backend systems you want to integrate with.
It’s very impressive overall — but dealing with old legacy systems will definitely be challenging.

30/36
@AI_NURIX
Huge step from OpenAI. Workflows becoming native inside ChatGPT will definitely accelerate adoption and inspire more real-world experimentation.

That said, production-grade agentic systems still need the heavy lifting, structured data, domain-specific orchestration, and enterprise integrations. That’s the bridge many teams are building right now.

31/36
@GaditAmmar
How cool would that be if this is all driven from natural language - just like the chat interface openai has pioneered itself?

This is already in the market and it always adds complexity

32/36
@prat3ik
AgentKit looks like a huge leap for anyone building with agents. Love the focus on safety, easy UI, and real-world evals all in one stack!

33/36
@supremacy7o
This is a massive update, this will be making new millionaires in no time

34/36
@layckornn_ade
Now I Imagine someone building on this domain For Sale Page sooon!!!!

35/36
@_thorvn
bye n8n

36/36
@anthony_harley1

To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

The A.I Megathread (LLM , GPT , Development)

More options

bnew

Veteran

bnew

Veteran