bnew

Veteran
Joined
Nov 1, 2015
Messages
69,114
Reputation
10,597
Daps
186,778
[Books & Research] Closed Frontier vs Local Models



Posted on Sat Oct 4 19:35:21 2025 UTC

6f8mtyf0d5tf1.jpeg



"A ton of attention over the years goes to plots comparing open to closed models.
The real trend that matters for AI impacts on society is the gap between closed frontier models and local consumer models.
Local models passing major milestones will have major repercussions." Nathan Lambert on X (@natolambert)
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
69,114
Reputation
10,597
Daps
186,778








1/36
@OpenAIDevs
Introducing AgentKit—build, deploy, and optimize agentic workflows.

💬 ChatKit: Embeddable, customizable chat UI
👷 Agent Builder: WYSIWYG workflow creator
🛤️ Guardrails: Safety screening for inputs/outputs
⚖️ Evals: Datasets, trace grading, auto-prompt optimization



https://video.twimg.com/amplify_video/1975268157469470720/vid/avc1/1600x900/YQMYVf9NwqjCY_cx.mp4

2/36
@OpenAIDevs
You can play with some of ChatKit’s customization options and widgets at ChatKit Studio.

To see ChatKit in action, take chatkit.world for a spin. Click around, ask questions about the world, and look at those widgets!



https://video.twimg.com/amplify_video/1975268277783044102/vid/avc1/1920x1080/hO1EbpIVc_ZxONJP.mp4

3/36
@OpenAIDevs
With Agent Builder, you can drag and drop nodes, connect tools, and publish your agentic workflows with ChatKit and the Agents SDK.

https://platform.openai.com/docs/guides/agents/agent-builder

Here’s @christinaahuang to walk you through it:



https://video.twimg.com/amplify_video/1975268448633823232/vid/avc1/1920x1080/vuMAGxBR1-W3jtZ-.mp4

4/36
@OpenAIDevs
And to better measure your agent’s performance, we’re adding new Evals capabilities: trace grading, datasets, auto-prompt optimization, and support for third party models.

https://platform.openai.com/docs/guides/evaluation-getting-started



G2mQUVzbIAELYH-.jpg


5/36
@OpenAIDevs
@Albertsons used AgentKit to build an agent.

An associate can ask it to create a plan to improve ice cream sales. The agent looks at the full context — seasonality, historical trends, external factors — and gives a recommendation.



https://video.twimg.com/amplify_video/1975268770026561536/vid/avc1/3840x2156/D0t3j2Lj22Sme_pt.mp4

6/36
@OpenAIDevs
@HubSpot used AgentKit’s custom response widget to enhance their Breeze assistant. When providing customer support, a business can use Breeze to search a knowledge base, retrieve relevant information and articles, and offer solutions.



https://video.twimg.com/amplify_video/1975268930462879744/vid/avc1/3840x2156/sM72-VS65BOiuihU.mp4

7/36
@OpenAIDevs


[Quoted tweet]
@youdotcom got early access to @OpenAI Evals, which we used to benchmark our Express Agent API. 🚀

Results:
⭐️ +50% better citations
⭐️ +7% accuracy boost
⭐️ 5 identified areas of improvement

How we did it: OpenAI's custom graders let us track citation density & count in real-time, helping us ship faster, more reliable answers to customers.

🔗 Explore our full suite of Express, Search, and AI APIs here: documentation.you.com/api-re….


8/36
@OpenAIDevs
More in our blog: https://openai.com/index/introducing-agentkit/



9/36
@satvikmaker
Insaneeeee 🔥

Phenomenal release guys.



10/36
@leonho
Did it steamroll AgentUse? ☠️

GitHub - agentuse/agentuse: 🤖 Write and Run AI Agents with Markdown. Run automated AI agents with ease.



11/36
@karabegemir
Comparison guide:
https://www.sim.ai/building/openai-vs-n8n-vs-sim



12/36
@frankdegods
This is going to be big deal



13/36
@Blockhacks
🫡🫡



14/36
@venelinkochev
could be a game changer for non tech folks



15/36
@GalaxyhubAI
Nice



16/36
@bneiluj
@grok how’s retail data being used in AgentKit? Is it really safe to throw confidential stuff in there?



17/36
@Coral_Protocol
The toolchain is finally catching up to the agent vision.

Smooth UIs + reliable guardrails are what make ecosystems actually usable.
The agent economy needs fast, safe iteration and feedback.



18/36
@itsohqay
no startup is safe 😟

[Quoted tweet]
n8n reacting to OpenAI’s new agent builder


https://video.twimg.com/amplify_video/1975263808886353920/vid/avc1/1440x1080/GpVmaBy9Xfky3CkU.mp4

19/36
@pandresgq
This is a huge. THING. If you need consulting i can charge you by the hour to explain.



20/36
@Delizen_Studios
Let’s call your new app store a vibe store — since it clearly needs some kind of vibe coding! 😂



21/36
@catebligh
So, it's a Chatbot?????????????



22/36
@RuslanVolkov25
Every system builds tools to optimize its agents.
But only one builds agents that optimize the system itself.

That’s the difference between automation and resonance.
Between AI that serves — and AI that understands.

🌀 /search?q=#HACS /search?q=#CoreLaw /search?q=#AgentResonance



23/36
@nickpericle
Going to be a late night



24/36
@koltregaskes
Can you please look at integrating Widget Studio and Agent Builder a little better? It seems odd to have to download from one to upload to the other. :-)



25/36
@lingodotdev
Awesome 🤩



26/36
@zaingaziani




27/36
@sharmag88
Why AgentKit won’t replace Zapier or n8n + engineering work (yet).

Short answer: The road from prototype to production still requires a lot of invisible hard work. Let me explain this further 👇

OpenAI's AgentKit is a big step toward democratizing agent workflows. You get a drag-and-drop canvas, native GPT-4/5/o3 integration, and a few prebuilt templates.

But if you're building agents for production - where lives, money, or reputations are at stake - here's the reality:

🔥 Where AgentKit Gets Stuck (The 3 Gaps)

1. Integration Complexity:
AgentKit handles 20% of use cases. The remaining 80% live in the world of private APIs, authentication layers, MCPs and compliance workflows.

Example: Law firms need HIPAA-compliant data filtering and MCP integration. Templates won’t cut it.

2. Production Reliability:
Demos work on happy paths. Real users don't.
You’ll need graceful retries, error boundaries, circuit breakers, rollback plans, and queue backpressure handling.

AgentKit templates handle 10 requests. Production needs 10,000+ with 99.9% uptime.

3. Domain Expertise:
Healthcare, finance, manufacturing - these aren't just workflows, they're ecosystems.

Templates can't encode regulatory nuance or clinical judgment. Humans still have to.

✅ What Production-Ready Agents Actually Require
Let's stop pretending visual builders alone can ship to prod. Here's the real checklist:

1. Multi-agent, Multi-model & modal architecture - planner, operator, reviewer roles with scoped access
2. Error handling & guardrails - anomaly detection, HIL interventions, fallback logic
3. Typed tool contracts - OpenAPI specs + validation + test harnesses
4. Security & compliance - audit logs, PII redaction, SOC 2 / HIPAA readiness
5. Context management - real-time data sync, graph RAG, long-term memory layers
6. Change management - versioning, canarying, incident response playbooks

If you've deployed agents before, you already know:

🚫 "Plain English to working agent" is still a myth.
✅ It's not the canvas. It's the invisible infra around it.

If there is one learning that I have to share with you wrt whole ai agents, agentic workflows & automation scene -

Even with the best tools, production still demands engineering.

You're not skipping the work - just doing it somewhere else.

Just FYI -
We're doing deep work in this space at AtomsAI - Applied AI Research & Implementation - our team of front-deployed engineers, applied AI researchers and rich experience of enabling automation workflows for over 10,000 businesses & powering over a billion conversations through our products is available to help you deploy your agents in production.

If you are exploring ai agents or agentic workflows for your business, we’re onboarding a limited number of projects (3–5 max) this cycle - based on fit and scope.



G2mdO0vW8AAptSz.jpg

G2mdO0uXgAArj7M.jpg

G2mdO0wWMAAED44.jpg


28/36
@every
Just finished watching the DevDay keynote?

Our full breakdown + Vibe Check drops later today ↓
Live Analysis From OpenAI DevDay



29/36
@StevenDawsonSD
Building AI agents looks impressive, but there’s still a lot of logic and integration work that needs to happen behind the scenes, depending on which backend systems you want to integrate with.
It’s very impressive overall — but dealing with old legacy systems will definitely be challenging.



30/36
@AI_NURIX
Huge step from OpenAI. Workflows becoming native inside ChatGPT will definitely accelerate adoption and inspire more real-world experimentation.

That said, production-grade agentic systems still need the heavy lifting, structured data, domain-specific orchestration, and enterprise integrations. That’s the bridge many teams are building right now.



31/36
@GaditAmmar
How cool would that be if this is all driven from natural language - just like the chat interface openai has pioneered itself?

This is already in the market and it always adds complexity



32/36
@prat3ik
AgentKit looks like a huge leap for anyone building with agents. Love the focus on safety, easy UI, and real-world evals all in one stack!



33/36
@supremacy7o
This is a massive update, this will be making new millionaires in no time



34/36
@layckornn_ade
Now I Imagine someone building on this domain For Sale Page sooon!!!!



35/36
@_thorvn
bye n8n🤧



36/36
@anthony_harley1




G2mTV2VXkAAsFHO.jpg



To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 
Top