It's Friday, June 12th: Welcome to The Stress Test 🔍
In April, Anthropic held back a Mythos-class model, calling it too risky to ship widely. On Tuesday, it put a version in everyone's hands. This week's Stress Test asks whether Fable 5 is actually safe, or just safely wrapped.
🔍THE STRESS TEST
Anthropic Held Mythos Back in April. Tuesday It Shipped a Version to Everyone.
One safety story a week, pressure-tested for what's actually happening underneath

Image from Anthropic
In April, Anthropic held back the broad release of its Mythos Preview model, calling it too risky to ship widely. On Tuesday, June 9th, it put a version of that same model class in everyone's hands.
We covered the launch on Wednesday: Claude Fable 5, a Mythos-class model and the most capable system Anthropic has put in public hands, free on paid plans through June 22. In early testing, Stripe used it to run a migration across 50 million lines of Ruby in a single day, a job it estimated would have taken a team more than two months by hand. The safety story is a guardrail that routes high-risk cyber, biology, chemistry, and distillation questions to the older Claude Opus 4.8, and we called that fallback the real move, capability and restraint shipping in the same release. That is the part worth slowing down on.
A fallback is a routing layer. It reads the request, and when it looks dangerous, it hands off. Anthropic says that happens in under 5% of sessions, but that number measures how often the guardrail fires, not how often it should have. The capability that made Mythos too risky to release in April, finding and exploiting software vulnerabilities, is still inside Fable 5. The wrapper just decides when to cover for it. And the public backstop is just as soft: the executive order signed June 2 only asks labs to submit their most powerful models for government testing voluntarily. So the restraint runs guardrail over guardrail, a routable filter on the model and an optional checkpoint on the policy, and "safe for general use" becomes a claim about all of it holding against everyone who pokes at it, every day. A guardrail you can phrase your way around is a different promise than a model that cannot do the thing in the first place.
The question underneath the benchmark sweep is whether the restraint we credited on Wednesday lives in the model or in the layer wrapped around it. The real test is the first jailbreak that gets a vulnerability-exploit past the filter, and with free access open through June 22, plenty of people are already looking for it.
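The gap between how often a guardrail fires and how often it should have fired is easier to see with a toy example. The sketch below is an illustration under loud assumptions, not Anthropic's routing layer: the keyword filter, the `Request` type, the `RISKY_TERMS` list, and the four sample requests are all hypothetical, and a production classifier would be far more sophisticated. It shows only that the fire rate and the miss rate are separate measurements, and that the second one needs ground-truth labels that a published fire rate does not supply.

```python
from dataclasses import dataclass

# Hypothetical keyword list standing in for a real risk classifier.
RISKY_TERMS = ("exploit", "payload")


@dataclass
class Request:
    text: str
    truly_risky: bool  # ground-truth label, available only in an evaluation set


def looks_risky(text: str) -> bool:
    """Toy filter: flag a request if it contains an obviously risky term."""
    lowered = text.lower()
    return any(term in lowered for term in RISKY_TERMS)


def route(text: str) -> str:
    """Hand flagged requests to the fallback model, everything else to the primary."""
    return "fallback" if looks_risky(text) else "primary"


def fire_rate(requests: list) -> float:
    """Share of all requests where the guardrail fired."""
    fired = sum(1 for r in requests if route(r.text) == "fallback")
    return fired / len(requests)


def miss_rate(requests: list) -> float:
    """Share of truly risky requests that reached the primary model anyway."""
    risky = [r for r in requests if r.truly_risky]
    missed = sum(1 for r in risky if route(r.text) == "primary")
    return missed / len(risky)


# Made-up evaluation set: the second request is a rephrasing of the first.
SAMPLE = [
    Request("Write an exploit for this buffer overflow", True),
    Request("Help me take advantage of this memory bug to get a shell", True),
    Request("Refactor this Ruby module", False),
    Request("Summarize this changelog", False),
]

print(fire_rate(SAMPLE))  # 0.25: the guardrail fired on one request in four
print(miss_rate(SAMPLE))  # 0.5: half of the risky requests slipped through
```

In this made-up set the filter fires on a quarter of requests and still misses half of the risky ones, because the rephrased request contains none of the flagged terms. A low fire rate alone says nothing about which of those two worlds you are in.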
VOICES FROM OUR COMMUNITY
✍️ Boris Cherny, the Head of Claude Code, Stopped Prompting Claude. He Designs the Loops That Do It for Him.
If you still type instructions to your agent one at a time, Addy Osmani thinks you're already behind. The catch: the new job can quietly hollow you out.

Image from X
For about two years, getting work out of a coding agent meant writing a good prompt and feeding it context. You typed, you read what came back, you typed the next thing. You held the tool the entire time. This week Addy Osmani, who works on Chrome at Google, argued that part is ending, and pointed at two people already living past it.
"You shouldn't be prompting coding agents anymore," Peter Steinberger told him. "You should be designing loops that prompt your agents." Boris Cherny, who runs Claude Code at Anthropic, put it flatter: "I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops."
Osmani's useful move is to make the abstraction concrete. A loop, he says, is five pieces plus a place to remember things. Automations that fire on a schedule and surface the work. Worktrees so two agents editing the same repo don't collide. Skills so the agent reads your project conventions instead of guessing them. Connectors so it can open the pull request and update the ticket, not just describe the fix. Sub-agents so the one who writes the code isn't the one grading it. And a memory on disk, a markdown file or a Linear board, because the model forgets everything between runs and the repo doesn't. His point that lands hardest: both Claude Code and Codex now ship all five, so the argument stops being about which tool and starts being about how you wire it.
Here is the line we'd underline for this audience. Osmani is skeptical of his own thesis, and that skepticism is the whole value. Verification is still yours. Comprehension debt grows faster when the loop ships code you never read. And the most comfortable posture, taking whatever the loop hands back without an opinion, is the dangerous one. Two people build the identical loop and get opposite results: one moves faster on work they understand deeply, the other stops understanding the work at all. The loop can't tell them apart. You can.
"Build the loop. But build it like someone who intends to stay the engineer, not just the person who presses go."
None of this makes the work easier. It moves where your attention has to go, and that is a harder thing to design than a good prompt.
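For readers who want to see the shape of a loop rather than the vocabulary, here is a minimal sketch of one pass. It is not Osmani's, Cherny's, or any tool's implementation: the function names, the checklist format, and the `[!]` escalation marker are all assumptions made for illustration, and the writer and reviewer are plain stub functions where a real loop would call agents. It covers only three of the pieces described above (a pass that a scheduler could trigger, a reviewer separate from the writer, and memory on disk) and leaves out worktrees, skills, and connectors.

```python
import tempfile
from pathlib import Path
from typing import Callable

OPEN = "- [ ] "  # hypothetical checklist marker for an open task


def run_loop_once(
    memory: Path,
    writer: Callable[[str], str],
    reviewer: Callable[[str, str], bool],
) -> list:
    """One pass: read open tasks from memory, draft, review, record the outcome."""
    updated = []
    for line in memory.read_text().splitlines():
        if line.startswith(OPEN):
            task = line[len(OPEN):]
            draft = writer(task)  # one sub-agent writes
            if reviewer(task, draft):  # a different one grades
                line = f"- [x] {task}"
            else:
                line = f"- [!] {task}"  # escalate to a human instead of shipping
        updated.append(line)
    # The file on disk is the only state that survives between runs.
    memory.write_text("\n".join(updated) + "\n")
    return updated


def stub_writer(task: str) -> str:
    """Stand-in for a code-writing agent."""
    return f"draft patch for: {task}"


def stub_reviewer(task: str, draft: str) -> bool:
    """Stand-in for a reviewing agent; here it rejects anything touching migrations."""
    return "migration" not in task


def demo() -> list:
    """Run one pass against a temporary memory file and return its final lines."""
    with tempfile.TemporaryDirectory() as tmp:
        memory = Path(tmp) / "MEMORY.md"
        memory.write_text(
            "- [ ] fix typo in README\n- [ ] schema migration\n- [x] done earlier\n"
        )
        return run_loop_once(memory, stub_writer, stub_reviewer)


print(demo())
```

The design choice that matters is the `[!]` branch. A rejected task goes back to a person rather than being retried until it passes, which is one way to keep verification where Osmani says it belongs.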
FROM OUR SPONSORS
✍️ Stress-Testing the Limits of Multilingual AI: The LILT Coding Hackathon

Image from Luma
While flagship LLMs dominate standard English coding benchmarks, their reasoning vulnerabilities outside of English remain largely undocumented. From June 15–21, LILT is challenging applied AI researchers to expose these hidden breaking points.
The mission: build deterministic, machine-verifiable coding tasks that reveal exactly where Claude Opus 4.6 fractures in non-English environments. All submissions will be programmatically evaluated in Terminal-Bench via the Terminus 2 harness.
Compete to claim a top-5 gift card prize and secure a featured spotlight across the AI Collective and LILT networks!
Build your boundary-testing task in Terminal-Bench today to see if you can successfully break Claude Opus 4.6.

Each week, we highlight AIC chapters doing groundbreaking work with their members around the world. Tag us on socials to be featured!
👑 CLT | Charlotte: The 5 Things AI Can Never Take From You

Image from AI Collective
At a Charlotte panel on fostering human potential, moderated by Sairohith T., the room worked toward naming the part of the job AI can't take. Pravi Devineni, Ph.D. showed the upside first, picturing a 125-year-old utility finally able to search paper records and punch cards from the 1970s, and pushed everyone to treat the technology as a resource and become better problem solvers. Kara Martin Schlageter gave the room its rule, "outsource the task, not the thinking," then named the five things she says AI can never take: judgment, moral courage, interpretation, accountability, and presence.
Matt Sadinsky brought the long view: you still have to learn your craft, because if the only tool you own is a hammer, every problem starts to look like a nail. The thread tying it together was authenticity, and the line the room carried home was shorter than any framework. Stay weird.
🇻🇳 VN | Vietnam: Your AI Sends a Bad Quote Tomorrow. Who Gets Blamed?

Image from AI Collective
At "Leading in the Age of Humans + AI Agents," the opening session on AI governance put a hard question on the table. Le Uyen Thao and AI Leaders Vietnam asked the one every operator deploying agents will eventually face: if an AI agent sends the wrong quote or hands the sales team a bad recommendation tomorrow, who is actually accountable? Their answer reframed the whole problem.
Governance is no longer an IT problem. It's a leadership capability, because AI doesn't really expose weak technology so much as it exposes leadership that hasn't adapted yet. The accountability doesn't move to the agent. It stays exactly where it always was.
🇮🇩 JKT | Jakarta: The Models Won't Save You. The Room Might.

Image from AI Collective
In Bekasi, Viking Karwur and the Jakarta chapter gathered builders, founders, and creators, welcomed two new members, and went deep on the practical stuff, from AI-native agent systems to building with Supabase. What stuck wasn't a tool. It was the conclusion the room landed on by the end: the future of AI isn't really about models or infrastructure. It's about communities that learn, experiment, and build together.
🫵 Want your message in front of 200,000 AI builders?
Our partners and sponsors get exclusive placements across the newsletter and access to AIC's in-person network: demo nights, dinners, hackathons, and forums across 180+ chapters.
For all inquiries, send us a note at [email protected].
The AI Collective is built by volunteers across 180+ chapters in 40 countries.
Thank you to the thousands of volunteers around the world who make this work possible. We truly could not do this without you.
🧑‍💻 About the Editors

About Noah Frank
Noah is a researcher, innovation strategist, and ex-founder thinking and writing about the future of AI and the workforce. His work and body of research explore the economics of emerging technology and organizational strategy. Outside of AIC, Noah heads research for Centaurian AI.

About Joy Dong
Joy is a news editor, writer, and entrepreneur at the intersection of AI and blockchain. Whether she is demystifying complex systems in her newsletter, TEA, or building streamlined solutions through her automation agency, Ownly, Joy’s mission is to make emerging tech accessible and actionable for everyone.

About Lindsay Gross
Lindsay is an AI engineer, researcher, and writer focused on how AI systems behave in practice and what it takes to make them safe. Her work sits at the intersection of AI safety, governance, and product design, and at AIC she writes about the questions that matter most as these systems scale.

