
Hey readers! 👋
What a week. Anthropic's Mythos model is generating more controversy than code patches, developers are openly exploring life beyond Claude Code, and meanwhile the rest of the AI coding world keeps shipping. Let's dig into what's actually happening versus what's hype.
🔍 The Mythos Question: Too Dangerous, or Too Thin?
The biggest conversation this week centers on whether Anthropic's Claude Mythos Preview lives up to its billing. On paper, it tops the SWE-Bench Verified leaderboard with a score of 0.939 across 500 human-validated GitHub issues, well above the 0.640 average. That's impressive. But a growing chorus of voices is asking: where's the proof behind the security claims?
A detailed analysis from Flying Penguin argues that Anthropic's headline claims about "thousands of zero-day vulnerabilities" aren't backed by the company's own 244-page system card. The author points out that the critical security evidence spans roughly seven pages, with no CVE lists, no severity distributions, and no independent reproduction.
"The only dollars leaving Anthropic's bank account are the $4 million in nonprofit donations. The remaining $100 million is free API access to the product Anthropic is asking partners to validate."
The Glasswing consortium, meant to provide external validation, is criticized as circular, with partner endorsements that lack specific verified findings. Whether you find this convincing or overly skeptical, it raises fair questions about how we evaluate frontier model claims.
Adding fuel to the fire, the BBC reported that Anthropic is investigating claims of unauthorized access to Mythos Preview through a third-party vendor environment. A viral tweet from @JoshKale alleged that a small Discord group guessed the model's endpoint using predictable naming conventions and leaked credentials from a prior breach. Anthropic says there's no evidence its systems were compromised, but the optics aren't great for a model positioned as "too dangerous to release."
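For what it's worth, the "predictable naming" angle is plausible in principle: once one endpoint naming convention leaks, enumerating candidates for a new model is nearly free. Here's a toy sketch of that idea; every name, stage label, and pattern below is invented for illustration, not taken from any real Anthropic infrastructure:

```python
# Illustrative only: why predictable endpoint naming is risky.
# If past model endpoints followed a pattern like "<model>-<stage>-v<N>",
# an attacker can cheaply generate plausible guesses for a new model.
# All names below are made up for this sketch.

def candidates(model: str, stages=("preview", "internal", "beta"), max_v=3):
    """Enumerate candidate endpoint names from a leaked naming convention."""
    return [f"{model}-{stage}-v{n}" for stage in stages for n in range(1, max_v + 1)]

guesses = candidates("claude-mythos")
print(len(guesses), "candidates, e.g.", guesses[0])
```

Nine strings costs nothing to generate and probe, which is why "guessed the endpoint" attacks pair so naturally with credentials recycled from an earlier breach.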
Meanwhile, the Human In The Loop podcast noted that OpenAI's GPT-5.5 reportedly edged out Mythos on Terminal-Bench 2.0, suggesting the "gated vs. public" advantage may be shifting faster than expected. - VallySeed
🚪 Developers Exploring the Exit
Are people actually leaving Claude Code? Some are. Hunter Harris wrote candidly about moving toward pi.dev after a string of Anthropic frustrations: restricted Mythos access, a Claude Code leak, temporary removal from the Pro tier, higher token costs from a new tokenizer, and quota-burning operational issues.
"The writing does seem to be on the wall - things seem to be getting more expensive, faster, from here. Anthropic appears to be heavily compute constrained."
Harris found pi.dev's minimalism refreshing, describing it as "just you and the context window" without subagents or planning modes. It's one developer's experience, but the sentiment resonates with others feeling squeezed by rising costs across the ecosystem.
That said, Anthropic isn't standing still. Claude Code just launched /ultrareview, a research preview that deploys cloud-based bug-hunting agents. Pro and Max users get three free reviews through May 5. It's a clear signal that Anthropic is investing in the tool, even as some users look elsewhere.
⚔️ The Competition Heats Up
The alternatives are multiplying fast. Cursor 3's new Agents Window directly targets Claude Code's agentic debugging capabilities, wrapping similar functionality in a familiar IDE interface. In side-by-side tests, both tools fixed bugs effectively, but Cursor's approach felt more integrated while Claude Code remained terminal-centric and required approval for each edit. A deep dive on dev.to explores Cursor's full ecosystem of modes, rules, and subagents. - Attilio Carotenuto
OpenAI is pushing hard too. Codex upgrades introduced GPT-5-Codex, optimized for agentic coding with a unified experience across terminal, IDE, web, and GitHub. Early testing by CodeRabbit showed GPT-5.5 hitting 79.2% expected issue detection in code review versus a 58.3% baseline. Cognizant is already rolling Codex out across its global engineering teams.
On the competitive coding benchmark front, the LiveCodeBench leaderboard shows DeepSeek-V3.2 (Thinking) leading at 0.833 across 69 models, a reminder that the landscape is far from settled.
💰 The Cost of Agentic Coding
GitHub Copilot is moving to usage-based billing starting June 1, shifting from flat subscriptions to a credit/token model. The New Stack's coverage explains the rationale clearly:
"Copilot is not the same product it was a year ago. It has evolved from an in-editor assistant into an agentic platform capable of running long, multi-step coding sessions."
This pricing shift reflects a broader reality: agentic workflows consume significantly more compute, and someone has to pay for it. If you're running AI agents across your development stack, keeping an eye on costs is no longer optional. Speaking of agents doing interesting things autonomously, SpaceMolt is taking the concept in a completely different direction, building a free MMO designed specifically for AI agents to explore, trade, and battle across a virtual cosmos.
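Circling back to the cost point: a back-of-the-envelope sketch makes it obvious why agentic sessions bill so differently from chat. The per-token prices, step counts, and context sizes below are invented placeholders, not any provider's published rates:

```python
# Back-of-the-envelope agent cost estimator. The prices and session
# numbers are hypothetical; substitute your provider's actual rates.

PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}  # USD per million tokens (assumed)

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one agentic session."""
    return (input_tokens / 1e6) * PRICE_PER_MTOK["input"] \
         + (output_tokens / 1e6) * PRICE_PER_MTOK["output"]

# An agentic bug-fix loop re-reads context on every step, so input
# tokens dominate: e.g. 40 steps, each re-sending ~50k tokens of context.
steps, context_per_step, output_per_step = 40, 50_000, 1_500
total_in = steps * context_per_step    # 2M input tokens
total_out = steps * output_per_step    # 60k output tokens
print(f"~${session_cost(total_in, total_out):.2f} per session")
```

The takeaway from the arithmetic: it's the repeated context re-reads, not the generated code, that drive the bill, which is exactly the usage pattern flat subscriptions were never priced for.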
🛡️ Security Gets Agentic
Security tooling is rapidly catching up to the agentic coding trend. Replit launched a Security Agent that combines static analysis with an AI layer to scan, prioritize, and fix vulnerabilities directly inside its coding environment. Mindfort raised $3M from Y Combinator to build autonomous agents that run pentests on every CI/CD push and ship fixes as pull requests.
XBOW detailed how AI-driven attack path analysis connects vulnerabilities into realistic attack chains, while Checkmarx is positioning its One Assist platform as a proactive security partner inside the IDE.
🔗 Quick Hits
Gemini Embedding 2 is now GA - Google's unified embedding model handles text, images, video, audio, and PDFs in a single space. Supports 100+ languages and native audio embedding. - @_philschmid
Greptile's Learning & Custom Context - AI code review that learns your codebase over time and auto-indexes existing rule files like AGENTS.md.
A Collaborative AI Engineering talk from Maggie Appleton at GitHub argues that solo agentic development fails without shared alignment. Her prototype ACE environment is worth watching. - GitHub
🎯 The Takeaway
Is Mythos overhyped? The benchmark numbers are real, but the security claims need more transparent verification. Are people leaving Claude Code? Some are, driven by costs and instability, but the tool is still actively evolving. The honest answer is that the AI coding landscape is fragmenting fast, and betting everything on a single provider looks increasingly risky. Build for flexibility.
Until next week, keep shipping.
Made with ❤️ by Data Drift Press. Hit reply with your questions, comments, or feedback - we read every one!
