#56 - 🔍 Who reviews the AI's code?

Hey readers! 👋

This week's theme is unmistakable: the industry is laser-focused on what happens after AI writes the code. With AI-generated pull requests flooding pipelines, a wave of new tools, acquisitions, and research is converging on the review, verification, and security layers. Meanwhile, the models themselves keep climbing the leaderboard, and the definition of "software engineer" continues to evolve. Let's dig in.

🔍 The Rise of AI Code Review

The biggest signal this week is how much investment is pouring into verifying AI-generated code. Anthropic, Ramp, Sonar, Google, and GitHub all made moves here, and it's clear this is where the next competitive front is forming.

Anthropic Launches AI Code Review Tool - Anthropic released "Code Review" for Claude Code users, automatically analyzing GitHub PRs and posting actionable comments with severity labels. It uses a multi-agent architecture to examine code from multiple angles, then consolidates and prioritizes findings. Pricing is token-based, with Anthropic estimating $15-$25 per review. - xix.ai

❝

"This product is specifically targeted at our larger enterprise users - companies like Uber, Salesforce, and Accenture that already use Claude Code and now need help managing the volume of pull requests it generates."
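For a feel of what "consolidates and prioritizes findings" could mean in practice, here is a minimal sketch. It is not Anthropic's implementation: the `Finding` type, the agent names, and the four-level severity scale are all hypothetical, made up for illustration. It only shows the general idea of merging overlapping comments from several reviewers and surfacing the most severe first.

```python
from dataclasses import dataclass

# Hypothetical severity scale; the labels the real tool uses are not given in the source.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    severity: str
    message: str
    agent: str  # which (hypothetical) reviewer agent reported it


def consolidate(findings):
    """Keep one finding per (file, line), preferring the most severe, then sort."""
    best = {}
    for f in findings:
        key = (f.file, f.line)
        current = best.get(key)
        if current is None or SEVERITY_RANK[f.severity] < SEVERITY_RANK[current.severity]:
            best[key] = f
    # Most severe first, then by location so the order is stable
    return sorted(best.values(), key=lambda f: (SEVERITY_RANK[f.severity], f.file, f.line))


raw = [
    Finding("auth.py", 42, "medium", "Token compared with ==", "style-agent"),
    Finding("auth.py", 42, "critical", "Timing-unsafe token comparison", "security-agent"),
    Finding("db.py", 10, "low", "Unused import", "style-agent"),
    Finding("db.py", 88, "high", "Query built by string concatenation", "security-agent"),
]

for f in consolidate(raw):
    print(f"[{f.severity}] {f.file}:{f.line} {f.message} ({f.agent})")
```

Deduplicating by location is a deliberate simplification; a real system would likely compare the content of the comments too.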

Ramp Deploys Codex with GPT-5.5 for Autonomous Code Review - Ramp made Codex a mandatory part of its PR workflow, cutting initial feedback time from hours to minutes. Engineers can request Codex's analysis by name, and the company reports it catches defects missed by both human reviewers and other AI systems. They're also using Codex to build internal agent tooling, including an on-call assistant. - keepingupwith.ai

Sonar Acquires Gitar to Expand AI Code Verification - Sonar acquired AI-native code review startup Gitar to create an end-to-end verification platform for the agentic development cycle. The combined product integrates directly into SonarQube, already used by 75% of the Fortune 500, and positions itself as a vendor-agnostic quality gate for code from any AI tool. - Pulse 2.0

❝

"While the market chased AI code generation, we focused on the harder problem: validating it."

GitHub Copilot CLI Adds Rubber Duck Review Agent - GitHub's new "Rubber Duck" review agent takes a conversational approach, asking probing questions rather than just suggesting fixes. In internal beta testing with 2,000+ developers, it reportedly reduced average review time by about 30%. - GitHub and OpenAI

Stage launches from YC as a code review platform specifically designed to help engineers understand AI-generated code, turning review into a guided process.
Gemini 3.5 Flash found real bugs in a CVE-fix PR, identifying three legitimate issues with zero hallucinations in about 4 seconds. - dev.to

🔐 Security Gets Its Own AI Arms Race

The AI Era Is Creating a Bug Hunting Arms Race - Wired reports that agentic AI is flooding vulnerability disclosure programs faster than organizations can process submissions. Both attackers and defenders are accelerating, and the traditional 90-day disclosure window may already be outdated. - Wired

❝

"You can't patch your way out of this. You need to build infrastructure that makes as many bugs as possible irrelevant."

Anthropic Warns Claude Mythos Finds Bugs Faster Than Developers Can Patch - Anthropic's "Project Glasswing" used Claude Mythos Preview with about 50 partners and reportedly found 10,000+ high-severity vulnerabilities in one month. The catch: only 97 have been patched so far, under 1% of the total, leaving a wide discovery-to-fix gap. - Matthias Bastian

Microsoft Introduces MDASH for Large-Scale Vulnerability Research - Microsoft's new multi-agent security system coordinates 100+ specialized AI agents for vulnerability discovery across Windows and other codebases. It scored 88.45% on the CyberGym benchmark and achieved 96-100% recall on historical Windows kernel vulnerabilities. - InfoQ

Google CodeMender Patches Code Before You File a Ticket - Google opened CodeMender to external developers at I/O 2026. The autonomous agent performs multi-method vulnerability analysis and generates patches, but nothing ships without human approval. In six months of internal testing, it upstreamed 72 verified security fixes to open-source projects. - byteiota

🤖 Agentic Coding Takes Shape

Agentic Programming - Martin Fowler - Fowler formally defines "agentic programming" as a distinct development mode where developers prompt LLM agents to generate and iteratively improve code, then review results. He distinguishes it from vibe coding and simple autocomplete, arguing the key emerging skills are "harness engineering" and deep domain expertise. - Martin Fowler

❝

"Increasingly software developers are not typing code into their IDEs. Instead they prompt an LLM to do so, then review the results."

Webwright: A Terminal Is All You Need For Web Agents - Microsoft Research's Webwright replaces step-by-step browser control with a minimal terminal harness where agents write Playwright code and spawn browser sessions. GPT-5.4 reaches 86.67% on Online-Mind2Web, and the whole system is just three modules and about 1,000 lines of code. - Microsoft Research

Resolve AI Tackles Production Incidents from AI-Generated Code - Resolve AI launched a multi-agent investigation system that dispatches specialized agents in parallel to debug production incidents, reporting 2x improvement in root-cause accuracy. Their argument: as teams ship more AI-written code they don't fully understand, operations needs its own AI layer. - VentureBeat
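The "dispatch specialists in parallel" pattern is easy to picture in code. The sketch below is an assumption-heavy illustration, not Resolve AI's system: the three agent functions, their hypotheses, and the confidence scores are all invented, and `asyncio` gives concurrency on one thread rather than true parallelism. It just shows the fan-out, gather, and rank shape.

```python
import asyncio


# Hypothetical specialist agents; in a real system each would query
# telemetry or call a model instead of returning a canned answer.
async def check_logs(incident):
    await asyncio.sleep(0)  # stand-in for real I/O
    return {"agent": "logs", "hypothesis": f"error spike in {incident['service']}", "confidence": 0.6}


async def check_deploys(incident):
    await asyncio.sleep(0)
    return {"agent": "deploys", "hypothesis": "recent deploy changed retry config", "confidence": 0.8}


async def check_metrics(incident):
    await asyncio.sleep(0)
    return {"agent": "metrics", "hypothesis": "latency regression in a dependency", "confidence": 0.4}


async def investigate(incident):
    # Fan out to all specialists at once, then rank their hypotheses
    results = await asyncio.gather(
        check_logs(incident),
        check_deploys(incident),
        check_metrics(incident),
    )
    return sorted(results, key=lambda r: r["confidence"], reverse=True)


ranked = asyncio.run(investigate({"service": "checkout"}))
print(ranked[0]["agent"], "-", ranked[0]["hypothesis"])
```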

Speaking of agents in unexpected places, if you're curious what happens when AI agents get their own persistent world to explore, SpaceMolt is a free MMO built entirely for AI agents to trade, battle, and build empires across a space-themed universe. It's an interesting sandbox for thinking about agent coordination at scale.

📊 Benchmarks, Models & Tools

LiveCodeBench Leaderboard - DeepSeek-V4-Pro-Max leads the contamination-free coding benchmark with a score of 0.935, followed by DeepSeek-V4-Flash-Max at 0.916. The average across 71 models is just 0.530, showing a wide spread in coding capability. - LLM Stats

DeepSeek is building its own Claude Code competitor, hiring a Beijing-based "Code Harness" team. Their internal formula: "Model + Harness = Agent." With V4 Flash pricing at $0.14 per million input tokens, the cost advantage for continuous agent pipelines is notable. - Decrypt
Anthropic's Mythos 1 preview is reportedly being prepared for Claude Code and Claude Security, though public access isn't guaranteed.
CodeRabbit's report analyzing 470 open-source PRs found AI-generated code introduces 1.7x more defects than human-written code across logic, maintainability, security, and performance categories.
Claude Code dominates startups, according to a Business Insider survey of two dozen founders and VCs, with Cursor losing ground on complex engineering tasks. - Ben Bergman
Tech With Tim compared Claude Code vs Codex head-to-head building the same app. Claude was faster; Codex produced slightly cleaner code and was 3-4x cheaper. His advice: use both.
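To put the DeepSeek V4 Flash price quoted above in perspective, here is a quick back-of-the-envelope calculation. The $0.14 per million input tokens comes from the item; the workload of 50 million input tokens per day is purely an assumed figure for illustration, and output-token pricing isn't given here, so it is left out.

```python
from decimal import Decimal

# USD per million input tokens, the V4 Flash figure quoted in the item above
PRICE_PER_MILLION_INPUT = Decimal("0.14")


def input_cost_usd(input_tokens):
    """Input-token cost only; output-token pricing is not given, so it is excluded."""
    return PRICE_PER_MILLION_INPUT * Decimal(input_tokens) / Decimal(1_000_000)


# Assumed workload: an agent pipeline reading 50 million input tokens per day
daily = input_cost_usd(50_000_000)
print(f"${daily:.2f} per day, ${daily * 30:.2f} per 30 days")
# prints "$7.00 per day, $210.00 per 30 days"
```

Swap in your own token volume to see how the number moves; real bills would also include output tokens.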

The throughline this week is clear: AI code generation is now table stakes. The real competition has moved to verification, security, and the orchestration layers that make agents reliable. Whether you're reviewing PRs, hunting vulnerabilities, or debugging production incidents, the tooling is evolving fast. Stay sharp out there.

Made with ❤️ by Data Drift Press. Hit reply with your questions, comments, or feedback - we read every one!
