#37 - 🐛 AI code has 1.7x more bugs
Plus: the code review bottleneck is real

Hey readers! 🎉 As we wrap up the year, the AI coding world is buzzing with debates about code review, quality metrics, and whether all those extra lines of code are actually helping us ship better software. Spoiler: the answer is complicated, and this week's stories dive deep into the messy reality of AI-assisted development.
📊 This Week's Highlights
Greptile dropped its State of AI Coding Report 2025, revealing that lines of code per developer jumped from 4,450 to 7,839 with AI tools. But here's the catch: the report's own authors admit LOC is a terrible quality metric. – Greptile
"We expressly did not conclude that more lines = better. You could easily argue more lines = worse."
The community response was predictably skeptical, with developers calling for metrics like code churn, review iterations, and defect rates instead.
Meanwhile, a CodeRabbit analysis of 470 GitHub pull requests found AI-generated code contains 1.7 times more bugs, logical errors, and security vulnerabilities than human-written code. Algorithmic errors appeared 2.25 times more frequently, and exception-handling gaps doubled. – WebProNews
"AI's tendency to 'hallucinate' non-existent libraries or mismatched versions exacerbates these problems."
🔍 The Code Review Bottleneck
As AI generates more code faster, reviewing it has become the new chokepoint. Several tools launched features this week to address this.
Amp released an agentic code review feature in its VS Code extension that pre-scans diffs, recommends file review order, and provides summaries for large changesets. Powered by Gemini 3 Pro, it delivers deeper analysis than single-shot LLM approaches. – Amp
"There's a big improvement in review quality over the first version of the review panel, which used a single-shot LLM request."
Qodo AI shared their thinking on why smarter models alone won't solve code review. The argument: code generation flows left-to-right (plan, code, test), but review must work in multiple directions, checking intent, risk, regressions, and missing tests. – @itamar_mar
"We need review-native UX built for proof, not vibes."
Qodo also benchmarked 400 real PRs across 100+ repos to select their default reviewers: GPT-5.2 for deep analysis, Gemini 2.5 Pro for stable reviews with low harmful fix rates, and Claude Haiku 4.5 for high-volume, cost-efficient work. – Qodo AI
🚀 Claude Code's Wild Growth
Claude Code creator @bcherny shared staggering numbers: in the last 30 days, he landed 259 PRs with 497 commits, adding 40k lines and removing 38k, all written by Claude Code paired with Opus 4.5. The tool now runs continuously for hours or days using stop hooks. – @bcherny
"Increasingly, code is no longer the bottleneck."
Anthropic also integrated Claude Code with Slack, letting developers trigger coding sessions directly from chat threads. Tag Claude in a conversation with a bug report, and it spins up a session, posts progress updates, and returns a PR link when done. – AI Native Dev
⚠️ The Quality Question
Simon Willison wrote a sharp piece arguing that developers must still deliver code proven to work, regardless of how it was generated. Both manual testing and automated tests remain essential. – Simon Willison
"Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review. That's no longer valuable."
MIT Sloan Management Review released a video warning that generative AI can create dangerous technical debt that may cripple systems months or years later. Teams need clear guidelines on when AI tools help versus when they introduce risk. – MIT Sloan
Cline introduced an "Explain Changes" feature to combat "AI slop": code that works but degrades clarity and maintainability. The tool generates context-aware inline explanations for every diff change. – Cline
"If the AI can't explain why it did something in plain English, it probably shouldn't have done it."
🛠️ Tool Updates
JetBrains rolled out Next Edit Suggestions across all IDEs for AI subscribers. NES generates diff-style suggestions in the background with sub-200ms latency and doesn't consume your AI quota. – JetBrains
Microsoft previewed C++ code editing tools for GitHub Copilot in Visual Studio 2026 Insiders. The tools give Copilot deep project-wide context including symbol references, inheritance hierarchies, and call chains. – InfoWorld
GitHub Copilot's Agent Mode and multi-model support continue transforming DevOps workflows. The Model Context Protocol acts as a "USB port for intelligence," letting agents tap into database schemas, telemetry, and other context. – DevOps.com
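To make that "USB port" idea concrete, here's a minimal sketch of an MCP server exposing a single context tool. It assumes the official MCP Python SDK's FastMCP helper; the server name, the describe_table tool, and the hard-coded schema catalog are all illustrative, not part of any product mentioned above.

```python
# Minimal MCP server sketch (assumes the official `mcp` Python SDK's FastMCP helper).
# It exposes a fake "schema lookup" tool so an agent could pull database context
# over MCP instead of having schema dumps pasted into its prompt.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("schema-context")  # server name is arbitrary

# Stand-in catalog; a real server would query the database instead.
FAKE_SCHEMAS = {
    "orders": ["id INTEGER PRIMARY KEY", "customer_id INTEGER", "total_cents INTEGER"],
    "customers": ["id INTEGER PRIMARY KEY", "email TEXT", "created_at TIMESTAMP"],
}

@mcp.tool()
def describe_table(table: str) -> str:
    """Return the column definitions for a table, or a not-found message."""
    columns = FAKE_SCHEMAS.get(table)
    if columns is None:
        return f"Unknown table: {table}"
    return "\n".join(columns)

if __name__ == "__main__":
    # stdio is the default transport: the agent launches this process and
    # speaks JSON-RPC over stdin/stdout.
    mcp.run()
```

Point a client's MCP configuration at this script and the agent can call describe_table on demand rather than relying on stale, hand-pasted context.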
📈 Metrics and Observability
CodeRabbit launched an analytics dashboard focused on what actually matters: what ships and whether it breaks production, rather than vanity metrics like lines of code. – CodeRabbit
AI Native Dev reported that GitHub and Continue are now offering detailed usage metrics for AI coding agents, following the familiar pattern of cloud observability tools maturing alongside enterprise adoption. – AI Native Dev
📚 Quick Hits
A new arXiv paper provides a comprehensive guide to code intelligence, covering everything from data curation to autonomous coding agents
Axify compared 17 AI coding assistants, with GitHub Copilot, Cursor, and Tabnine leading the pack
A quick hack for instant AI code reviews: append ".diff" to any GitHub PR URL and paste the output into an LLM (see the sketch after this list)
Qodo outlined what makes a good code review benchmark, noting no widely accepted standard currently exists
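If you'd rather script the ".diff" trick than copy-paste it, here's a rough sketch. It assumes the requests and openai Python packages, an OPENAI_API_KEY in the environment, and an illustrative model name; swap in a real PR URL and whatever reviewer model you prefer.

```python
# Sketch of the ".diff" review hack from the Quick Hits above.
# Assumptions: `requests` and `openai` are installed, OPENAI_API_KEY is set,
# and the model name is illustrative only.
import requests
from openai import OpenAI

PR_URL = "https://github.com/<owner>/<repo>/pull/<number>"  # replace with a real PR URL

# GitHub serves the raw unified diff when ".diff" is appended to a PR URL.
diff = requests.get(PR_URL + ".diff", timeout=30).text

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {"role": "system",
         "content": "You are a strict code reviewer. Flag bugs, missing tests, and risky changes."},
        {"role": "user", "content": f"Review this pull request diff:\n\n{diff}"},
    ],
)
print(response.choices[0].message.content)
```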
Made with ❤️ by Data Drift Press
Have thoughts on AI code quality or review workflows? Hit reply; we'd love to hear from you!