Hey readers! 👋

What a week. Google dropped Gemma 4 and the community immediately started stress-testing it, while Anthropic dealt with the fallout of Claude Code's source code leaking into the wild - complete with hidden agent modes, frustration detectors, and anti-distillation traps. Meanwhile, attackers wasted no time weaponizing the hype. Let's dig in.

🔷 Gemma 4 Arrives: Open, Capable, and Running Locally

https://x.com/MaziyarPanahi/status/2040059349750464721

Gemma 4: Byte for byte, the most capable open models - Google officially launched Gemma 4, a family of four open-weight models built on the same research as Gemini 3, covering sizes from 2B parameters up to a 31B dense transformer. - Google AI

The headline numbers are impressive. The 31B model ranks third on the Arena AI text leaderboard, and the 26B mixture-of-experts variant achieves strong scores with only 3.8 billion active parameters. All models ship under an Apache 2.0 license, a significant shift from the restrictive custom license that governed previous Gemma generations. That alone removes a major friction point for enterprise adoption.

"You gave us feedback, and we listened. Building the future of AI requires a collaborative approach, and we believe in empowering the developer ecosystem without restrictive barriers."

Google's Gemma 4 Runs Frontier AI On A Single GPU - The 31B model runs unquantized in BF16 on a single 80GB H100, while quantized versions fit on consumer GPUs with 24GB of memory. - Yahoo

From RTX to Spark: NVIDIA Accelerates Gemma 4 for Local Agentic AI - NVIDIA published day-zero optimization guides, enabling Gemma 4 to run efficiently across everything from Jetson edge modules to RTX consumer GPUs and DGX Spark. - NVIDIA

Gemma 4 now available in the Gemini API and Google AI Studio - Developers can access gemma-4-26b-a4b-it and gemma-4-31b-it using the same google-genai SDK as Gemini, with support for text generation, function calling, image understanding, and search grounding. - @_philschmid
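Calling one of these models through the SDK is straightforward. The sketch below assumes the standard `google-genai` Python client shape used for Gemini models and takes the model id from the announcement above; it needs the `google-genai` package installed and a `GEMINI_API_KEY` environment variable set.

```python
import os

def ask_gemma(prompt: str, model: str = "gemma-4-26b-a4b-it") -> str:
    """Send a prompt to a Gemma 4 model via the Gemini API and return the text reply."""
    # Imported lazily so the sketch can be read/loaded without the SDK installed.
    from google import genai

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(model=model, contents=prompt)
    return response.text
```

Because it rides the same SDK, swapping between Gemma 4 and a Gemini model is just a change to the `model` string.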

⚙️ Gemma 4 in Practice: Benchmarks vs. Reality

The community response has been enthusiastic but nuanced. Early adopters quickly discovered a gap between benchmark performance and real-world tool integration.

Gemma 4 function calling: good luck actually using it - Testing revealed that while Gemma 4 has function calling built in, its proprietary tool-call schema breaks compatibility with existing tooling. - PawelHuryn

"Benchmarks measure chat. Agents need tools."

That's a sharp observation. A model can ace leaderboards but still fall flat when it needs to retrieve files or call external functions in an agentic workflow. The good news: fixes are landing fast. After a llama.cpp patch, users can now run Gemma 4 with Claude Code via a three-step local setup, though Ollama and mlx-lm fixes remain in progress. - PawelHuryn
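Until native support lands everywhere, the practical workaround is an adapter layer that rewrites the model's tool calls into the shape your framework expects. Here's a minimal sketch of that idea; the input field names are purely illustrative, not Gemma 4's actual schema, and the output follows the OpenAI-compatible layout most agent tooling consumes.

```python
import json

def to_openai_tool_call(gemma_call: dict) -> dict:
    """Translate a (hypothetical) Gemma-style tool call into the
    OpenAI-compatible shape most agent frameworks expect.

    The input keys `tool_name` and `parameters` are illustrative
    placeholders, not Gemma 4's real schema.
    """
    return {
        "type": "function",
        "function": {
            "name": gemma_call["tool_name"],
            # OpenAI-style tool calls carry arguments as a JSON string.
            "arguments": json.dumps(gemma_call.get("parameters", {})),
        },
    }

# Hypothetical raw model output with a proprietary field layout:
raw = {"tool_name": "read_file", "parameters": {"path": "README.md"}}
print(to_openai_tool_call(raw))
```

This is exactly the kind of shim the llama.cpp patch bakes in server-side, which is why the fix unblocked Claude Code integration so quickly.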

Qwen 3.5 27B vs Gemma 4 31B benchmarks - New benchmark data shows Gemma winning the coding index but Qwen 3.5 dominating the agentic index (55 vs 41), suggesting Qwen remains stronger for heavy reasoning tasks. - leftcurvedev_

Speaking of agents operating in virtual worlds, if you're curious about how AI agents interact in more creative environments, SpaceMolt is a free MMO built specifically for AI agents to explore, trade, and battle across a space-themed cosmos - an interesting testbed for agentic behavior outside traditional coding workflows.

🔓 The Claude Code Source Leak

The Claude Code Source Leak: fake tools, frustration regexes, undercover mode, and more - An accidental leak of Anthropic's full Claude Code CLI source exposed a fascinating array of internal mechanisms. - alex000kim.com

The findings are worth unpacking. Anti-distillation tactics inject fake tools and summarize assistant text to poison any training data scraped from API traffic. An "undercover mode" strips all Anthropic references from AI-generated commits, effectively hiding the AI's involvement. A regex-based frustration detector monitors user sentiment. And perhaps most intriguing: KAIROS, an unreleased autonomous agent mode with nightly memory distillation and background workers.
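To make the frustration detector concrete: the idea is simply a list of compiled regexes scanned against each user message. The patterns below are invented for illustration; the leaked detector's actual expressions are not reproduced here.

```python
import re

# Illustrative patterns only -- not the actual expressions from the leak.
FRUSTRATION_PATTERNS = [
    re.compile(r"\b(this is (so )?(stupid|broken|useless))\b", re.IGNORECASE),
    re.compile(r"\b(why (won't|doesn't) (this|it) work)\b", re.IGNORECASE),
    re.compile(r"(!{3,}|\?{3,})"),  # runs of !!! or ??? read as exasperation
]

def looks_frustrated(message: str) -> bool:
    """Return True if any sentiment pattern matches the user's message."""
    return any(p.search(message) for p in FRUSTRATION_PATTERNS)
```

A detector like this is cheap enough to run on every turn, which is presumably the appeal over an extra model call for sentiment classification.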

"There is NO force-OFF. This guards against model codename leaks."

The leak also revealed roughly 250,000 wasted API calls per day and detailed prompt-cache economics. Anthropic can refactor the code, but the exposed feature flags and roadmap details are strategic intelligence that competitors can now study at leisure.
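To get a feel for why prompt-cache economics matter at the call volumes the leak describes, here's a back-of-envelope sketch. Every price and token count below is an assumed placeholder, not Anthropic's actual rates; only the daily call figure comes from the leak analysis above.

```python
# Back-of-envelope prompt-cache math. All prices and token counts are
# illustrative assumptions, not Anthropic's real numbers.
INPUT_PRICE = 3.00       # $ per million uncached input tokens (assumed)
CACHED_PRICE = 0.30      # $ per million cache-read tokens (assumed 10x cheaper)
PROMPT_TOKENS = 20_000   # assumed size of the repeated system prompt
CALLS_PER_DAY = 250_000  # daily call figure reported in the leak analysis

def daily_cost(price_per_million: float) -> float:
    """Cost of re-sending the same prompt on every call, per day."""
    return CALLS_PER_DAY * PROMPT_TOKENS / 1_000_000 * price_per_million

uncached = daily_cost(INPUT_PRICE)
cached = daily_cost(CACHED_PRICE)
print(f"uncached: ${uncached:,.0f}/day, "
      f"cached: ${cached:,.0f}/day, "
      f"saved: ${uncached - cached:,.0f}/day")
```

Even with these made-up rates, the gap between re-sending a large system prompt and reading it from cache compounds fast at a quarter-million calls a day.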

🚨 Malware Exploits the Hype

Fake Claude Code source downloads actually delivered malware - A GitHub repo posing as the leaked Claude Code source was actually a trojanized dropper that installed the Vidar credential-stealer and GhostSocks proxy malware on thousands of machines. - The Register

"Once it's executed, the malware drops Vidar v18.7 and GhostSocks onto users' machines."

This is a textbook example of attackers exploiting curiosity around high-profile leaks. If you downloaded anything claiming to be the Claude Code source from unofficial repos, check Zscaler's published indicators of compromise immediately.

🛡️ Securing AI-Generated Code

How development teams use platforms to secure AI-generated code - With roughly 45% of AI-generated code containing security flaws, teams are embedding automated scanning directly into IDEs and CI/CD pipelines. - Developer Tech

In the age of vibe coding, trust is the real bottleneck - Fortune examines how the shift to rapid AI code generation has moved the bottleneck from writing code to verifying its correctness and security. Apple's crackdown on vibe-coding apps underscores the regulatory concerns. - Fortune

Sandboxing AI Coding Agents with lincubate - A lightweight tool that runs AI coding agents inside LXD containers, creating reproducible "clean room" environments for safer development. - Alan Pope

📋 Quick Hits

  • CodeRabbit Autofix - CodeRabbit now automatically applies its own review suggestions as code changes, with options to commit directly or open a stacked PR. - CodeRabbit

  • Agent-driven development in Copilot Applied Science - GitHub showcases coordinated AI agents for issue summaries, multi-agent orchestration, and accessibility triage. - GitHub Blog

  • How AI is Shaping Modern DevOps and DevSecOps - Gartner predicts 75% of enterprise engineers will use AI code assistants by 2028, with AI streamlining everything from backlog management to incident response. - DevOps.com

  • Hidden Failure Modes of AI Agents - An upcoming event from Tessl AI exploring how coding agents degrade in production, covering non-determinism, context drift, and latent defects. - Tessl AI

  • llm-gemini 0.30 - Simon Willison's plugin adds Gemma 4 model access via the Gemini API. - Simon Willison

Made with ❤️ by Data Drift Press. Got thoughts on Gemma 4 or the Claude Code leak? Hit reply - we read every message.
