#34 - 🔬 What 50 AI researchers learned
Python is harder for models than Java?

Hey readers! 👋 This week we're diving into a massive 300-page research paper from 50 AI researchers that's packed with surprising insights about how coding models actually learn, plus some notable model releases and practical guidance on getting the most out of your AI tools.
📰 This Week's Highlights
A 300-page study from researchers at ByteDance, Alibaba, Tencent, and other labs reveals some counterintuitive findings about coding models. Small LLMs trained with RLVR (reinforcement learning with verifiable rewards) on verified problems can match OpenAI's o3 on reasoning tasks. Python is actually harder for models to learn than statically typed languages like Java or C#, and mixing Python into training for other languages can hurt performance. The paper also confirms that models trained on public repos inherit years of insecure coding patterns, and safety fine-tuning often fails to fix this. Perhaps most interesting: when training on chain-of-thought reasoning, the structure of the reasoning matters more than whether every intermediate fact is correct. – Hesamation
DeepSeek has released two 685-billion-parameter open-weight models under an MIT license. DeepSeek-V3.2 is their flagship, trained with extensive reasoning and human-alignment data, while the experimental Speciale variant focuses purely on reasoning and achieved a gold-medal-level result on the 2025 International Mathematical Olympiad. Simon Willison tested it with an SVG generation task that took 10 minutes of deliberation, producing a surprisingly competent pelican riding a bicycle. – Simon Willison

Cursor 2.0 introduces the fast-acting Composer model with multi-agent support and codebase-wide semantic search. The IDE now supports up to eight agents running in parallel with combined diff views, and pricing ranges from free to $200/month for the Ultra tier. – The New Stack
Anthropic's approach to keeping AI agents focused on long projects uses a two-agent system where an initializer sets up a feature list and progress log, then a coding agent reads the log, picks one unfinished feature, implements it, commits, and updates the log. The key insight: better scaffolding may matter more than larger models for long-running agents. – purealgo
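
The post stays at the level of description, but the loop is easy to sketch. Here is a minimal, hypothetical Python version — the file names, the run_agent stub, and the log format are assumptions, not Anthropic's actual scaffolding:

```python
import json
import subprocess
from pathlib import Path

FEATURES = Path("features.json")  # written once by the initializer agent
PROGRESS = Path("progress.md")    # appended to by the coding agent each pass


def run_agent(prompt: str) -> str:
    """Placeholder for a call to whatever coding agent or LLM you use."""
    raise NotImplementedError("wire up your agent call here")


def next_unfinished(features: list[dict]) -> dict | None:
    return next((f for f in features if not f["done"]), None)


def coding_pass() -> bool:
    features = json.loads(FEATURES.read_text())
    feature = next_unfinished(features)
    if feature is None:
        return False  # nothing left to build

    # The agent only sees the progress log and the single feature to implement.
    prompt = (
        f"Progress so far:\n{PROGRESS.read_text()}\n\n"
        f"Implement exactly one feature: {feature['name']}\n{feature['spec']}"
    )
    run_agent(prompt)

    # Commit and record progress so the next pass starts from a clean state.
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", f"feat: {feature['name']}"], check=True)
    feature["done"] = True
    FEATURES.write_text(json.dumps(features, indent=2))
    with PROGRESS.open("a") as log:
        log.write(f"- completed {feature['name']}\n")
    return True


if __name__ == "__main__":
    while coding_pass():
        pass
```

Each pass ends with a commit and a log entry, so a fresh agent context can pick up exactly where the last one stopped — which is the scaffolding point the article is making.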
🔬 Research & Analysis
Researchers at Saarland University found that humans and LLMs get confused at the same spots when reading tricky code. By comparing EEG brain activity with LLM perplexity scores, they discovered strong correlations and built an algorithm that automatically flags confusing code patterns, identifying over 150 previously unrecognized ones. – TechXplore
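
As a rough illustration of the core idea, here is a toy sketch that correlates an invented per-line difficulty signal with per-line perplexity and flags outliers. The numbers and the flagging rule are made up for illustration; this is not the paper's algorithm:

```python
import numpy as np

# Hypothetical per-line signals for one code snippet: a human difficulty score
# (imagine it derived from EEG activity while reading each line) and the LLM's
# perplexity over the same lines. All values are invented.
human_difficulty = np.array([0.2, 0.3, 0.9, 0.4, 0.8])
llm_perplexity = np.array([1.4, 1.6, 6.2, 2.0, 5.1])

# The study's headline finding reduces to a correlation between the two signals.
r = np.corrcoef(human_difficulty, llm_perplexity)[0, 1]
print(f"Pearson r = {r:.2f}")

# A flagging rule in the spirit of the paper: mark lines whose perplexity sits
# well above the snippet's median as potentially confusing.
flagged = np.where(llm_perplexity > 2 * np.median(llm_perplexity))[0]
print("Potentially confusing lines:", flagged.tolist())
```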
A Techreviewer study of senior developers across 19 countries found 85% report higher productivity with AI, but only 18% fully trust the accuracy of its output. 62% manually verify AI-generated code, and complex tasks like architecture planning remain largely human-led. – BetaNews
Qodo highlights an AI coding quality crisis: teams are generating 3x more code but spending 90% more time reviewing it and 40% more time fixing bugs, and 88% of developers don't trust the context their LLM has. The proposed solution is parallel quality agents that run tests and validation alongside generation, not after. – TheTuringPost
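
Qodo's piece stays at the process level; as one way to picture "validation alongside generation", here is a hypothetical asyncio sketch where each generated patch is checked while the next one is still being produced. The generate_patches and validate stubs are stand-ins, not Qodo's product:

```python
import asyncio
import sys


async def generate_patches(task: str):
    """Stand-in for a streaming code-generation agent."""
    for i in range(3):
        await asyncio.sleep(0.1)  # pretend the model is still generating
        yield f"# patch {i} for: {task}\n"


async def validate(patch: str) -> bool:
    """Stand-in quality agent: toy check that syntax-compiles one patch."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", f"compile({patch!r}, '<patch>', 'exec')",
        # replace with pytest, linters, security scanners, etc.
    )
    return await proc.wait() == 0


async def main() -> None:
    # Start validating each patch as soon as it exists, instead of waiting
    # for generation to finish and reviewing everything at the end.
    checks = []
    async for patch in generate_patches("add retry logic"):
        checks.append(asyncio.create_task(validate(patch)))
    results = await asyncio.gather(*checks)
    print(f"{sum(results)}/{len(results)} patches passed validation")


asyncio.run(main())
```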
🛠️ Tools & Releases
All Hands explores the distinction between inner and outer loop agents. Inner loop agents run locally in IDEs for a 40-50% productivity gain, while outer loop agents operate in the cloud for scalable automation like CVE remediation. The most productive teams combine both approaches. – All Hands
Augment for VS Code now includes Prompt Completions, providing context-aware suggestions as you type prompts, including natural-language completions, variable names, and file references. IntelliJ support is coming soon. – @augmentcode
Claude Code is now a fully native IDE experience in Eclipse Theia, with deep AI agent integration patterns that can be reused for any external AI agent. – @theia_ide
📚 Guides & Best Practices
Philipp Schmid's Gemini 3 prompting guide emphasizes clarity and structure. Key recommendations: keep instructions concise, use a consistent prompt format (XML or Markdown), place behavioral constraints in the system instructions, and require explicit planning and self-review before the final answer. For multi-step workflows, prompts should enforce persistence, risk assessment, and proactive planning. – Philipp Schmid
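
To make the structural advice concrete, here is a hypothetical prompt skeleton that follows those recommendations; the tag names and wording are illustrative, not taken from Schmid's guide:

```python
# A skeleton prompt in the spirit of the guide: constraints in the system
# instruction, a consistent XML-tagged format, and an explicit request for
# planning and self-review. Tag names here are illustrative only.
SYSTEM_INSTRUCTIONS = """\
<role>You are a senior Python reviewer. Be concise and direct.</role>
<constraints>
- Never modify files outside the target module.
- Flag any security-sensitive change for human review.
</constraints>
"""

USER_PROMPT = """\
<task>Refactor the retry logic in http_client.py to use exponential backoff.</task>
<plan>Before writing code, list the steps you will take and the main risks.</plan>
<self_review>After the code, check it against the constraints and note any violations.</self_review>
"""

# Send SYSTEM_INSTRUCTIONS as the system message and USER_PROMPT as the user
# message with whichever model API you use; printed here just for inspection.
print(SYSTEM_INSTRUCTIONS)
print(USER_PROMPT)
```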
JetBrains and Nebius launched a 10-part AI-Assisted Programming course covering practical use of AI for refactoring, debugging, DevOps, and agent workflows. The curriculum includes 25 hands-on tasks and a capstone project. – Developer Tech
DevOps.com covers how DevSecOps practices can secure AI-generated code, recommending automated security testing, policy-as-code, software bills of materials, and zero-trust approaches to catch vulnerabilities before production. – DevOps.com
📝 Quick Bits
Doing code review on 10,000 lines Claude wrote - a glimpse into large-scale AI code assessment – nearcyan
Claude Opus 4.5's SWE-bench score continues to impress, with Anthropic staying focused on coding over image/video models – @Yuchenj_UW
CodeRabbit outperformed Claude Code and Codex in one developer's code review comparison – ferran9908
Realistic AI productivity gains hover around 20% under optimal conditions, with developer adaptation being the true differentiator – Joshua Berkowitz
Made with ❤️ by Data Drift Press
Have questions, comments, or feedback? Hit reply - we'd love to hear from you!