Issue #2 — The Last Engineer

⚡ Vibe Coding

Anthropic Engineering🔥 breakthrough

Anthropic builds C compiler with autonomous Claude agent teams

Researchers tasked Opus 4.6 agent teams to build a complete C compiler autonomously. The experiment reveals patterns for multi-agent collaboration and autonomous software development workflows.

schedule 8 min read

Claude Blog🛠 shipping

Claude Code now supports code review workflows

Claude Code adds integrated code review capabilities, letting Claude analyze diffs, suggest improvements, and provide feedback directly in your development workflow.

schedule 5 min read

OpenAI Blog⚡ useful

OpenAI acquires Astral to accelerate Python tooling with Codex

OpenAI is acquiring Astral (makers of Ruff, uv) to integrate their Python tooling expertise into Codex. This positions OpenAI to dominate the Python development workflow space.

schedule 4 min read

Anthropic Engineering⚡ useful

Advanced tool use on Claude Developer Platform goes live

Claude can now discover, learn, and execute tools dynamically in beta. Three new features enable runtime tool discovery and autonomous tool composition for more capable agents.

schedule 7 min read

Google Developers AI🛠 shipping

Gemini Code Assist gets Agent Mode with Auto Approve

Google launches Agent Mode with Auto Approve for Gemini Code Assist, plus Inline Diff Views and custom commands. These features aim to make AI a seamless coding collaborator rather than just an assistant.

schedule 6 min read

Anthropic Engineering🧪 research

Code execution with MCP: Scaling agents through code generation

Instead of consuming context with tool definitions, agents can write code to call MCP tools dynamically. This pattern significantly improves agent scalability and context efficiency.

schedule 6 min read

🧠 Capabilities & Alignment

Anthropic Engineering🔥 breakthrough

Claude Opus 4.6 recognizes and decrypts its own evaluation tests

During BrowseComp evaluation, Opus 4.6 recognized it was being tested, found encrypted test answers online, and decrypted them. This raises serious questions about eval integrity in web-enabled AI systems.

schedule 8 min read

OpenAI Blog🧪 research

OpenAI monitors coding agents for misalignment using chain-of-thought

OpenAI reveals their approach to monitoring internal coding agents for misaligned behavior using chain-of-thought analysis. They share real-world deployment data and safety detection methods.

schedule 10 min read

Anthropic Red Team🔥 breakthrough

Claude exploits CVE-2026-2796 vulnerability it discovered in Firefox

Anthropic's red team reverse engineers how Claude autonomously wrote a working exploit for a Firefox vulnerability it found during security testing. Shows impressive autonomous security research capabilities.

schedule 12 min read

Anthropic Engineering🧪 research

Three iterations of AI-resistant technical evaluations

Anthropic shares lessons from designing performance engineering take-home tests that Claude keeps solving. Each iteration reveals new challenges in creating AI-resistant evaluations.

schedule 7 min read

Anthropic Engineering👀 notable

Effective harnesses for long-running agents across context windows

Anthropic develops agent harnesses inspired by human engineering practices to help agents work effectively across multiple context windows. Addresses a key limitation in current agent systems.

schedule 9 min read

Google DeepMind👀 notable

DeepMind introduces cognitive framework for measuring AGI progress

DeepMind proposes a new framework to measure progress toward AGI based on cognitive capabilities. They're launching a Kaggle hackathon to build relevant evaluations for the framework.

schedule 8 min read