GitHub Copilot CLI /fleet: Run Multiple Agents in Parallel
Copilot CLI's new /fleet command lets you dispatch multiple agents in parallel across files with dependency management — basically turn your CLI into a coordinated swarm.
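Copilot's internals aren't public, but the core idea — dispatching work in parallel while respecting a dependency graph — can be sketched in a few lines. Everything below (`run_fleet`, `tasks`, `deps`, `worker`) is an illustrative name, not Copilot CLI's actual API:

```python
# Illustrative sketch of dependency-aware parallel dispatch, NOT Copilot's
# implementation: tasks fan out to a worker pool, but a task only starts
# once everything it depends on has finished.
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

def run_fleet(tasks, deps, worker, max_agents=4):
    """tasks: {name: payload}; deps: {name: iterable of prerequisite names}."""
    order = TopologicalSorter()
    for name in tasks:
        order.add(name, *deps.get(name, ()))
    order.prepare()  # raises CycleError if deps contain a cycle

    results = {}
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        futures = {}
        while order.is_active():
            # Dispatch every task whose prerequisites are all done.
            for name in order.get_ready():
                futures[pool.submit(worker, name, tasks[name])] = name
            # Block until at least one in-flight task finishes.
            done, _ = wait(futures, return_when=FIRST_COMPLETED)
            for fut in done:
                name = futures.pop(fut)
                results[name] = fut.result()
                order.done(name)  # may unlock dependents for the next pass
    return results
```

`graphlib.TopologicalSorter` does the bookkeeping: `get_ready()` hands back only tasks whose predecessors are marked `done()`, so independent files run concurrently while dependent ones wait their turn.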
GitHub's /fleet command for parallel agents caught my attention — finally, coordination tooling that doesn't make you babysit individual agents. But the real story today is in the verification layer.
That AI code review piece hit on something I've been seeing: 96% of devs don't trust AI output, and 61% report build breaks. We're generating code faster than we can validate it. The bottleneck has shifted from 'can AI write this?' to 'is this AI code actually safe to ship?'
Meanwhile, Anthropic paused development and everyone's calling it unprecedented corporate responsibility. Maybe. Or maybe they hit a wall where the next capability jump requires infrastructure we don't have yet. That AI infrastructure roadmap piece suggests we're moving beyond pure scale: models need grounding in operational contexts, not just bigger weights.
The mirror test results are weird though. Opus recognizing its own output while GPT fails? That's either a fascinating glimpse of model self-awareness or a really good party trick. Haven't decided which.
Google's Agent Development Kit for Go hits 1.0 with OpenTelemetry tracing, plugin system for self-healing logic, and human-in-the-loop confirmations for sensitive operations.
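Setting the ADK's actual Go API aside, the human-in-the-loop pattern it ships is easy to sketch: wrap sensitive tool calls so the agent needs an explicit yes before they execute. `confirm_gate` and its parameters are invented for illustration, not ADK names:

```python
# Illustrative sketch of human-in-the-loop confirmation (not the ADK API):
# sensitive operations are intercepted and require explicit approval.
def confirm_gate(tool, is_sensitive, ask=input):
    """Wrap `tool` so calls flagged by `is_sensitive` need a human 'y'."""
    def wrapped(*args, **kwargs):
        if is_sensitive(*args, **kwargs):
            answer = ask(f"Agent wants to run {tool.__name__}{args}. Allow? [y/N] ")
            if answer.strip().lower() != "y":
                # Refuse without raising, so the agent can plan around it.
                return {"status": "denied", "tool": tool.__name__}
        return tool(*args, **kwargs)
    return wrapped
```

Injecting `ask` as a parameter keeps the gate testable and lets real deployments route the prompt to a UI instead of stdin.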
New plugin lets you pull Codex into Claude Code workflows for adversarial code reviews and handing off work between agents, using your existing local auth.
Added interactive /powerup command with animated demos to teach features, plus better offline environment handling and infinite loop fixes.
New modes use dual-model systems for research drafts (13.88% better than single models) and parallel report generation across Anthropic and OpenAI models.
With 96% of devs distrusting AI output and 61% reporting build breaks, teams need automated verification layers to handle the speed mismatch between AI generation and human review.
His monthly roundup covers agentic engineering patterns, streaming MoE models on Mac, vibe porting techniques, and supply chain attacks — the good stuff behind his paywall.
Analysis of Anthropic's decision to pause development as unprecedented corporate responsibility — like Apple stopping iPhone production over teen suicide studies or Pfizer pulling Lipitor proactively.
Anthropic revised their safety policy, dropping the commitment not to press ahead if models prove dangerous and citing competitive pressure; Holden Karnofsky advocated for the changes.
DeepMind research shows you can predict when reinforcement learning will break the ability to monitor AI reasoning through intermediate steps — critical for AI oversight.
The next phase of AI requires infrastructure for grounding in operational contexts and real-world experiences, not just bigger weights and more data.
Testing LLM self-awareness by having models identify their own outputs — Opus 4.6 shows notable self-recognition while GPT models fail, with implications for AI consciousness.
Companies like Cursor build proprietary models while others like Crosby AI focus on end-to-end services — vertical integration becoming key for AI application differentiation.
Three key pieces of feedback for new AI safety researchers: do quick sanity checks, say precisely what you want to say, and ask 'why' one more time.