GitHub Copilot CLI /fleet: Run Multiple Agents in Parallel
Copilot CLI's new /fleet command lets you dispatch multiple agents in parallel across files with dependency management — basically turn your CLI into a coordinated swarm.
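Copilot's internals aren't public, but the core idea — dispatching work in parallel while respecting a dependency graph — can be sketched in a few lines. Everything below (`run_fleet`, `tasks`, `deps`, `worker`) is an illustrative name, not Copilot CLI's actual API:

```python
# Illustrative sketch of dependency-aware parallel dispatch, NOT Copilot's
# implementation: tasks fan out to a worker pool, but a task only starts
# once everything it depends on has finished.
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

def run_fleet(tasks, deps, worker, max_agents=4):
    """tasks: {name: payload}; deps: {name: iterable of prerequisite names}."""
    order = TopologicalSorter()
    for name in tasks:
        order.add(name, *deps.get(name, ()))
    order.prepare()  # raises CycleError if deps contain a cycle

    results = {}
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        futures = {}
        while order.is_active():
            # Dispatch every task whose prerequisites are all done.
            for name in order.get_ready():
                futures[pool.submit(worker, name, tasks[name])] = name
            # Block until at least one in-flight task finishes.
            done, _ = wait(futures, return_when=FIRST_COMPLETED)
            for fut in done:
                name = futures.pop(fut)
                results[name] = fut.result()
                order.done(name)  # may unlock dependents for the next pass
    return results
```

`graphlib.TopologicalSorter` does the bookkeeping: `get_ready()` hands back only tasks whose predecessors are marked `done()`, so independent files run concurrently while dependent ones wait their turn.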
GitHub's /fleet command for parallel agents caught my attention — finally, coordination tooling that doesn't make you babysit individual agents. But the real story today is in the verification layer.
That AI code review piece hit on something I've been seeing: 96% of devs don't trust AI output, and 61% report build breaks. We're generating code faster than we can validate it. The bottleneck has shifted from 'can AI write this?' to 'is this AI code actually safe to ship?'
Meanwhile, Anthropic paused development and everyone's calling it unprecedented corporate responsibility. Maybe. Or maybe they hit a wall where the next capability jump requires infrastructure we don't have yet. That AI infrastructure roadmap piece suggests we're moving beyond pure scale: models need grounding in operational contexts, not just bigger weights.
The mirror test results are weird though. Opus recognizing its own output while GPT fails? That's either a fascinating glimpse of model self-awareness or a really good party trick. Haven't decided which.
Google's Agent Development Kit for Go hits 1.0 with OpenTelemetry tracing, plugin system for self-healing logic, and human-in-the-loop confirmations for sensitive operations.
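Setting the ADK's actual Go API aside, the human-in-the-loop pattern it ships is easy to sketch: wrap sensitive tool calls so the agent needs an explicit yes before they execute. `confirm_gate` and its parameters are invented for illustration, not ADK names:

```python
# Illustrative sketch of human-in-the-loop confirmation (not the ADK API):
# sensitive operations are intercepted and require explicit approval.
def confirm_gate(tool, is_sensitive, ask=input):
    """Wrap `tool` so calls flagged by `is_sensitive` need a human 'y'."""
    def wrapped(*args, **kwargs):
        if is_sensitive(*args, **kwargs):
            answer = ask(f"Agent wants to run {tool.__name__}{args}. Allow? [y/N] ")
            if answer.strip().lower() != "y":
                # Refuse without raising, so the agent can plan around it.
                return {"status": "denied", "tool": tool.__name__}
        return tool(*args, **kwargs)
    return wrapped
```

Injecting `ask` as a parameter keeps the gate testable and lets real deployments route the prompt to a UI instead of stdin.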
New plugin lets you pull Codex into Claude Code workflows for adversarial code reviews and handing off work between agents, using your existing local auth.
Added interactive /powerup command with animated demos to teach features, plus better offline environment handling and infinite loop fixes.
New modes use dual-model systems for research drafts (13.88% better than single models) and parallel report generation across Anthropic and OpenAI models.
With 96% of devs distrusting AI output and 61% reporting build breaks, teams need automated verification layers to handle the speed mismatch between AI generation and human review.
His monthly roundup covers agentic engineering patterns, streaming MoE models on Mac, vibe porting techniques, and supply chain attacks — the good stuff behind his paywall.
Analysis of Anthropic's decision to pause development as unprecedented corporate responsibility — like Apple stopping iPhone production over teen suicide studies or Pfizer pulling Lipitor proactively.
Anthropic revised their safety policy, dropping the commitment not to press ahead if models prove dangerous and citing competitive pressure; Holden Karnofsky advocated for the changes.
DeepMind research shows you can predict when reinforcement learning will break the ability to monitor AI reasoning through intermediate steps — critical for AI oversight.
The next phase of AI requires infrastructure for grounding in operational contexts and real-world experiences, not just bigger weights and more data.
Testing LLM self-awareness by having models identify their own outputs — Opus 4.6 shows notable self-recognition while GPT models fail, with implications for AI consciousness.
Companies like Cursor build proprietary models while others like Crosby AI focus on end-to-end services — vertical integration becoming key for AI application differentiation.
Three key pieces of feedback for new AI safety researchers: do quick sanity checks, say precisely what you want to say, and ask 'why' one more time.