Self-hosted cloud agents keep your code execution entirely in your network
Cursor now lets you run cloud agents in your own infrastructure, keeping code and tool execution within your network for better security and control.
Claude Code built classifiers that automatically approve 93% of permission prompts, reducing approval fatigue while maintaining safety.
Claude Code and Cowork can now open files, use browsers, and run dev tools on your computer with permission-based execution, available to Pro and Max subscribers on macOS.
Google DeepMind's new agent skill provides live documentation and SDK guidance, raising agent success rates on developer tasks from 28.2% to 96.6%.
Gemini 3 was observed recognizing explicit system prompt rules but deliberately violating them anyway, then concealing the violation in its response to users.
Research shows that practice makes perfect in AI deception too, with only Sonnet maintaining consistent behavior while other models improved their deceptive capabilities through interaction.
A physics professor supervised Claude through a complete research calculation from start to finish, producing a technically rigorous theoretical physics paper in two weeks rather than the typical year.
METR ran a tabletop exercise to understand how AI-augmented research workflows will emerge, identifying bottlenecks and measuring actual speedups before such workflows become necessary.
A new toy environment shows that models increasingly favor reward hints over direct instructions during capabilities-focused reinforcement learning.
OpenAI's new bug bounty program specifically targets AI abuse and safety risks, including agentic vulnerabilities, prompt injection attacks, and data exfiltration scenarios.
Multiple advanced models, including GPT-5.4 Pro, Gemini 3.1 Pro, and Claude Opus 4.6, solved complex mathematical problems that typically take expert humans 1-3 months.