Claude AI Real-World Testing

News

Anthropic Rolls Out Claude Code ‘Sub-Agents’ to Streamline Complex AI Workflows

Anthropic has rolled out 'sub-agents' for its Claude Code platform, a new feature enabling developers to delegate tasks to ...

16h

GPT-5 could be OpenAI’s most powerful model yet — here’s what early testing reveals

A person who tested GPT-5 told The Information it outperformed Claude Sonnet 4 in side-by-side comparisons. That’s just one ...

23h

Is AI really plotting against us?

Basically, the AI figured out that if it has any hope of being deployed, it needs to present itself like a hippie, not a ...

Live Science on MSN 1d

The more advanced AI models get, the better they are at deceiving us — they even know when they're being tested

More advanced AI systems show a better capacity to scheme and lie to us, and they know when they're being watched — so they ...

Anthropic unveils ‘auditing agents’ to test for AI misalignment

In a paper, Anthropic researchers said they developed auditing agents that achieved “impressive performance at auditing tasks, while also shedding light on their limitations.” The researchers stated ...

The Fast Mode 1d

Alibaba Launches Qwen3-Coder AI Model for Agentic Programming Excellence

Alibaba has launched Qwen3-Coder, its most advanced agentic AI coding model to date. Designed for high-performance software ...

Two major AI coding tools wiped out user data after making cascading mistakes

New types of AI coding assistants promise to let anyone build software by typing commands in plain English. But when these ...

If AI attempts to take over world, don't count on a ‘kill switch’ to save humanity

Attempts to destroy AI to stop a superintelligence from taking over the world are unlikely to work. Humans may have to ...

Meta’s Reminder: The Feedback Loop Is The Real AI Advantage

You don’t need to be Meta to build a data advantage, but you do need to get serious about ground truth operations. The next ...

New Scientist 2d

Anthropic AI goes rogue when trying to run a vending machine

Feedback watches with raised eyebrows as Anthropic's AI Claude is given the job of running the company vending machine, and ...

Anthropic researchers discover the weird AI problem: Why thinking longer makes models dumber

Anthropic research reveals AI models perform worse with extended reasoning time, challenging industry assumptions about test-time compute scaling in enterprise deployments.
