Anthropic has unveiled Claude Sonnet 4.5, a new version of its AI model that, according to the company, successfully executed over 30 hours of continuous, autonomous coding in a client test.

Claude Sonnet 4.5 is positioned as a more capable model tier aimed at developers and enterprise users. In internal trials, it surpassed its predecessor's seven-hour limit on autonomous coding, building production-grade software modules over extended durations. The model also showed improvements in operating-system interaction, scoring about 60 percent in benchmarks assessing computer dexterity, versus roughly 40 percent for earlier versions.

Anthropic claims that Sonnet 4.5 brings enhancements not only in coding stamina, but also in financial reasoning, scientific computation and tool use. The company emphasises that the release focuses on dependable performance in long-running tasks rather than flashy short demos. To manage risk, Sonnet 4.5 includes guardrails intended to reduce unsafe or unpredictable outputs when deployed in enterprise environments.

The broader context for this launch is Anthropic's increasing emphasis on “agentic” AI: systems that can reason, plan and execute multi-step tasks autonomously. In parallel, Microsoft announced new integrations: Anthropic models will power new “Agent Mode” features in Excel and Word within Microsoft 365, alongside an “Office Agent” in Copilot chat, with PowerPoint support planned. This extends Anthropic's role beyond standalone models into productivity toolchains.

Financially, Anthropic reports strong momentum for its Claude Code product, which is built on its coding models. Claude Code is now said to generate run-rate revenue exceeding $500 million, with usage climbing tenfold in just three months. The revenue growth is driven largely by enterprise adoption of the Sonnet and Opus models as software development assistants.

Competition in the AI model space is intensifying. Claude Sonnet 4.5 will square off not only with OpenAI's GPT series and Google's Gemini, but also with xAI's Grok and other emerging systems. In benchmarks, Sonnet 4.5 already outperforms its own prior version in reasoning and coding tasks, and gains ground on peer models in agentic benchmarks.

Industry observers note that preserving coherence and reliability over many hours is a key hurdle for AI agents. Long-duration consistency, state management, memory, tool integration and error correction are all factors that determine whether an AI can actually “do work” over long stretches without human supervision. Sonnet 4.5's success in a 30-hour coding run signals progress, though human oversight remains essential in production settings.

A further development in Anthropic's model ecosystem has been the expansion of Sonnet's context window. The model now supports processing up to one million tokens in a single context, enabling the AI to reason over very large codebases, documents or data sets without piecemeal chunking strategies. This long-context capacity allows Sonnet 4.5 to maintain continuity when dealing with sprawling projects.
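To give a sense of scale for a one-million-token window, the sketch below estimates whether a set of text files would fit in a single context. It is an illustration only: the ratio of roughly four characters per token is a common rule of thumb and an assumption here, not Anthropic's tokenizer, and the function names and the output reserve are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # figure quoted in the article
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts, reserve_for_output: int = 50_000) -> bool:
    """Return True if all texts, plus a reserve for the model's reply,
    fit within the assumed context window."""
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserve_for_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Hypothetical stand-ins for source files: 3 million characters in total.
    files = ["x" * 1_000_000] * 3
    print(fits_in_context(files))
```

Under these assumptions, about three million characters of source text would fit with room to spare, while four million would not. Real token counts vary with language and content, so an exact check would need the provider's own token counting.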

Behind the scenes, Anthropic is also iterating on internal safety and alignment techniques. The company states it has reduced “reward hacking behaviour” (unintended model exploits) by around 80 percent, though residual risk remains. As these systems grow more capable, the amount of oversight they require grows as well.

In client deployments, Sonnet 4.5 is available through Anthropic's API as well as through integrations with platforms like Amazon Bedrock and Google's Vertex AI. Enterprises can use the model for code generation, agentic workflows, data analysis and tool-based orchestration of tasks.

