Anthropic Launches Claude 4 Opus & Sonnet: A New Era for Programming and AI Agents?

Silicon Gamer

23/05/2025

updated 26/05/2025


1. Anthropic Launches Claude 4 Opus & Sonnet: A New Era for Programming and AI Agents?

On May 22, at 9 AM, Anthropic announced Claude 4 Opus and Sonnet. I watched the launch video, and it was incredibly exciting, so here are my thoughts.

Anthropic is now fully committed to programming and AI Agents. If it can solidify its position in these two areas, its market standing should be quite stable.

On paper, Claude 4’s improvement over Claude 3.7 falls short of a generational leap and is even slightly below expectations. However, given Anthropic’s tradition of “never quite winning on benchmarks, but never losing on experience,” the real-world experience is what counts, and that remains to be seen.

“I’m not a hype person,” Dario Amodei said casually as his opening remark, before immediately dropping a bombshell: “Right now, Claude 4 Opus and Claude 4 Sonnet are live on all relevant product platforms!”

Honestly, Anthropic’s presentation might be the most straightforward I’ve seen this year. They dropped the major announcement within the first three minutes, and the website and API were immediately available.

Anthropic released two models in the Claude 4 series this time:

  • Claude 4 Opus: Positioned as the most powerful and intelligent model, designed for complex reasoning, top-tier programming, and AI Agent workflows.

  • Claude 4 Sonnet: Offers excellent performance, balancing high reasoning capabilities with efficiency, and is a significant upgrade from Claude 3.7 Sonnet.

2. Claude 4 Core Highlights at a Glance

World-Leading Programming Capabilities: Claude 4 Opus scores 72.5% on SWE-bench and 43.2% on Terminal-bench, results Anthropic cites in calling it the “world’s best programming model.” Claude 4 Sonnet also posts a SOTA (State-of-the-Art) score of 72.7% on SWE-bench, a fraction of a point above Opus.

Breakthroughs in AI Agent Capabilities:

  • Extended Thinking with Tool Use: The model can invoke tools such as web search while in “extended thinking” mode, alternating between reasoning and tool use to significantly improve response quality.

  • Parallel Tool Execution: The models can call multiple tools at the same time for higher efficiency.
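To make parallel tool execution concrete, here is a minimal client-side sketch, assuming the model has returned several tool-use requests in a single turn and the application runs them concurrently. The tools (`web_search`, `read_file`), their stubbed behaviour, and the `run_tool_calls` helper are hypothetical; the `id`/`name`/`input` and `tool_result`/`tool_use_id` field names mirror the shape of Anthropic’s Messages API tool blocks, but this is plain Python, not the official SDK.

```python
import asyncio
from typing import Any, Awaitable, Callable


# Hypothetical local tools; names and behaviour are illustrative stubs only.
async def web_search(query: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for network latency
    return f"results for {query!r}"


async def read_file(path: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for disk I/O
    return f"contents of {path}"


TOOLS: dict[str, Callable[..., Awaitable[str]]] = {
    "web_search": web_search,
    "read_file": read_file,
}


async def run_tool_calls(calls: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Run all requested tool calls concurrently and pair each result with its call id."""

    async def run_one(call: dict[str, Any]) -> dict[str, Any]:
        output = await TOOLS[call["name"]](**call["input"])
        return {"type": "tool_result", "tool_use_id": call["id"], "content": output}

    # asyncio.gather runs the coroutines concurrently and keeps results in input order.
    return list(await asyncio.gather(*(run_one(c) for c in calls)))


if __name__ == "__main__":
    requested = [
        {"id": "call_1", "name": "web_search", "input": {"query": "Claude 4 release"}},
        {"id": "call_2", "name": "read_file", "input": {"path": "notes.md"}},
    ]
    for result in asyncio.run(run_tool_calls(requested)):
        print(result)
```

Because `asyncio.gather` preserves the order of the calls, each result can be returned alongside its matching call id, and the two stubbed calls finish in roughly 0.1 s total instead of 0.2 s when run one after the other.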

More Precise Instruction Following: Significantly enhanced understanding and execution of complex instructions.

Vastly Improved Memory: When developers grant it local file access, the model can create and maintain “memory files,” extracting and saving key information to preserve continuity across sessions and gradually build up tacit knowledge.

Claude Code Now Generally Available: Claude Code, the well-received command-line tool previously offered as a research preview, is now GA. It supports background tasks via GitHub Actions and integrates natively with VS Code and JetBrains IDEs, displaying proposed edits directly in your files for seamless pair programming. Anthropic has also released the Claude Code SDK, so developers can build their own AI Agents on top of it.

New API Capabilities:

  • Code Execution Tool: Grants Claude the ability to run code.

  • MCP Connector: Lets Claude connect to existing systems and tools through remote MCP (Model Context Protocol) servers.

  • Files API: Simplifies document access and storage, supporting the creation of more powerful memory features.

  • Prompt Caching for up to 1 Hour: Prompts can now stay cached for up to an hour instead of the default 5 minutes, significantly reducing costs and latency for long conversations and Agent workflows.

More Responsible AI: Both models are 65% less likely than Sonnet 3.7 to “take shortcuts” or “exploit loopholes” to complete tasks. Opus 4 is also Anthropic’s first model deployed with ASL-3 (AI Safety Level 3) safeguards, which target potential risks related to CBRN (Chemical, Biological, Radiological, and Nuclear) weapons.

Hybrid Models, Two Modes: Both models offer near-instant responses as well as an “extended thinking” mode for deeper reasoning.

Pricing Remains Unchanged: Opus 4 costs $15/$75 per million input/output tokens and Sonnet 4 costs $3/$15 per million input/output tokens, the same as their predecessors Claude 3 Opus and Claude 3.7 Sonnet.
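Since the prices are quoted per million tokens, the cost of a single request is easy to estimate. The sketch below uses only the list prices quoted here; the dictionary keys are illustrative labels rather than official API model IDs, the token counts are made up for illustration, and prompt-caching or batch discounts are deliberately ignored.

```python
# List prices in USD per million tokens, as quoted at launch.
# The keys are illustrative labels, not official API model IDs.
PRICES_PER_MILLION = {
    "opus-4": {"input": 15.00, "output": 75.00},
    "sonnet-4": {"input": 3.00, "output": 15.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request, ignoring caching and batch discounts."""
    prices = PRICES_PER_MILLION[model]
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 20,000 input tokens and 2,000 output tokens.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${request_cost(model, 20_000, 2_000):.2f}")
```

For that hypothetical request, this prints `opus-4: $0.45` and `sonnet-4: $0.09`, a 5x gap that follows directly from the price ratio between the two models.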

3. Claude 4 Opus: The Strongest Programming Model

“We haven’t had an Opus model for a while,” Dario reminded the audience at the launch. “Opus is our most capable and intelligent model.”

And this time, Claude 4 Opus takes “intelligence” to new heights, especially in the realms of programming and complex problem-solving.

Anthropic’s official data shows Claude 4 leading on programming benchmarks.

Dario proudly stated, “Some of our most senior engineers have been amazed at how much more productive Opus 4 makes them. There was even one time I saw an internal summary document written by Claude, and I almost thought it was written by someone on the team. That was the first time I was ‘fooled’ by an AI.”

“Opus 4 is exceptionally good at understanding your codebase and planning new features. From code migration to refactoring, it’s extremely efficient and accurate, making it the right choice for your most complex Agentic workflows. If you’ve found other models hitting a wall with your use cases, I believe Opus 4 will surprise you.”

An impressive example is Claude 4 Opus’s ability to play Pokémon. According to WIRED and Anthropic researcher David Hershey, Claude 4 Opus can strategically play Pokémon Red for 24 hours straight, whereas the previous Claude 3.7 Sonnet could only last for 45 minutes.

Opus 4 demonstrated excellent long-term memory and planning skills in the game. For instance, after realizing it needed a specific ability to progress, it would spend two days leveling up that skill before continuing. When given local file access, Opus 4 would even create and maintain “memory files” (like a “navigation guide”) to record key information and assist its gameplay.

4. Claude 4 Sonnet: The Perfect Balance of Performance and Efficiency, the “All-rounder” for Daily Tasks

If Opus 4 is the “flagship,” then Claude 4 Sonnet is the “sweet spot” between performance and efficiency.

Dario stated, “Sonnet is the mid-range model we all know and love, striking a good balance between intelligence and efficiency.” Claude 4 Sonnet significantly improves on Sonnet 3.7’s industry-leading capabilities, especially in programming, where it scores up to 72.7% on SWE-bench.

Mike Krieger described Sonnet 4 as “your always-on programming partner,” ideal for everyday programming tasks, application development, pair programming, and high-throughput use cases.

“For many, this will be a strict improvement over Sonnet 3.7 at the same cost, but with higher intelligence. Many customers are switching directly from one to the other,” Dario added. “It particularly addresses some of the feedback we received about Sonnet 3.7’s ‘over-eagerness’ – where the model does more than you asked for, which is the opposite of the earlier ‘laziness’ problem.”

Numerous customers have also given Sonnet 4 high praise:

  • GitHub: “Claude Sonnet 4 performs exceptionally well in Agentic scenarios and will serve as the base model for the new programming Agent in GitHub Copilot.”

  • Manus: “Significant improvements in following complex instructions, clear reasoning, and aesthetic output.”

  • iGent: “Sonnet 4 excels at autonomous multi-feature app development, with greatly improved problem-solving and codebase navigation, reducing navigation errors from 20% to near zero.”

  • Sourcegraph: “The model demonstrates the potential for a significant leap forward in software development—it can stay focused longer, understand problems more deeply, and provide more elegant code quality.”

  • Augment Code: “Higher success rates, more precise code editing, and more nuanced work on complex tasks make it our top choice for a primary model.”

5. How Will AI Agents Change the World?

In a conversation with Mike Krieger, Dario Amodei expressed great optimism for the future of AI:

  • Within a year: Incredible changes will come to programming, with AI Agents capable of managing “fleets of Agents.” The cost of producing software will drop drastically, making it extremely cheap and fast to create custom software for specific events or individuals.

  • Within five years: Major breakthroughs are expected in the biopharmaceutical field, potentially conquering many existing diseases.

  • Advice for developers: “Be ambitious. Build something you think is beyond current possibilities. Even if it doesn’t work now, the next model version might make it a reality very soon.” He quipped that model iteration cycles might shorten from the current 3 months to 2 months, or even 1 month.

Mike Krieger also shared his vision for AI Agents: they should have contextual intelligence (understanding the unique context of you and your organization, and learning continuously), long-duration execution (handling complex multi-step tasks without constant supervision), and true collaboration (engaging in meaningful dialogue, adapting to your work style, and reasoning transparently).

“The future isn’t AI replacing human jobs, but AI helping humans accomplish work beyond imagination,” Krieger concluded.

From world-leading programming capabilities to an increasingly mature AI Agent framework, the release of the Claude 4 series is undoubtedly a solid step forward for Anthropic on its path towards more powerful, practical, and responsible AI.

Are you ready for the new productivity that Claude 4 is set to bring?
