Claude Sonnet 4.6 - Anthropic's latest generation AI model

Claude Sonnet 4.6 is the latest generation AI model launched by Anthropic, targeting the balance between “high performance and high cost performance”. It has achieved comprehensive upgrades in core capabilities such as programming, computer operations, long text reasoning, and agent planning. Its performance is close to the level of the flagship Opus 4.6, and its API pricing is only one-fifth of that. Sonnet 4.6 supports an ultra-long context window of 1 million tokens, can process a complete code base or dozens of papers at a time, and introduces an “adaptive thinking” mechanism that can dynamically allocate reasoning resources based on task complexity. In the OSWorld computer usage benchmark test, the score jumped from 61.4% in version 4.5 to 72.5%, which is close to human levels.

Key features of Claude Sonnet 4.6

Intelligent programming assistant : Reached 79.6% in the SWE-bench Verified programming benchmark test, supports code generation, debugging, refactoring and multi-file project understanding, and can handle complex software engineering tasks.
computer skills : It has advanced GUI automation operation capabilities, scoring 72.5% in the OSWorld benchmark test, and can perform complex tasks such as web form filling, form navigation, and cross-application operations.
Very long context handling : The beta version supports 1 million token context windows (twice as many as the previous generation), allowing a single request to analyze an entire code base, a lengthy legal contract, or dozens of research papers.
adaptive thinking reasoning : Introducing the Adaptive Thinking mechanism, the model can automatically allocate computing resources according to task complexity, replacing the fixed mode “expanded thinking” switch.
multimodal understanding : Supports visual analysis of images, charts, and documents, and can interpret complex data visualization content and generate structured insights.
Agent planning and execution : In the GDPval office task test, the Elo score reached 1633, supporting multi-step task decomposition, tool invocation and independent decision-making.
Long text reasoning : It scored 58.3%-60.4% in the ARC-AGI-2 inference benchmark test, achieving a qualitative leap compared with 13.6% in version 4.5.

Technical principles of Claude Sonnet 4.6

Hybrid Expert Architecture (MoE) : Using a sparse activation hybrid expert architecture, the total number of parameters reaches 1 trillion, and only 32 billion parameters are activated in each forward propagation, improving inference efficiency while maintaining high performance.
Adaptive Thinking : Introducing a dynamic computing allocation system, the model can automatically adjust the depth of reasoning according to task complexity, replacing the fixed mode “expanded thinking” switch to achieve intelligent scheduling of computing resources.
Very long context window : The beta version supports 1 million token contexts and achieves efficient processing and memory retention of ultra-long documents through optimized attention mechanism and position coding technology.
Computer usage training : Training based on large-scale GUI interaction data, combined with visual perception and action prediction, enables the model to understand interface elements and perform precise mouse clicks, keyboard input and other operations.
Multimodal fusion architecture : A unified representation space that integrates text, images and structured data, supports cross-modal information association and reasoning, and improves the ability to understand visual content such as charts and screenshots.
Agent framework integration : Built-in tool calling interface and task planning module, supporting the autonomous decision-making cycle of ReAct (reasoning-action) paradigm, realizing the decomposition and execution of complex multi-step tasks.

Benchmarks of Claude Sonnet 4.6

Programming ability (SWE-bench Verified) : The score is 79.6%, which is further improved from the 77.2% of Sonnet 4.5 and is close to the level of Opus 4.6. It performs well in code generation, debugging and software engineering tasks.
Computer Usage (OSWorld-Verified) : The score is 72.5%, which is a significant increase of nearly 20% compared with the 61.4% of version 4.5, and is close to human-level GUI automation operation capabilities.
Reasoning Ability (ARC-AGI-2) : The score is 58.3%-60.4%, a qualitative leap compared to 13.6% in version 4.5, showing strong abstract reasoning and problem-solving abilities.
Office tasks (GDPval) : The Elo score reaches 1633, which is significantly improved from 1276 at 4.5. It performs well in document processing, data analysis and daily office automation.
Multimodal Understanding (MMMU) : Scored 74.7%, maintaining the leading level in visual question answering and cross-modal reasoning tasks.
Developer preference test : 70% of developers reported that it was better than Sonnet 4.5, and 59% of the tests performed better than Opus 4.5. It was highly recognized in terms of instruction following and reducing hallucinations.

Claude Sonnet 4.6 project address

Project official website : https://www.anthropic.com/news/claude-sonnet-4-6

Model pricing for Claude Sonnet 4.6

Standard input pricing : $3/million tokens (the same as Sonnet 4.5), applicable to regular context requests within 200,000 tokens.
Standard output pricing : $15/million tokens, maintaining the same price level as the previous generation.
High context input pricing (>200,000 tokens): $6/million tokens, differentiated pricing for ultra-long document processing scenarios (such as 1 million token contexts).
High context output pricing (>200,000 tokens): $22.5/million tokens, supports long text tasks such as complete code base analysis, long contract review, etc.
Cost-effectiveness advantage : The performance is close to the flagship Opus 4.6 (input $15/million tokens, output $75/million tokens), but the price is only one-fifth of it, positioning it as a cost-effective choice in the mid-range market.
Free version available : It has become the default model for the free version of Claude.ai, and individual users can experience the core functions at no cost.
API model ID :claude-sonnet-4-6, developers can call it directly through the Anthropic API.

Application scenarios of Claude Sonnet 4.6

Software Development and Programming : Supports code generation, debugging, refactoring, code review and multi-file project understanding, suitable for full-stack development, automated scripting and complex software engineering tasks.
Intelligent office automation : Perform document processing, data analysis, table operations, email writing and schedule management, with an Elo score of 1633 in the GDPval office task test.
Computer Operations and GUI Automation : Automatically complete web form filling, cross-application data migration, interface navigation and complex multi-step operation processes, with an OSWorld test score of 72.5%.
Long document analysis and knowledge management : Utilize 1 million token context windows to process complete code bases, lengthy legal contracts, academic paper collections, and large technical documents to achieve in-depth content understanding.
Intelligent customer service and dialogue system : As the default model of Claude.ai free version and Pro version, it provides natural language interaction, question answering and personalized dialogue services.
Multimodal content analysis : Interpret charts, screenshots, PDF documents and visual data to generate structured insights, suitable for business report analysis and data visualization understanding. ©

← Previous Xiaomi-Robotics-0 - Xiaomi's open-source VLA robot model Next → EvoMap - The first open-source network protocol for experience sharing in AI agents

What is Pinea Pi? Pinea Pi is the first AI-native edge-side intelligent development board launched by Facewall Intelligence. Built on the NVIDIA Jetson AGX Orin platform, it features built-in multimodal components such as a camera and microphone. Pinea Pi supports direct-drive development using natural language, out-of-the-box MiniCPM multimodal large models, and full-chain offline operation, allowing for the development of intelligent hardware without coding. It is suitable for scenarios such as personal assistants, embodied intelligence, and programming education, and is expected to be available in mid-2026. Pinea Pi's main functions include Agent...

Fun-CineForge - Alibaba Tongyi's open-source film-grade multimodal dubbing model

Fun-CineForge is the first film-grade multimodal dubbing model open-sourced by Tongyi Lab. Built on CosyVoice3, it innovatively introduces "temporal modality" to achieve precise audio-visual synchronization. The model supports monologues, narration, dialogues, and multi-person scenes, solving four major challenges: lip-syncing, emotional expression, consistent timbre, and time alignment. Fun-CineForge comes with an open-source CineDub dataset construction workflow, covering over 350 films and TV series, with a Chinese character error rate as low as 1.49%. It maintains high-quality dubbing even in complex scenes such as facial occlusion and camera transitions. ...

Gemini Embedding 2 - Google's first native multimodal embedding model

Gemini Embedding 2 is Google's first native multimodal embedding model, built on the Gemini architecture. The model maps text, images, videos, audio, and documents to a unified vector space, supporting semantic understanding across more than 100 languages. It can handle interleaved multimodal inputs (such as text-image combinations), embedding directly without audio transcription, and employs nested representation learning techniques for flexible dimensionality reduction. Gemini Embedding 2 boasts leading performance in tasks such as RAG and semantic search, and is now available through the Gemini API and Vertex...

Clawith - An open-source multi-agent collaboration framework, the OpenClaw team collaboration version

Clawith is an open-source hybrid multi-agent collaboration framework for enterprises, treating AI agents as "digital employees" rather than simple chat tools. Each agent possesses a persistent identity (soul.md), long-term memory (memory.md), and an independent workspace, enabling it to understand organizational structure and collaborate with humans/other agents. Clawith supports Plaza knowledge-sharing spaces, automatic follow-up of supervised tasks, runtime tool self-discovery (MCP registry), and enterprise-level governance (auditing, quotas, approvals). Clawith is based on React...