Nemotron 3 Super - NVIDIA's open-source large model for agent inference
Nemotron 3 Super is an open-source AI model from NVIDIA with 120 billion parameters. It uses a hybrid Mamba-MoE architecture and is optimized for agent applications. The model supports ultra-long contexts of up to 1 million tokens, delivers roughly 3x faster inference and 5x higher throughput, and achieves a strong task success rate on OpenClaw, with performance approaching Claude Opus 4.6. NVIDIA has also open-sourced more than 10 trillion tokens of training data, the complete training methodology, and 15 reinforcement learning environments, making the model a strong candidate for enterprise-grade multi-agent systems.
Key features of Nemotron 3 Super
- Ultra-long context memory: Supports a 1-million-token context window, letting agents maintain complete workflow state across complex multi-step tasks without drifting from the goal.
- Agent task execution: Achieves an 85.6% task success rate on agent benchmarks such as OpenClaw, approaching top closed-source models such as Claude Opus 4.6.
- Faster inference: Multi-token prediction enables native speculative decoding, increasing inference speed by roughly 3x to meet real-time interaction needs.
- High-throughput serving: Throughput is 5x higher than the previous generation, supporting large-scale concurrent agent deployment and lowering the cost of multi-agent applications.
- High-precision tool calling: Navigates large function libraries reliably, preventing execution errors in high-stakes environments such as network security.
- Code-agent development: Can load an entire codebase into context at once, enabling end-to-end code generation, vulnerability repair, and automated debugging.
- Financial analysis: Thousands of pages of reports can be loaded directly into context, avoiding repeated re-processing during long conversations and greatly improving efficiency.
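To make the tool-calling feature above concrete, here is a minimal sketch of how an agent runtime might dispatch a model-emitted tool call against a registered function library. The tool names, the JSON call format, and the registry are all illustrative assumptions; they do not reflect Nemotron 3 Super's actual serving API.

```python
import json

# Hypothetical tool registry. Real deployments may expose hundreds of
# functions; the two below are placeholders.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def scan_ports(host: str) -> str:
    return f"Scanned {host}: 22, 443 open"

REGISTRY = {"get_weather": get_weather, "scan_ports": scan_ports}

def dispatch(tool_call_json: str) -> str:
    """Validate and execute one model-emitted tool call."""
    call = json.loads(tool_call_json)
    fn = REGISTRY.get(call["name"])
    if fn is None:
        # Refusing unknown tools is what keeps execution errors out of
        # high-stakes environments such as security automation.
        raise ValueError(f"unknown tool: {call['name']}")
    return fn(**call["arguments"])

print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

The key property the article's "high-precision" claim depends on is that the model reliably emits a valid tool name and argument schema; the runtime's job is only to validate and execute.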
Technical principles of Nemotron 3 Super
- Mamba-MoE hybrid architecture: The model uses an 88-layer network in which Mamba-2 layers alternate periodically with Transformer attention layers. The Mamba-2 layers provide linear-time sequence modeling efficiency, while the small number of attention layers act as global anchors, handling long-range information routing across positions and high-precision reasoning. This significantly improves inference throughput while preserving strong modeling capability.
- LatentMoE (latent mixture-of-experts): NVIDIA's new MoE design projects tokens from the hidden dimension down to a smaller latent dimension before routing and expert computation. Because routing and the expert layers run in this compressed space, parameter loading and communication volume drop several-fold; the saved budget is spent on more total experts and more activated experts, achieving roughly the effect of activating four experts at the cost of one and improving accuracy at nearly the same inference cost.
- Multi-token prediction acceleration: The model predicts several future tokens at each position. This forces it to learn multi-step causality and long-range text structure, improving quality, and it also enables native speculative decoding: the auxiliary prediction heads serve as a built-in draft model that quickly proposes candidate sequences, which the main model verifies in a single forward pass, sharply reducing generation latency with minimal extra overhead.
- NVFP4 low-precision pretraining: The model is pretrained end to end in NVFP4 precision on the Blackwell platform. The 4-bit floating-point format greatly reduces memory requirements, and with no loss of accuracy, it runs 4x faster than FP8 on the Hopper architecture, demonstrating that large-scale low-precision training is both feasible and efficient.
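The hybrid layer arrangement described above can be sketched as a simple schedule: mostly linear-time Mamba-2 blocks, with periodic full-attention anchor layers. The 88-layer depth comes from the article; the interleave ratio here (one attention layer per eight) is an illustrative assumption, not the published configuration.

```python
def hybrid_schedule(num_layers: int = 88, attn_every: int = 8) -> list:
    """Return the per-layer block type for a Mamba/attention hybrid stack."""
    layers = []
    for i in range(num_layers):
        # Place a Transformer attention layer periodically; every other
        # position uses a Mamba-2 state-space block.
        if (i + 1) % attn_every == 0:
            layers.append("attention")
        else:
            layers.append("mamba2")
    return layers

schedule = hybrid_schedule()
print(schedule[:9])
print(schedule.count("attention"), "attention layers of", len(schedule))
```

With this assumed ratio, only 11 of the 88 layers pay the quadratic attention cost, which is why the hybrid's throughput scales so much better on million-token contexts.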
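The LatentMoE idea of routing and computing in a compressed space can be illustrated with a small NumPy sketch, based on the article's description. The dimensions, expert count, top-k value, and single-matrix "experts" are all made-up toy choices, not the model's real configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, latent, n_experts, top_k = 64, 16, 8, 4

W_down = rng.normal(size=(hidden, latent)) / np.sqrt(hidden)  # compress to latent dim
W_up = rng.normal(size=(latent, hidden)) / np.sqrt(latent)    # restore hidden dim
router = rng.normal(size=(latent, n_experts))                 # router lives in latent space
experts = rng.normal(size=(n_experts, latent, latent))        # toy one-matrix "experts"

def latent_moe(x):
    """x: (tokens, hidden). Route and apply experts in the latent space."""
    z = x @ W_down                                  # (tokens, latent)
    scores = z @ router                             # routing logits in latent space
    top = np.argsort(scores, axis=-1)[:, -top_k:]   # pick top-k experts per token
    out = np.zeros_like(z)
    for t in range(z.shape[0]):
        weights = np.exp(scores[t, top[t]])
        for w, e in zip(weights, top[t]):
            out[t] += w * (z[t] @ experts[e])       # expert compute: latent x latent
        out[t] /= weights.sum()                     # softmax-weighted mixture
    return out @ W_up                               # project back up

y = latent_moe(rng.normal(size=(4, hidden)))
print(y.shape)
```

Because each expert matrix is latent x latent rather than hidden x hidden, expert parameters and communication shrink by (hidden/latent)^2, which is the budget the design reinvests in more and larger expert counts.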
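The draft-then-verify loop behind speculative decoding can be shown with toy stand-ins for the draft head and the main model. Both "models" below are deterministic toy functions chosen so the mechanics are visible; in the real system the verification of all drafted tokens happens in one batched forward pass.

```python
def draft_head(prefix, k=4):
    # Cheap draft: propose the next k tokens (toy rule: last+1, last+2, ...).
    last = prefix[-1]
    return [last + i for i in range(1, k + 1)]

def main_model(prefix):
    # Toy "verifier": the main model's greedy next token after `prefix`.
    return prefix[-1] + 1

def speculative_step(prefix, k=4):
    """Accept the longest drafted run the main model agrees with."""
    draft = draft_head(prefix, k)
    accepted = []
    ctx = list(prefix)
    for tok in draft:
        if main_model(ctx) == tok:          # verification (batched in practice)
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(main_model(ctx))  # fall back to the verified token
            break
    return accepted

print(speculative_step([10, 11, 12]))  # toy models agree, so all 4 drafts accepted
```

When draft and verifier agree, one verification pass emits several tokens instead of one, which is where the article's ~3x speedup comes from; when they disagree, output is still exactly what the main model would have produced.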
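To see why a 4-bit floating-point format saves so much memory, here is a standalone rounding demo for the E2M1 layout commonly used for FP4: every value must snap to one of 16 representable numbers. NVFP4 itself adds per-block scaling on Blackwell hardware, which this sketch deliberately omits.

```python
# The 8 non-negative E2M1 magnitudes; the sign bit supplies the negatives.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_VALUES = sorted({s * v for v in FP4_GRID for s in (-1.0, 1.0)})

def quantize_fp4(x: float) -> float:
    """Round x to the nearest representable E2M1 value."""
    return min(FP4_VALUES, key=lambda v: abs(v - x))

for x in [0.7, 2.4, -5.1, 10.0]:
    print(x, "->", quantize_fp4(x))
```

The coarse grid is why scaling matters in practice: per-block scale factors keep tensor values inside the representable range, so training can run at 4-bit width without the accuracy loss naive rounding would cause.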
Project address of Nemotron 3 Super
- Official blog: https://blogs.nvidia.com/blog/nemotron-3-super-agentic-ai/
- HuggingFace model collection: https://huggingface.co/collections/nvidia/nvidia-nemotron-v3
- Technical report: https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Super-Technical-Report.pdf
Application scenarios of Nemotron 3 Super
- Agent platform core engine: Billed as the "strongest open-source model" for agent platforms such as OpenClaw, it drives multi-agent collaboration on complex long-horizon tasks, addressing the twin bottlenecks of context explosion and reasoning overhead.
- Enterprise software development: Powers software-development agents from companies such as CodeRabbit, Factory, and Greptile, enabling codebase-level end-to-end generation, debugging, and vulnerability repair, with a 60.47% score on SWE-Bench.
- Deep research and analysis: Drives NVIDIA's AI-Q research agent, which tops the DeepResearch Bench leaderboard, and supports multi-step reasoning and information integration across massive document sets.
- Network security operations: In high-stakes settings such as autonomous security orchestration, high-precision tool calling reliably navigates large function libraries and prevents critical execution errors.
- Financial analysis: Loads thousands of pages of financial reports into context at once for direct in-depth analysis without repeated re-processing, greatly improving investment research efficiency.