Intern-S1-Pro - An open-source scientific multimodal large model from Shanghai AI Lab

Intern-S1-Pro is a trillion-parameter scientific multimodal large model open-sourced by Shanghai AI Laboratory. It uses a Mixture-of-Experts (MoE) architecture (1T total parameters, about 22B activated per forward pass) and is built on SAGE, the lab's “generalist-specialist integration” technology. Fourier Positional Encoding (FoPE) and a redesigned time-series encoder give the model “physical intuition”, so that it can interpret signals ranging from microscopic biological measurements to macroscopic, cosmic-scale fluctuations within one framework. It performs well in Olympiad-level mathematical reasoning, in five major scientific disciplines (chemistry, materials, life sciences, earth sciences, and physics), and in real research scenarios. It is the largest scientific multimodal model by parameter count in the global open-source community, and it aims to move AI for Science (AI4S) from a “tool revolution” to a new paradigm of “scientific discovery”.

Main functions of Intern-S1-Pro

Scientific reasoning : The model shows Olympiad gold-medal-level mathematical and logical reasoning, and performs well in evaluations based on the International Mathematical Olympiad and the International Physics Olympiad.
Multimodal understanding : The model can accurately analyze complex scientific visual content such as molecular structure diagrams, experimental charts, and remote sensing images.
Time-series signal analysis : The model processes heterogeneous time-series data in a unified way, from series of only a few samples to series of millions, covering astronomy, geography, physiological signals, bioacoustics, and other fields.
Interdisciplinary research : The model builds a full-spectrum capability matrix across five core disciplines (chemistry, materials, life sciences, earth sciences, and physics) and supports more than 100 specialized subtasks, such as chemical retrosynthesis and protein sequence generation.
Agent capabilities : The model moves from static task planning to dynamic interaction with its environment, and demonstrates world-class autonomous planning and execution in complex research workflows.
General capabilities : The model ranks in the top tier of open-source models for cross-modal image-text understanding, high-quality text generation, complex instruction following, and tool calling.
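
The time-series item above describes handling series that range from a few samples to millions. One way to picture this is to group samples into patches whose length grows with the series, so the encoder always sees a bounded number of tokens. The Python sketch below is only an illustration under that assumption, not the model's actual encoder: the function names, the 1024-token budget, and the mean pooling (a stand-in for a learned encoder) are all hypothetical.

```python
import math


def choose_patch_size(num_samples, max_tokens=1024, min_patch=1):
    """Pick a patch length so the series fits in at most max_tokens patches.

    max_tokens is a hypothetical token budget, not a value from the article.
    """
    return max(min_patch, math.ceil(num_samples / max_tokens))


def patchify(series, max_tokens=1024):
    """Split a series into patches and summarize each patch by its mean.

    Mean pooling stands in for a learned patch encoder.
    """
    patch = choose_patch_size(len(series), max_tokens)
    tokens = []
    for start in range(0, len(series), patch):
        chunk = series[start:start + patch]
        tokens.append(sum(chunk) / len(chunk))
    return tokens


if __name__ == "__main__":
    for n in (10, 10_000, 1_000_000):
        series = [float(i % 7) for i in range(n)]
        tokens = patchify(series)
        print(f"{n} samples -> patch size {choose_patch_size(n)}, {len(tokens)} tokens")
```

Short series keep every sample as its own token, while long series are compressed more aggressively, which is the general idea behind adjusting to data density.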

Technical principles of Intern-S1-Pro

SAGE “generalist-specialist integration” architecture : A shared base representation layer is combined with differentiated expert layers so that general and specialized capabilities reinforce each other during training. The model keeps broad general cognitive ability while gaining deeply specialized scientific reasoning, with the goal of “a general model that can also be deeply specialized”.
Mixture-of-Experts (MoE) architecture : Intern-S1-Pro uses an MoE architecture with 1 trillion total parameters and 512 experts. Only 8 experts are activated per forward pass (about 22 billion activated parameters). A routing density estimation mechanism improves training stability and avoids the expert collapse problem common in traditional MoE models. A grouped routing strategy balances load across a large number of compute chips, scheduling compute resources much as an intelligent traffic system schedules vehicles.
Physical perception layer : The research team introduced Fourier Positional Encoding (FoPE) to give the model “physical intuition”: it can capture the relative distance between text tokens, as if observing particles, and grasp the overall frequency pattern of scientific signals, as if analyzing waves. The team also redesigned the time-series encoder to adapt automatically to data density, achieving for the first time unified modeling of heterogeneous time-series signals whose sampling scales span up to six orders of magnitude.
Deep adaptation to domestic computing hardware : From the start of architecture design, the model was developed jointly with the Ascend computing ecosystem. The adaptation covers the full stack, from low-level operator optimization and compiler adaptation up to the training framework XTuner V1 and the inference engine LMDeploy. The work addresses core problems such as precision alignment in large-scale training and the stability of reinforcement learning on ultra-long sequences, and builds an independent and controllable integrated “compute-algorithm” foundation.
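
The MoE item above states that 8 of 512 experts are activated per forward pass. The Python sketch below shows plain top-k routing for that configuration, so the sparsity is concrete. It is a generic illustration, not the routing density estimation or grouped routing described above, whose details the article does not give. The random router logits and the helper names are hypothetical.

```python
import math
import random

NUM_EXPERTS = 512  # expert count stated in the article
TOP_K = 8          # experts activated per forward pass, as stated in the article


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def expert_load(assignments, num_experts=NUM_EXPERTS):
    """Count how many tokens were routed to each expert."""
    counts = [0] * num_experts
    for routed in assignments:
        for index, _ in routed:
            counts[index] += 1
    return counts


# Hypothetical router outputs: random logits for 64 tokens.
random.seed(0)
tokens = [[random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)] for _ in range(64)]
assignments = [route_token(logits) for logits in tokens]
load = expert_load(assignments)

print(f"experts used per token: {TOP_K} of {NUM_EXPERTS} ({TOP_K / NUM_EXPERTS:.2%})")
print(f"busiest expert handled {max(load)} of {len(tokens)} tokens")
```

The per-expert counts in `load` are the quantity a load-balancing strategy tries to keep even. A router that sends most tokens to a few experts is the collapse failure mode mentioned above.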

Intern-S1-Pro project address

Project official website : https://chat.intern-ai.org.cn/
GitHub repository : https://github.com/InternLM/Intern-S1
HuggingFace model library : https://huggingface.co/internlm/Intern-S1-Pro

Application scenarios of Intern-S1-Pro

Basic scientific research : Intern-S1-Pro can assist with theoretical research in mathematics and physics, with material design and synthesis route planning in chemistry, and with protein prediction and drug development in the life sciences.
Earth and environmental sciences : The model supports environmental research such as remote sensing image analysis, climate monitoring, geological exploration, and disaster risk prediction.
Engineering and technology development : The model can interpret engineering drawings, analyze experimental data, generate technical documents, and connect to external software to automate research and development workflows.
Research agent collaboration : The model can drive autonomous agents that perform literature retrieval, experimental design, result analysis, and iterative optimization, forming a closed-loop research process.
Science education and outreach : The model can provide students and researchers with personalized academic tutoring, problem-solving guidance, and research methods training, lowering the barrier to learning science.
