Intern-S1-Pro - An open-source scientific multimodal large model from Shanghai AI Lab

Intern-S1-Pro is a trillion-parameter scientific multimodal large model open-sourced by Shanghai AI Laboratory. It adopts a Mixture-of-Experts (MoE) architecture (1T total parameters, 22B activated) and is built on the “generalist-specialist integration” SAGE technology. Fourier positional encoding and a reconstructed time-series encoder give the model “physical intuition”, unifying its understanding of signals from microscopic life signals to macroscopic cosmic fluctuations. It performs well in Olympiad-level mathematical reasoning, in five major scientific disciplines (chemistry, materials, life sciences, earth sciences, physics), and in real scientific research scenarios. It is the largest scientific multimodal model by parameter count in the global open-source community, moving AI4S from a “tool revolution” toward a new paradigm of “scientific discovery”.

Main functions of Intern-S1-Pro

  • Scientific reasoning : The model shows Olympiad gold-medal-level mathematical and logical reasoning, and has performed well in evaluations based on the International Mathematical Olympiad and the International Physics Olympiad.
  • Multimodal understanding : The model can accurately analyze complex scientific visual content such as molecular structure diagrams, experimental charts, and remote sensing images.
  • Time-series signal analysis : The model processes heterogeneous time-series data, ranging from a few samples to millions of samples, in a unified way, covering astronomy, geography, physiological signals, bioacoustics, and other fields.
  • Interdisciplinary research : The model builds a full-spectrum capability matrix across the five core disciplines of chemistry, materials, life sciences, earth sciences, and physics, supporting more than 100 specialized subtasks such as chemical retrosynthesis and protein sequence generation.
  • Agent capabilities : The model supports the leap from static task planning to dynamic interaction with an environment, and demonstrates world-class autonomous planning and execution in complex scientific research workflows.
  • General capabilities : The model ranks in the top tier of open-source models in cross-modal image-text understanding, high-quality text generation, complex instruction following, and tool calling.

Technical principles of Intern-S1-Pro

  • SAGE “generalist-specialist integration” architecture : A shared base representation layer is combined with differentiated expert layers so that general and specialized capabilities reinforce each other during training. The model retains broad general cognitive ability alongside deeply specialized scientific reasoning, with the goal of “a general model that can specialize deeply”.
  • Mixture-of-Experts (MoE) architecture : Intern-S1-Pro adopts an MoE architecture with 1 trillion total parameters and 512 experts, of which only 8 are activated per forward pass (about 22 billion activated parameters). A routing density estimation mechanism improves training stability and avoids the expert collapse common in conventional MoE training. A grouped routing strategy balances load across a large number of compute chips, scheduling compute resources much as an intelligent traffic system schedules vehicles.
  • Physical perception layer innovation : The research team introduced Fourier Positional Encoding (FoPE) to give the model “physical intuition”: it can capture the relative distance between text tokens, much like observing particles, and grasp the overall frequency patterns of scientific signals, much like analyzing waves. The team also rebuilt the time-series encoder to be adaptive, so that it adjusts automatically to data density, achieving for the first time unified modeling of heterogeneous time-series signals whose sampling scales span up to six orders of magnitude.
  • Deep adaptation to domestic computing power : From the start of architecture design, the model followed a joint R&D route with the Ascend computing ecosystem. The result is full-stack adaptation, from low-level operator optimization and compiler adaptation up to the training framework XTuner V1 and the inference engine LMDeploy. The work addressed core technical problems such as precision alignment in large-scale training and the stability of reinforcement learning on ultra-long sequences, and built an independent and controllable integrated “compute-algorithm” foundation.
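
The sparse activation described in the MoE item can be illustrated with a small sketch. The expert count (512), the number of activated experts (8), and the parameter totals come from the text above; everything else is an assumption made for illustration, including the function names, the softmax gating, the choice of 8 groups, and the specific form of grouped routing (one expert per contiguous group). It is not the model's actual router, and the routing density estimation mechanism is not modeled here.

```python
import math
import random

NUM_EXPERTS = 512  # from the article
TOP_K = 8          # experts activated per forward pass, from the article
NUM_GROUPS = 8     # assumption: illustrative group count, not from the article


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(logits, k=TOP_K):
    """Plain top-k routing: keep the k highest-scoring experts and
    renormalize their gate weights so they sum to 1."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return list(zip(top, weights))


def route_grouped(logits, num_groups=NUM_GROUPS, k=TOP_K):
    """Hypothetical grouped routing: split experts into contiguous groups
    and pick k // num_groups experts inside each group, so every group
    (for example, one device) receives the same number of experts per token."""
    group_size = len(logits) // num_groups
    per_group = k // num_groups
    chosen = []
    for g in range(num_groups):
        start = g * group_size
        members = range(start, start + group_size)
        best = sorted(members, key=lambda i: logits[i], reverse=True)[:per_group]
        chosen.extend(best)
    weights = softmax([logits[i] for i in chosen])
    return list(zip(chosen, weights))


# Random router scores stand in for the output of a learned gating layer.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

plain = route_top_k(logits)
grouped = route_grouped(logits)

# Sparsity implied by the figures in the article.
active_expert_fraction = TOP_K / NUM_EXPERTS  # 8 of 512 experts
active_param_fraction = 22e9 / 1e12           # 22B of 1T parameters

print(f"experts active per token: {active_expert_fraction:.2%}")
print(f"parameters active per token: {active_param_fraction:.2%}")
```

With unconstrained top-k routing, all 8 selected experts may fall in the same group, which concentrates load on one device; the grouped variant trades some routing freedom for an even spread. The activated parameter share (about 2.2%) is higher than the activated expert share (about 1.6%), presumably because non-expert components such as attention layers and embeddings run for every token.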

Application scenarios of Intern-S1-Pro

  • Basic scientific research : Intern-S1-Pro can assist with theoretical research in mathematics and physics, materials design and synthesis route planning in chemistry, and protein prediction and drug development in the life sciences.
  • Earth and environmental sciences : The model supports environmental research such as remote sensing image analysis, climate monitoring, geological exploration, and disaster risk prediction.
  • Engineering and technology development : The model can interpret engineering drawings, analyze experimental data, generate technical documents, and connect to external software to automate R&D workflows.
  • Research agent collaboration : The model can drive autonomous agents that perform literature retrieval, experimental design, result analysis, and iterative optimization, forming a closed-loop research process.
  • Science education and outreach : The model provides students and researchers with personalized academic tutoring, problem-solving guidance, and research methods training, lowering the barrier to learning science.