Mistral Small 4 - Mistral AI's open-source multimodal large model

Mistral Small 4 is an open-source multimodal large model from Mistral AI. It is the first model to unify reasoning (Magistral), multimodal (Pixtral), and agentic coding (Devstral) capabilities in a single architecture. It accepts text and image input and can switch flexibly between fast-response and deep-reasoning modes through the reasoning_effort parameter. The model is optimized for enterprise-grade efficiency, with latency reduced by 40% and throughput increased 3x, and it is available on the Mistral API, Hugging Face, and NVIDIA NIM.

Key features of Mistral Small 4

  • Unified multi-capability architecture: Integrates chat instructions (Instruct), deep reasoning (Reasoning), and multimodal understanding (Multimodal) into a single model for the first time, eliminating the need to switch between different models.
  • Adjustable reasoning effort: The reasoning_effort parameter controls how hard the model thinks: none gives a quick response suited to everyday conversation, while high triggers deep step-by-step reasoning for complex problems.
  • Native multimodal processing: Accepts both text and image input, enabling tasks such as document parsing, visual analysis, and joint image-text understanding.
  • Agentic coding: Supports development scenarios such as code generation, codebase exploration, and automated programming workflows.
  • Long-context handling: A 256K context window supports long-document analysis and extended conversations.
  • Enterprise-grade efficiency: Compared with the previous generation, latency is reduced by 40% and throughput is increased 3x, supporting efficient deployment.
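The reasoning_effort switch described above can be sketched as a request payload. This is a minimal illustration in the style of common OpenAI-compatible chat APIs; the model identifier and exact field names are assumptions for illustration, not confirmed details of the Mistral API.

```python
# Sketch: building chat-completion request bodies that toggle reasoning_effort.
# The model name and field layout are assumptions, not confirmed API details.

def build_request(prompt: str, reasoning_effort: str = "none") -> dict:
    """Return a request body for a hypothetical chat-completions endpoint."""
    if reasoning_effort not in {"none", "high"}:
        raise ValueError("reasoning_effort must be 'none' or 'high'")
    return {
        "model": "mistral-small-4",            # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": reasoning_effort,  # 'none' = fast, 'high' = deep
    }

# Fast response for a casual question.
quick = build_request("What's the capital of France?")

# Deep step-by-step reasoning for a hard problem.
deep = build_request("Prove that sqrt(2) is irrational.", reasoning_effort="high")
```

In practice the same prompt can be sent twice with different effort levels, paying for deep reasoning only when the task warrants it.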

Key information and usage requirements for Mistral Small 4

  • Architecture: Mixture of Experts (MoE)
  • Number of experts: 128 experts, 4 activated per token
  • Total parameters: 119 billion (119B)
  • Active parameters: 6 billion per token (8 billion including the embedding layer)
  • Context window: 256K tokens
  • Open-source license: Apache 2.0
  • Minimum hardware configuration: 4× NVIDIA HGX H100, 2× HGX H200, or 1× DGX B200
  • Recommended configuration: 4× NVIDIA HGX H100, 4× HGX H200, or 2× DGX B200
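The sparsity behind these MoE figures can be checked with simple arithmetic. The numbers below come from the spec list above; the comment about dense layers reflects how MoE models generally work, not a Mistral-specific claim.

```python
# Sketch: the sparsity arithmetic behind the MoE figures listed above.
# An MoE layer routes each token to a few experts, so only a fraction of
# the total weights run per token.

TOTAL_PARAMS_B = 119   # total parameters, billions (from the spec list)
ACTIVE_PARAMS_B = 6    # parameters active per token, billions
NUM_EXPERTS = 128      # experts per MoE layer
ACTIVE_EXPERTS = 4     # experts routed per token

# Fraction of experts consulted for each token: 4/128 = 3.125%.
expert_ratio = ACTIVE_EXPERTS / NUM_EXPERTS

# Fraction of total weights active per token: 6/119 ≈ 5%. This is higher
# than the expert ratio because non-expert (dense) components such as
# attention and embeddings run for every token.
active_ratio = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"experts used per token: {expert_ratio:.2%}")
print(f"weights active per token: {active_ratio:.2%}")
```

This is why a 119B-parameter model can have inference cost closer to a 6B dense model.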

Mistral Small 4 Core Advantages

  • Integrated capabilities: Unifies reasoning, multimodality, and agentic coding in one model for the first time, eliminating the need to juggle multiple models.
  • Flexible reasoning: The reasoning_effort parameter switches freely between fast-response and deep-thinking modes, allocating compute on demand.
  • High efficiency: Output length is significantly shorter at equal quality, directly reducing inference cost and improving user experience.
  • Truly open source: The Apache 2.0 license permits commercial use and deep customization, and NVIDIA NeMo supports domain fine-tuning.
  • Ecosystem backing: As a founding member of the NVIDIA Nemotron alliance, the model receives full-stack optimization support from hardware to deployment tools.
  • Enterprise value: Lower token cost and more stable quality make large-scale AI deployment more economically viable.
  • Technical value: High "performance per token" simplifies model selection and reduces fine-tuning iterations and fallback-system dependencies.

How to use Mistral Small 4

  • Via the Mistral platform: Call the model directly through the Mistral API or AI Studio with no infrastructure of your own, suitable for quick starts and prototyping.
  • Via Hugging Face: Download the model weights from the Hugging Face repository and deploy locally with open-source frameworks such as Transformers, vLLM, llama.cpp, or SGLang.
  • Via NVIDIA: Test the model for free at build.nvidia.com, or deploy production-grade containers with NVIDIA NIM for optimized, out-of-the-box inference performance.
  • Customize with fine-tuning: Use the NVIDIA NeMo framework for domain-specific fine-tuning to create a version tailored to specific business needs.
  • Configure reasoning effort: Control behavior with the reasoning_effort parameter at call time: "none" for fast responses, "high" for deep reasoning mode.
  • Hardware requirements: Local deployment needs at least 4× HGX H100 or 1× DGX B200-class compute; doubling the configuration is recommended for optimal performance.
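For the local-deployment path above, a vLLM launch might be assembled as follows. The repository id is a placeholder assumption; check the official model card for the exact name and the tensor-parallel size your hardware supports.

```python
# Sketch: assembling a `vllm serve` invocation for local deployment, one of
# the open-source frameworks listed above. The model id is a hypothetical
# placeholder; flag values should match your GPU count and memory.

def vllm_serve_command(model_id: str, tp_size: int, max_len: int = 262144) -> list[str]:
    """Build an argv list for vLLM's OpenAI-compatible server."""
    return [
        "vllm", "serve", model_id,
        "--tensor-parallel-size", str(tp_size),  # shard weights across GPUs
        "--max-model-len", str(max_len),         # 256K-token context window
    ]

# Hypothetical invocation for a 4-GPU node.
cmd = vllm_serve_command("mistralai/Mistral-Small-4", tp_size=4)
print(" ".join(cmd))
```

Serving the full 256K window requires substantial KV-cache memory, which is one reason the hardware minimums above are so high; a smaller --max-model-len reduces the footprint.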

Mistral Small 4 project address

Mistral Small 4 compared with competing models

| Model | License | Parameters | Context | Core advantages | Weaknesses |
| --- | --- | --- | --- | --- | --- |
| Mistral Small 4 | Apache 2.0 | 119B / 6B active | 256K | Three-in-one unification, adjustable reasoning, high efficiency | High deployment hardware requirements |
| Llama 3.1/3.2 | Partially restricted | 8B-405B | 128K | Mature ecosystem, strong community support | Reasoning and multimodality need separate models |
| Qwen 2.5 | Apache 2.0 | 0.5B-72B | 128K | Good Chinese optimization, many size choices | Slightly less efficient on long text |
| DeepSeek-V3 | MIT | 671B / 37B active | 64K | Strong mathematical reasoning, low cost | Limited multimodal support |
| Gemma 3 | Apache 2.0 | 1B-27B | 128K | Google ecosystem, lightweight deployment | Overall capability below Small 4 |

Application scenarios of Mistral Small 4

  • Smart programming: Automatically generates code, fixes bugs, and understands the architecture of large codebases, improving development efficiency.
  • Enterprise customer service: Handles both routine inquiries and complex complaints through the adjustable reasoning mode, cutting the cost of manual intervention.
  • Document analysis: Parses long documents, contracts, and cross-file relationships, with 256K-context deep processing.
  • Visual understanding: Recognizes invoices, charts, and image content, extracting information intelligently from combined image and text input.
  • Research assistance: Completes mathematical derivations, paper interpretation, and experiment design, with step-by-step reasoning for academic support.