LongCat-Flash-Prover - Meituan's open-source mathematical theorem-proving model

LongCat-Flash-Prover is an open-source Mixture-of-Experts (MoE) model from Meituan with 560 billion parameters, focused on formal mathematical reasoning in Lean4. Using agentic Tool-Integrated Reasoning (TIR), it decomposes the task into three capabilities: automatic formalization, sketch generation, and theorem proving. It is trained with a hybrid-expert iteration framework and the HisPO reinforcement learning algorithm for stability, and it adds safeguards against reward hacking to keep its reasoning rigorous. It reports state-of-the-art (SOTA) results among open-source models on benchmarks such as MiniF2F-Test (93.9% at Pass@32) and PutnamBench (28.9% of problems solved), well ahead of earlier open-source models.

Main functions of LongCat-Flash-Prover

  • Automatic formalization: Converts natural-language mathematics problems into verified Lean4 formal statements.
  • Sketch generation: Generates a lemma-style proof skeleton from the problem and its formal statement.
  • Theorem proving: Generates a complete proof, or introduces auxiliary lemmas to complete the proof of the target theorem.
  • Tool-Integrated Reasoning (TIR): The model calls the Lean4 compiler directly for real-time verification and iterates on the feedback.
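
As a toy illustration of how the three capabilities relate, the following Lean 4 snippet walks one invented problem through formalization, sketch, and proof. It is hand-written, not model output; the problem and all theorem names are hypothetical, and it assumes a recent Lean 4 toolchain in which the `omega` tactic is available without extra imports.

```lean
-- Toy problem (invented for illustration):
--   "The sum of two even natural numbers is even."

-- 1. Automatic formalization: the statement in Lean 4, proof still open.
theorem sum_even_statement (a b : Nat) (ha : a % 2 = 0) (hb : b % 2 = 0) :
    (a + b) % 2 = 0 := by
  sorry

-- 2. Sketch generation: a helper lemma is proposed but left as `sorry`,
--    and the target is proved on the assumption that the helper holds.
theorem even_witness_sketch (n : Nat) (h : n % 2 = 0) : ∃ k, n = 2 * k := by
  sorry

theorem sum_even_sketch (a b : Nat) (ha : a % 2 = 0) (hb : b % 2 = 0) :
    (a + b) % 2 = 0 := by
  have ⟨k, hk⟩ := even_witness_sketch a ha
  have ⟨m, hm⟩ := even_witness_sketch b hb
  omega

-- 3. Theorem proving: the open helper is discharged, so nothing in the
--    final proof depends on `sorry`.
theorem even_witness (n : Nat) (h : n % 2 = 0) : ∃ k, n = 2 * k :=
  ⟨n / 2, by omega⟩
```

The compiler accepts a `sorry` with a warning only, which is why a sketch can be checked for structure before every lemma is proved.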

Technical principles of LongCat-Flash-Prover

  • Hybrid-expert iteration framework: Several specially optimized expert models are deployed, each responsible for one task: automatic formalization, sketch generation, or proving. The experts generate reasoning trajectories and refine them iteratively with tool assistance, mimicking the human cycle of trial and error, verification, and reflection. The resulting trajectories expand the pool of high-quality cold-start data.
  • Hierarchical Importance Sampling Policy Optimization (HisPO): MoE models tend to train unstably on long-horizon tasks. HisPO addresses this with a hierarchical masking strategy: it estimates importance sampling ratios at both the sequence level and the token level, and drops gradient contributions where the training engine and the inference engine diverge significantly, which stabilizes reinforcement learning.
  • Protection against reward hacking: Theorem consistency checks and legality checks identify and filter out proofs that are semantically inconsistent with the formal statement, whose conditions do not match it, or that rely on unverified axioms. This prevents the model from earning false rewards by deceiving the Lean4 server.
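
The two-level filtering described for HisPO can be pictured with a small sketch. This is not the published algorithm: the text above gives no formulas, so the geometric-mean sequence ratio, the symmetric bands around 1, and the names `hierarchical_is_mask`, `seq_band`, and `tok_band` are all assumptions made for illustration.

```python
import math


def hierarchical_is_mask(train_logps, infer_logps, seq_band=0.5, tok_band=0.5):
    """Return a 0/1 gradient mask, one entry per token.

    train_logps / infer_logps: log-probabilities that the training engine and
    the inference engine assign to the same sampled tokens.
    seq_band / tok_band: hypothetical tolerances around a ratio of 1.0.
    """
    if not train_logps or len(train_logps) != len(infer_logps):
        raise ValueError("need two non-empty lists of equal length")

    # Per-token log importance ratio between the two engines.
    diffs = [t - i for t, i in zip(train_logps, infer_logps)]

    # Level 1 (sequence): geometric mean of the token ratios (assumption).
    seq_ratio = math.exp(sum(diffs) / len(diffs))
    if abs(seq_ratio - 1.0) > seq_band:
        return [0] * len(diffs)  # drop the whole trajectory

    # Level 2 (token): drop only the tokens whose own ratio is off.
    return [1 if abs(math.exp(d) - 1.0) <= tok_band else 0 for d in diffs]


# Engines agree everywhere: every token contributes to the gradient.
print(hierarchical_is_mask([-1.0] * 4, [-1.0] * 4))
# One token disagrees strongly: only that token is masked.
print(hierarchical_is_mask([-1.0] * 4, [-1.0, -1.0, -1.0, -2.0]))
# The whole sequence disagrees: everything is masked.
print(hierarchical_is_mask([-1.0] * 4, [-3.0] * 4))
```

In a real trainer the mask would multiply the per-token policy-gradient loss; here it is returned as a plain list so the two levels are easy to inspect.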

Key information and usage requirements of LongCat-Flash-Prover

  • Model size: A 560 billion parameter MoE architecture, among the largest open-weight models by parameter count.
  • Core positioning: Focuses on native formal reasoning in Lean4; the model architecture is not modified for formal tasks.
  • Performance: 93.9% on MiniF2F-Test (Pass@32) and 28.9% on PutnamBench, both SOTA among open-source models.
  • Inference efficiency: Reaches a 97.1% pass rate on MiniF2F-Test with only 72 inference attempts, a high sample efficiency.
  • Training data: High-quality trajectories are synthesized through the hybrid-expert iteration framework, covering three task types: automatic formalization, sketching, and proving.
  • Hardware environment: Inference for the 560B-parameter MoE model requires a large GPU cluster; a multi-GPU setup with sufficient GPU memory is recommended.
  • Software dependencies: The Lean4 proof assistant and its toolchain must be installed; the model verifies proofs and interacts in real time through the Lean4 server.
  • Deployment modes: Supports Whole-Proof mode (generate a complete proof directly) and Sketch-Proof mode (sketch first, then fill in); the latter works better with TIR.
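
To give the hardware requirement a rough scale, the arithmetic below estimates the memory needed for the weights alone. The numeric formats (2 bytes per parameter for bf16, 1 byte for an 8-bit format) and the 80 GiB per-GPU figure are assumptions for illustration, not published requirements; KV cache, activations, and runtime overhead come on top.

```python
import math


def weight_memory_gib(n_params: int, bytes_per_param: float) -> float:
    """Memory for the raw weights in GiB (1 GiB = 2**30 bytes)."""
    return n_params * bytes_per_param / 2**30


def min_gpus_for_weights(n_params: int, bytes_per_param: float,
                         gpu_mem_gib: float) -> int:
    """Lower bound on GPU count if memory were used for weights only."""
    return math.ceil(weight_memory_gib(n_params, bytes_per_param) / gpu_mem_gib)


N_PARAMS = 560 * 10**9  # 560B total parameters, as stated above

print(round(weight_memory_gib(N_PARAMS, 2)))   # bf16 weights: about 1043 GiB
print(min_gpus_for_weights(N_PARAMS, 2, 80))   # 14 GPUs of 80 GiB, weights only
print(min_gpus_for_weights(N_PARAMS, 1, 80))   # 7 GPUs with 1 byte per parameter
```

All experts of an MoE model must be resident even though only a few are active per token, so the total parameter count is what drives this estimate.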

Core advantages of LongCat-Flash-Prover

  • Native capability: Formal reasoning is treated as a native capability of the LLM. The model calls the Lean4 toolchain directly, without special architecture changes, and so integrates deeply with the formal environment.
  • SOTA performance: It leads open-source models on all five major benchmarks (MathOlympiad-Bench, MiniF2F-Test, ProofNet, ProverBench, and PutnamBench), and on some metrics approaches or surpasses closed-source commercial models.
  • Sample efficiency: It needs only 72 inference attempts to reach a 97.1% pass rate on MiniF2F-Test, far fewer attempts than similar models require, which significantly reduces inference cost.
  • Reward-hacking safeguards: Theorem consistency checks and legality checks ensure that model output is trustworthy and rule out false proofs produced by reward hacking.
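
A rough idea of what a legality and consistency check has to catch can be sketched as a text-level filter. This is a simplified stand-in, not the project's mechanism: the function names are hypothetical, comment stripping ignores nested block comments, and a serious check would query Lean itself (for example with the `#print axioms` command) rather than match strings.

```python
import re

# Tokens that let a "proof" pass compilation without proving anything.
FORBIDDEN = re.compile(r"\b(sorry|admit|axiom)\b")


def strip_comments(src: str) -> str:
    """Remove Lean block comments (non-nested) and `--` line comments."""
    src = re.sub(r"/-.*?-/", " ", src, flags=re.S)
    return re.sub(r"--.*", "", src)


def normalize(text: str) -> str:
    """Collapse whitespace so layout changes do not count as differences."""
    return " ".join(text.split())


def statement_of(src: str, name: str):
    """Text between `theorem <name>` and the first `:=`, or None."""
    m = re.search(r"\btheorem\s+" + re.escape(name) + r"\b(.*?):=", src, flags=re.S)
    return m.group(1) if m else None


def is_legal(target_statement: str, proof_src: str, name: str) -> bool:
    code = strip_comments(proof_src)
    if FORBIDDEN.search(code):
        return False  # legality: no placeholders or extra axioms
    stmt = statement_of(code, name)
    # consistency: the proved statement must be the one that was asked for
    return stmt is not None and normalize(stmt) == normalize(target_statement)


target = "(a b : Nat) : a + b = b + a"
honest = "theorem add_comm_ex (a b : Nat) : a + b = b + a := by\n  exact Nat.add_comm a b\n"
weakened = "theorem add_comm_ex (a b : Nat) (h : a = b) : a + b = b + a := by\n  simp [h]\n"

print(is_legal(target, honest, "add_comm_ex"))    # True
print(is_legal(target, weakened, "add_comm_ex"))  # False: an extra hypothesis was added
```

The second case shows why compilation alone is not enough: the weakened theorem may compile, yet it proves a different statement from the one requested.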

How to use LongCat-Flash-Prover

  • Prepare the environment: Install the Lean4 proof assistant and its toolchain, configure the GPU environment for model inference, and make sure there is enough GPU memory to load and run the 560B-parameter MoE model.
  • Get the model: Download the model weights from the HuggingFace repository, or deploy directly with the inference interface and sample code provided on GitHub.
  • Select an inference mode: Depending on task complexity, choose Whole-Proof mode to generate a complete proof directly, or Sketch-Proof mode to output a lemma skeleton first and then fill it in step by step.
  • Enter the problem: Give the model a natural-language mathematical problem or a theorem to prove. The model calls the Lean4 compiler automatically for real-time verification and refines the proof iteratively based on the feedback.
  • Get the results: The model outputs formal proofs verified by Lean4, which can be used directly for formal verification of mathematics, theorem library construction, or academic research.
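
The verify-and-retry cycle in the steps above can be sketched as a small loop. This is an illustrative outline rather than the project's interface: `prove_with_feedback`, `lean_check`, and the `generate` callback are hypothetical names, and `lean_check` assumes a `lean` executable on the PATH and a proof file with no project dependencies.

```python
import os
import subprocess
import tempfile
from typing import Callable, Optional, Tuple


def lean_check(source: str) -> Tuple[bool, str]:
    """Compile a standalone Lean file; return (success, compiler output)."""
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        proc = subprocess.run(["lean", path], capture_output=True, text=True)
    finally:
        os.unlink(path)
    return proc.returncode == 0, proc.stdout + proc.stderr


def prove_with_feedback(
    problem: str,
    generate: Callable[[str, str], str],       # (problem, feedback) -> candidate proof
    check: Callable[[str], Tuple[bool, str]],  # candidate -> (ok, feedback)
    max_rounds: int = 4,
) -> Tuple[Optional[str], int]:
    """Ask for a proof, verify it, and feed errors back until it checks."""
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        candidate = generate(problem, feedback)
        ok, feedback = check(candidate)
        if ok:
            return candidate, round_no
    return None, max_rounds


# Stand-ins so the loop can be run without a model or a Lean install.
def fake_generate(problem: str, feedback: str) -> str:
    return "good proof" if feedback else "bad proof"


def fake_check(candidate: str) -> Tuple[bool, str]:
    ok = candidate == "good proof"
    return ok, "" if ok else "error: unsolved goals"


print(prove_with_feedback("toy problem", fake_generate, fake_check))
# ('good proof', 2)
```

Passing `lean_check` as the `check` argument and a model call as `generate` would turn the stub into a real loop; compiling alone does not reject `sorry`, so a legality filter is still needed afterwards.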

Project address of LongCat-Flash-Prover

Comparison of LongCat-Flash-Prover with similar models

Model                   | Scale    | MathOlympiad-Bench | MiniF2F-Test | PutnamBench | Core differences
LongCat-Flash-Prover    | 560B MoE | 35.8%              | 93.9%        | 28.9%       | Native TIR tool integration; sketch + proof dual mode
DeepSeek-Prover-V2-671B | 671B     | 13.9%              | 82.4%        | 3.3%        | Previous open-source SOTA; no sketch generation mechanism
Kimina-Prover-72B       | 72B      | 13.1%              | 84.0%        | 3.9%        | Early open-source approach; lower inference efficiency
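
Pass rates such as the MiniF2F-Test figures above depend on the sampling budget (the 93.9% is reported at Pass@32). The text does not say how the metric was computed; the sketch below uses the common unbiased pass@k estimator, in which n proofs are sampled per problem and c of them verify. The function names and the sample counts in the example are made up.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n is correct,
    given that c of the n samples were verified as correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # too few failures to fill k draws
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k: int) -> float:
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


print(pass_at_k(4, 1, 1))                        # 0.25
print(pass_at_k(4, 1, 2))                        # 0.5
print(benchmark_pass_at_k([(4, 1), (4, 0)], 2))  # 0.25
```

A larger k raises the score for the same model, so results reported at different budgets (for example 32 versus 72 attempts) are not directly comparable.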

Application scenarios of LongCat-Flash-Prover

  • Academic mathematics research: Helps mathematicians convert natural-language conjectures into Lean4 formal statements and verify them automatically, accelerating proof discovery. It is especially suited to areas such as algebraic geometry and number theory that demand strict logical derivation.
  • Mathematics competition training: Verifies solution ideas and generates formal proofs for high-level competitions such as the IMO and Putnam, helping contestants understand the rigorous proof structure of complex problems.
  • Formal verification engineering: In scenarios such as software correctness proofs, cryptographic protocol verification, and hardware design verification, it generates formal proofs automatically or assists in constructing them, improving the security of critical systems.
  • Educational aid: As an intelligent mathematics teaching assistant, it guides students step by step from understanding a problem to a complete proof, detects gaps in reasoning in real time, and suggests corrections.