LingBot-World - An open-source interactive world model from Ant Lingbo Technology

LingBot-World is an open-source interactive world model from Ant Lingbo Technology. Through a scalable data engine, the model learns physical laws and causal relationships from large-scale game environments, achieving accurate action-driven generation. It supports nearly 10 minutes of continuous, stable generation at 16 FPS with end-to-end latency under 1 second, and exhibits zero-shot scene generalization. By addressing the scarcity and high cost of real-world training data, it can be widely applied to robot training, autonomous driving simulation, and game development, allowing intelligent agents to learn safely and efficiently through trial and error in virtual environments.
Main functions of LingBot-World

  • High-fidelity interactive generation: Supports action-driven, fine-grained generation that responds accurately to user instructions and renders physically realistic dynamic scenes.
  • Long-term consistency: Sustains continuous, stable generation for nearly 10 minutes, maintaining object permanence and scene-structure integrity to avoid "long-horizon drift."
  • Real-time closed-loop control: Achieves 16 FPS generation throughput with end-to-end latency under 1 second, supporting real-time keyboard-and-mouse control of characters and viewpoints.
  • World event triggers: Text commands can dynamically adjust environmental conditions such as weather and style while keeping geometric relationships consistent.
  • Zero-shot generalization: A single input image yields an interactive video stream, with no scene-specific training required.
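The real-time closed-loop control described above can be pictured as a simple generate-and-pace loop. The sketch below is illustrative only: `StubWorldModel`, `Action`, and `run_session` are hypothetical names, and the "frame" is just a counter standing in for an image tensor. The pacing logic, however, shows how a client would hold output at the stated 16 FPS budget.

```python
import time
from dataclasses import dataclass

TARGET_FPS = 16
FRAME_BUDGET = 1.0 / TARGET_FPS  # ~62.5 ms per frame

@dataclass
class Action:
    """A single control input (hypothetical schema): held keys plus mouse deltas."""
    keys: frozenset
    mouse_dx: float = 0.0
    mouse_dy: float = 0.0

class StubWorldModel:
    """Placeholder for the real model: maps (previous frame, action) -> next frame.
    Here a 'frame' is just an integer counter so the loop is runnable."""
    def step(self, frame, action):
        return frame + 1

def run_session(model, n_steps, read_action):
    """Closed-loop control: read input, generate the next frame, pace to 16 FPS."""
    frame, produced = 0, []
    for _ in range(n_steps):
        start = time.monotonic()
        frame = model.step(frame, read_action())
        produced.append(frame)
        # Sleep off any leftover budget so the stream stays at ~16 FPS.
        leftover = FRAME_BUDGET - (time.monotonic() - start)
        if leftover > 0:
            time.sleep(leftover)
    return produced

frames = run_session(StubWorldModel(), 8, lambda: Action(keys=frozenset({"W"})))
print(frames)  # eight consecutive frames
```

In a real deployment the model's `step` call must itself finish inside the ~62.5 ms budget; the sleep only smooths out steps that finish early.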

LingBot-World’s technical principles

  • Scalable data engine: Combines online-video cleaning with an Unreal Engine synthesis pipeline to extract UI-free images directly from the rendering layer while simultaneously recording control inputs and camera poses, providing precisely aligned training signals for learning how actions change the environment.
  • Multi-stage training strategy: Phased optimization and parallelized acceleration strengthen the model's contextual memory, enabling nearly 10 minutes of continuous, stable generation while preserving object permanence and scene-structure integrity.
  • Causal distillation: Compresses physical laws and causal logic into the model so it deeply understands the causal link between actions and outcomes while retaining 16 FPS real-time inference.
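The "precisely aligned training signals" from the data engine amount to per-tick records pairing a rendered frame with the control input and camera pose captured at the same instant. The field names below are an assumption for illustration, not the project's actual schema; the point is that frame, action, and pose share one tick index.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class CameraPose:
    position: tuple  # (x, y, z) in world units
    rotation: tuple  # (pitch, yaw, roll) in degrees

@dataclass
class TrainingRecord:
    """One aligned sample: the rendered frame plus the control signal
    and camera pose captured at the same simulation tick."""
    frame_path: str
    tick: int
    action: dict  # e.g. {"keys": ["W"], "mouse_dx": 0.0, "mouse_dy": 0.0}
    camera: CameraPose

records = [
    TrainingRecord(
        frame_path="frames/000001.png",
        tick=1,
        action={"keys": ["W"], "mouse_dx": 0.0, "mouse_dy": 0.0},
        camera=CameraPose(position=(0.0, 1.6, 0.0), rotation=(0.0, 90.0, 0.0)),
    ),
]

# Serialize to JSON Lines so a training job can stream (frame, action) pairs.
lines = [json.dumps(asdict(r)) for r in records]
print(lines[0])
```

Storing one record per tick in an append-only JSON Lines file keeps the capture pipeline simple and lets the trainer shuffle or window samples without loading everything at once.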

LingBot-World project address

Application scenarios of LingBot-World

  • Embodied intelligence training: Provides robots with a low-cost, high-fidelity virtual proving ground that supports trial-and-error learning of complex long-horizon tasks, addressing the high cost and risk of real-world data collection.
  • Autonomous driving simulation: Dynamic variation in lighting, weather, and other conditions improves model generalization while reducing the cost and safety risks of on-road testing.
  • Game development: Serves as a playable real-time simulator that lets developers quickly generate interactive content, including dynamic world events and stylized rendering.
  • VR/AR simulation: Provides a low-latency, high-fidelity immersive environment for virtual training, digital twins, and human-computer interaction research.