GPT-5.3-Codex-Spark - A lightweight programming model and AI toolset from OpenAI

GPT-5.3-Codex-Spark is OpenAI’s first model dedicated to real-time programming, a lightweight model designed for speed. It runs on the Cerebras WSE-3 wafer-scale chip, with inference speeds above 1,000 tokens per second and a 128k context window. Unlike Codex, which excels at long-running autonomous tasks, GPT-5.3-Codex-Spark specializes in real-time collaboration: developers can interrupt and correct it while it is still generating output, which makes coding interaction feel more responsive. OpenAI rebuilt the underlying inference stack, reducing client/server round-trip overhead by 80%. Codex-Spark has been rolled out to ChatGPT Pro users as a research preview, available in the latest versions of the Codex app, CLI, and VS Code extension.

Key features of GPT‑5.3‑Codex‑Spark

Live coding collaboration: Developers can interrupt, correct, or redirect the model while watching its output, so the interaction keeps pace with the developer in real time.
Ultra-high-speed inference: The model runs on the Cerebras WSE-3 wafer-scale chip, with inference speeds above 1,000 tokens per second, and is optimized for ultra-low-latency scenarios.
Precise code editing: By default the model works in a lightweight style, making only minimal, targeted code changes to quickly adjust logic, interfaces, or structure.
Low-latency architecture: By introducing persistent WebSocket connections, rewriting the inference stack, and optimizing the Responses API, OpenAI reduced client/server round-trip overhead by 80%, the per-token cost by 30%, and time to first token by 50%.
Large context processing: A 128k context window enables real-time analysis and modification of large codebases.
Dual-mode collaboration: Codex-Spark is OpenAI’s first real-time coding model. OpenAI plans to combine it with the long-horizon reasoning of the standard Codex model, so that real-time interaction and long-running background tasks can proceed in parallel, with interaction speed and task breadth balanced automatically.
Multi-platform access: The model is integrated into the Codex app, the CLI, and the VS Code extension, so developers can use it in different scenarios.
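To make the latency figures above concrete, here is a rough back-of-the-envelope sketch in Python. Only the percentages (80% less round-trip overhead, 50% shorter time to first token) and the 1,000 tokens/second rate come from the article. The baseline timings, the response length, and the `response_time` helper are hypothetical values chosen purely for illustration, not measurements published by OpenAI.

```python
def response_time(tokens, tokens_per_second, first_token_s, round_trip_s):
    """Estimate wall-clock seconds for one streamed response.

    Simple additive model (an assumption): one client/server round trip,
    then the wait for the first token, then steady-state generation.
    """
    return round_trip_s + first_token_s + tokens / tokens_per_second


# Hypothetical baseline values, NOT figures from OpenAI.
BASELINE_ROUND_TRIP_S = 0.100   # assumed round-trip overhead before optimization
BASELINE_FIRST_TOKEN_S = 0.400  # assumed time to first token before optimization
TOKENS = 500                    # assumed size of a small code edit
RATE = 1000                     # tokens/second, the rate quoted in the article

before = response_time(TOKENS, RATE, BASELINE_FIRST_TOKEN_S, BASELINE_ROUND_TRIP_S)

# Apply the reductions quoted in the article: 80% less round-trip
# overhead and 50% shorter time to first token.
after = response_time(
    TOKENS,
    RATE,
    BASELINE_FIRST_TOKEN_S * 0.5,
    BASELINE_ROUND_TRIP_S * 0.2,
)

print(f"before: {before:.2f} s, after: {after:.2f} s")
```

Under these assumed baselines, generation time (0.5 s for 500 tokens) dominates once the fixed overheads shrink, which is why raw token throughput matters so much for short, interactive edits.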

Technical principles of GPT‑5.3‑Codex‑Spark

Dedicated AI accelerator architecture: The model runs on the Cerebras Wafer Scale Engine 3 (WSE-3), an AI accelerator designed for high-throughput, low-latency inference that achieves massive parallelism through full wafer-scale integration.
Lightweight model design: As a distilled version of GPT-5.3-Codex, the model uses a smaller parameter count, which greatly reduces the computational load while retaining core coding capabilities, balancing speed against performance.
End-to-end latency optimization: The complete request-response path was rebuilt. Persistent WebSocket connections replace traditional HTTP polling to cut connection setup overhead; key inference stack components were rewritten to make token generation and transmission more efficient; and the session initialization mechanism was improved to shorten the wait for the first token.
Streaming response mechanism: Response streaming from server to client was optimized so that tokens are pushed in real time and, combined with incremental rendering, give instant visual feedback.
Targeted fine-tuning strategy: The model was trained specifically for real-time interaction, strengthening its efficiency on short-cycle tasks such as partial code edits and quick logic adjustments while reducing its tendency toward long-chain autonomous execution.
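The streaming and interruption behavior described above can be illustrated with a minimal Python sketch. It is not OpenAI's implementation and does not call any real API: `stream_tokens` is a hypothetical stand-in for a model stream, and `render_incrementally` and its `should_interrupt` callback are names invented for this example. The point is only to show how a client that renders tokens as they arrive can stop generation the moment the developer spots a problem.

```python
from typing import Callable, Iterable, Iterator, List


def stream_tokens(text: str) -> Iterator[str]:
    """Hypothetical stand-in for a model stream: yields one token at a time."""
    for token in text.split():
        yield token


def render_incrementally(
    tokens: Iterable[str],
    should_interrupt: Callable[[List[str]], bool],
) -> List[str]:
    """Show each token as it arrives and stop as soon as the user interrupts."""
    shown: List[str] = []
    for token in tokens:
        shown.append(token)  # a real client would repaint the editor here
        if should_interrupt(shown):
            break  # stop consuming the stream; the rest is never rendered
    return shown


# The "developer" interrupts on seeing a minus sign where a plus was intended.
shown = render_incrementally(
    stream_tokens("def add(a, b): return a - b"),
    should_interrupt=lambda so_far: so_far[-1] == "-",
)
print(" ".join(shown))
```

Because the consumer pulls tokens lazily from the generator, breaking out of the loop means the remaining output is never produced, which mirrors the idea of correcting the model mid-output instead of waiting for a full response.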

GPT‑5.3‑Codex‑Spark project address

Official website: https://openai.com/index/introducing-gpt-5-3-codex-spark/

Application scenarios of GPT‑5.3‑Codex‑Spark

On-the-fly code debugging: After finding a bug, developers can call Spark immediately to locate and fix it. There is no long wait for the model to think, and the fix can be verified during the interaction.
Rapid interface iteration: In UI/UX development, styles, layouts, and interaction logic can be adjusted frequently, shortening the design-feedback loop.
Code review and optimization: The model can review existing code line by line. Users get improvement suggestions immediately and can apply targeted refactorings while keeping full control of the changes.
Learning and exploration: Beginners, or developers researching a new library, can explore API usage and understand code logic through real-time conversation; the model's immediate responses reduce interruptions to their train of thought.
Rapid prototype verification: Teams can quickly build an MVP in the early stages of a product. Users describe requirements while watching the code being generated, which speeds up the path from concept to runnable code.
