PaperBanana - An AI-powered framework for automatically generating academic illustrations, jointly developed by Peking University and Google
PaperBanana is a framework for automatically generating academic illustrations, jointly released by Peking University and Google Cloud AI Research. It targets a common pain point for AI researchers: drawing figures for papers is time-consuming and labor-intensive. The system uses a multi-agent architecture with five specialized agents: Retriever, Planner, Stylist, Visualizer, and Critic. Through a two-stage process of linear planning followed by iterative refinement, it automatically generates methodology diagrams and statistical charts intended to meet publication standards.
Main functions of PaperBanana
- Methodology diagram generation: Takes a description from the paper and automatically generates publication-quality algorithm architecture diagrams and flowcharts.
- Statistical chart generation: Supports two ways of producing statistical charts: code generation, which favors accuracy, and image generation, which favors visual appeal.
- Aesthetic style optimization: Upgrades rough sketches to a modern academic visual style suited to top conferences.
- Multi-agent collaboration: Five specialized AI agents split the work across retrieval, planning, design, drawing, and refinement.
- Automatic quality assessment: A built-in AI reviewer checks each figure for accuracy and visual quality and drives iterative improvements.
- Cross-domain generalization: An evaluation benchmark built from NeurIPS papers covers figure generation across a variety of AI subfields.
- Flexible output formats: Outputs PNG/SVG images, or Python code that can be edited and modified afterwards.
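To illustrate why the code-generation route for statistical charts favors accuracy, here is a minimal sketch using only the Python standard library. It is not PaperBanana's code: the function name `bar_chart_svg`, the layout parameters, and the sample values are all hypothetical. The point is that every bar height is computed directly from the data, so the rendered chart cannot drift from the numbers.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(data, width=320, height=200, margin=30):
    """Render a dict of label -> non-negative value as a minimal SVG bar chart.

    Hypothetical helper for illustration only; not part of PaperBanana.
    """
    if not data:
        raise ValueError("data must not be empty")
    max_value = max(data.values()) or 1  # avoid dividing by zero if all values are 0
    plot_w = width - 2 * margin
    plot_h = height - 2 * margin
    slot = plot_w / len(data)  # horizontal space reserved for each bar
    bar_w = slot * 0.6
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for i, (label, value) in enumerate(data.items()):
        bar_h = plot_h * value / max_value  # height is proportional to the value
        x = margin + i * slot + (slot - bar_w) / 2
        y = margin + plot_h - bar_h
        parts.append(
            f'<rect x="{x:.1f}" y="{y:.1f}" width="{bar_w:.1f}" '
            f'height="{bar_h:.1f}" fill="#4c72b0"/>'
        )
        parts.append(
            f'<text x="{x + bar_w / 2:.1f}" y="{height - margin + 14}" '
            f'font-size="10" text-anchor="middle">{escape(label)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Made-up sample data, used only to exercise the function.
svg = bar_chart_svg({"Baseline": 61.2, "Ours": 74.5})
```

Because the output is plain SVG text produced by editable Python, the same script can be rerun after changing the data or the styling, which matches the "code for subsequent editing" output option described above.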
Technical principles of PaperBanana
- Multi-agent architecture: Five specialized AI agents work together, mirroring the workflow of a human designer.
- Two-stage process: Linear planning first fixes the content and style; iterative refinement then produces the final figure.
- Retrieval-augmented generation: Similar examples are retrieved from a database of high-quality papers to guide generation.
- Vision-language model: The cross-modal capabilities of a VLM are used to convert text into accurate visual descriptions.
- Hybrid generation strategy: Method diagrams are produced with image generation models, while statistical charts are produced with code to keep the data accurate.
- Self-critique mechanism: An AI reviewer gives feedback over multiple rounds, progressively removing errors and improving figure quality.
- Aesthetic guideline learning: Colors, fonts, and layouts are extracted automatically from high-quality papers to form reusable style templates.
- Structured messaging: Visual element parameters are passed between agents in a standard data format.
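The principles above can be sketched as a small pipeline. This is a simplified illustration, not the project's implementation: the `FigureSpec` fields, the stub agent functions, the `max_rounds` limit, and the canned critic feedback are all assumptions made for the example. It shows the linear planning stage (Retriever, Planner, Stylist), the refinement loop between Visualizer and Critic, the routing of charts to code and diagrams to an image model, and a structured message passed between agents.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class FigureSpec:
    """Hypothetical structured message passed between agents."""
    kind: str  # "method_diagram" or "statistical_chart"
    description: str
    references: list = field(default_factory=list)
    style: dict = field(default_factory=dict)
    issues: list = field(default_factory=list)
    revision: int = 0


def retriever(spec):
    # Stub: a real retriever would search a database of reference figures.
    spec.references = ["reference-figure-1", "reference-figure-2"]
    return spec


def planner(spec):
    # Stub: expand the paper text into a more detailed visual description.
    spec.description = f"Layout plan for: {spec.description}"
    return spec


def stylist(spec):
    # Stub: attach a reusable style template (values are illustrative).
    spec.style = {"palette": "muted", "font": "sans-serif"}
    return spec


def visualizer(spec):
    # Hybrid strategy: charts go through code, diagrams through an image model.
    backend = "plotting_code" if spec.kind == "statistical_chart" else "image_model"
    return {"backend": backend, "revision": spec.revision}


def critic(spec, artifact):
    # Stub: pretend the first draft has one flaw and later drafts are clean.
    return ["label overlaps arrow"] if artifact["revision"] == 0 else []


def generate(kind, description, max_rounds=3):
    spec = FigureSpec(kind=kind, description=description)
    # Stage 1: linear planning fixes content and style.
    for agent in (retriever, planner, stylist):
        spec = agent(spec)
    # Stage 2: iterative refinement until the critic reports no issues.
    artifact = visualizer(spec)
    for _ in range(max_rounds):
        spec.issues = critic(spec, artifact)
        if not spec.issues:
            break
        spec.revision += 1
        artifact = visualizer(spec)
    return spec, artifact


spec, artifact = generate("statistical_chart", "accuracy of three models")
message = json.dumps(asdict(spec))  # serialized form of the structured message
```

In this sketch the critic's feedback is stored on the same `FigureSpec` object that the visualizer reads, so every agent works from one shared, serializable description of the figure rather than from free-form text.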
PaperBanana project address
- GitHub repository: https://github.com/dwzhu-pku/PaperBanana
- arXiv technical paper: https://arxiv.org/pdf/2601.23265
Application scenarios of PaperBanana
- Academic paper illustrations: Automatically generate publication-quality method flowcharts and model architecture diagrams, sparing researchers figures that are slow to draw and often visually weak.
- Thesis writing: Help graduate students quickly produce figures that follow formatting requirements and share a consistent visual style, for a more professional result.
- Conference posters: Turn research results into clear, intuitive poster content with optimized colors and layout, so information is conveyed more efficiently.
- Research grant applications: Generate professional technical roadmaps for funding proposals, improving the visual quality of the application materials and the impression they make on reviewers.
- Academic talks and presentations: Automatically generate key slide diagrams that turn complex algorithms into visuals the audience can follow.
- Figure style upgrades: Modernize figures from earlier papers or hand-drawn sketches, and unify the style across multiple papers to build a consistent academic identity.