Xin Liang¹, Xiang Zhang², Yiwei Xu³, Siqi Sun⁴, Chenyu You¹
¹ Stony Brook University
² University of British Columbia
³ University of California, Los Angeles
⁴ Fudan University
SlideGen is a collaborative, multimodal agent framework for automatically generating high-quality presentation slides from scientific papers.
Unlike prior approaches that reduce slide generation to text-only summarization, SlideGen treats slide generation as a design-aware multimodal reasoning problem, explicitly modeling structure planning, visual composition, and iterative refinement. SlideGen orchestrates a set of specialized vision–language agents, each responsible for a distinct stage in a professional slide creation workflow:
- Outliner Agent – analyzes the paper structure and constructs a coherent slide-level outline with ordered bullet points.
- Mapper Agent – aligns figures and tables with their most relevant textual content.
- Formulizer Agent – identifies and assigns equations to appropriate slides with contextual explanations.
- Arranger Agent – selects layout templates and places multimodal elements to achieve balanced and diverse visual compositions.
- Speaker Agent – generates concise presenter notes to support oral explanation.
- Refiner Agent – performs slide merging, layout adjustment, and visual emphasis refinement for readability and consistency.
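To make the Mapper's task concrete, here is a deliberately simple stand-in. It is not SlideGen's method, which relies on vision–language agents; it assumes plain-text captions and slide bodies and scores each pairing by shared-word overlap. `tokenize`, `assign_figures`, and the stopword list are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption, not taken from SlideGen).
STOPWORDS = {"the", "and", "with", "our", "for", "are", "this", "that"}

def tokenize(text: str) -> Counter:
    """Count lowercase word tokens, dropping stopwords and tokens shorter than 3 chars."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)

def assign_figures(slides: dict[str, str], captions: dict[str, str]) -> dict[str, str]:
    """Map each figure/table id to the slide title whose text best overlaps its caption."""
    if not slides:
        return {}
    slide_tokens = {title: tokenize(f"{title} {body}") for title, body in slides.items()}
    assignment = {}
    for item_id, caption in captions.items():
        cap = tokenize(caption)
        # Overlap = total count of the multiset intersection; ties go to the earlier slide.
        assignment[item_id] = max(
            slide_tokens, key=lambda title: sum((cap & slide_tokens[title]).values())
        )
    return assignment

slides = {
    "Method": "We propose a multi-agent pipeline with an outliner and a mapper",
    "Results": "Visual quality scores improve on the benchmark",
}
captions = {
    "fig1": "Overview of the multi-agent pipeline",
    "tab2": "Visual quality scores on the benchmark",
}
print(assign_figures(slides, captions))  # {'fig1': 'Method', 'tab2': 'Results'}
```

Lexical overlap ignores what a figure actually shows, which is exactly the gap a vision–language Mapper is meant to close; the sketch only shows the shape of the input and output.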
By integrating visual-in-the-loop reasoning with an extensible template library, SlideGen produces editable PPTX slides that exhibit strong logical flow, aesthetic balance, and faithful content coverage—without relying on reference decks.
Extensive evaluations across visual quality, content faithfulness, and communication effectiveness demonstrate that SlideGen consistently outperforms existing automated slide generation systems.
Requirements:
- Python 3.10+ (recommended)
- An OpenAI API key
Create and activate an environment (example with conda), then install the pinned python-pptx fork and the remaining dependencies from requirements.txt:
conda create -n paper2pptx python=3.12 -y
conda activate paper2pptx
cd SlideGen
python -m pip install --no-build-isolation \
"python-pptx @ https://codeload.github.com/Force1ess/python-pptx/zip/dc356685d4d210a10abe1ffab3c21315cdfae63d"
pip install -r requirements.txt
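To confirm the install step worked, a small check like the following can help. It is a hypothetical helper, not part of the repository, and it assumes the pinned fork still installs under the standard `pptx` module name.

```python
import importlib.util
import sys

def check_environment(min_python: tuple[int, int] = (3, 10)) -> list[str]:
    """Return human-readable problems; an empty list means the basics look fine."""
    problems = []
    if sys.version_info < min_python:
        found = sys.version.split()[0]
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ recommended, found {found}")
    # python-pptx is imported as `pptx`; the fork is assumed to keep that module name.
    if importlib.util.find_spec("pptx") is None:
        problems.append("python-pptx (module 'pptx') is not importable")
    return problems

if __name__ == "__main__":
    for message in check_environment() or ["environment looks OK"]:
        print(message)
```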
Set your API key:
export OPENAI_API_KEY=your_key
LibreOffice is optional; install it if your workflow converts slide formats (e.g., PPTX to PDF) or needs headless office rendering.
Windows
- Download and install LibreOffice from the official website.
- Add LibreOffice to your system PATH:
  - Default install: add C:\Program Files\LibreOffice\program to PATH
  - Custom install: add <your_install_path>\LibreOffice\program to PATH
macOS
brew install --cask libreoffice
Ubuntu/Linux
sudo apt install libreoffice
# Or using snap:
sudo snap install libreoffice
To run the pipeline on a paper:
conda activate paper2pptx
cd SlideGen
export OPENAI_API_KEY=your_key
python -m SlidesAgent.new_pipeline_logtime \
--paper_path=your_path \
--model_name_t="4o" \
--model_name_v="4o"
Notes:
- To pick a specific GPU, prefix the command with CUDA_VISIBLE_DEVICES=<gpu_id>; if you do not have CUDA, leave the prefix out.
- Replace your_path in --paper_path with the path to your PDF.
By default, the pipeline writes a generated PPTX under contents/<paper_name>/ (the exact filename depends on the pipeline version and the arguments you pass).
The deck is a standard PPTX that you can open and edit in PowerPoint or Keynote.
Conceptually, SlideGen runs a sequence of agents:
- Outliner builds the slide structure and bullet plan
- Mapper assigns figures and tables to the most relevant slides
- Formulizer assigns equations to slides
- Arranger selects a layout template and places assets
- Refiner merges sparse slides and applies a consistent theme color for readability
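The staged flow can be pictured as a chain of functions over a shared slide list. Everything here is illustrative: `Slide`, `run_pipeline`, and `refine_merge_sparse` are hypothetical names, only a rule-based version of the Refiner's sparse-slide merging is shown, and the real agents rely on vision–language model calls rather than a fixed bullet threshold.

```python
from collections.abc import Callable
from dataclasses import dataclass, field

@dataclass
class Slide:
    title: str
    bullets: list[str]
    figures: list[str] = field(default_factory=list)
    equations: list[str] = field(default_factory=list)

Stage = Callable[[list[Slide]], list[Slide]]

def refine_merge_sparse(slides: list[Slide], min_bullets: int = 2) -> list[Slide]:
    """Fold a sparse slide (too few bullets, no figure) into the slide before it."""
    merged: list[Slide] = []
    for s in slides:
        sparse = len(s.bullets) < min_bullets and not s.figures
        if merged and sparse:
            merged[-1].bullets.extend(s.bullets)
            merged[-1].equations.extend(s.equations)
        else:
            # Copy the lists so the caller's slides are never mutated.
            merged.append(Slide(s.title, list(s.bullets), list(s.figures), list(s.equations)))
    return merged

def run_pipeline(slides: list[Slide], stages: list[Stage]) -> list[Slide]:
    """Apply each stage (Outliner, Mapper, ..., Refiner) to the slide list in order."""
    for stage in stages:
        slides = stage(slides)
    return slides

deck = [
    Slide("Introduction", ["Problem", "Motivation"]),
    Slide("Aside", ["One extra point"]),
    Slide("Results", ["Main finding"], figures=["fig3"]),
]
for s in run_pipeline(deck, [refine_merge_sparse]):
    print(s.title, s.bullets, s.figures)
```

Treating every agent as a function from slides to slides keeps stages swappable and makes it easy to test one stage in isolation.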
To launch the web UI, start the backend:
cd webui/backend
python -m uvicorn main:app --host 0.0.0.0 --port 8000 --reload
Then start the frontend in a separate terminal:
cd webui/frontend
npm install
npm run dev
Our system generates professional academic decks with high visual quality. Here are some examples of generated decks:
@article{liang2025slidegen,
title={SlideGen: Collaborative Multimodal Agents for Scientific Slide Generation},
author={Liang, Xin and Zhang, Xiang and Xu, Yiwei and Sun, Siqi and You, Chenyu},
journal={arXiv preprint arXiv:2512.04529},
year={2025}
}
This codebase is built upon the following open-source projects. We express our sincere gratitude to:
- Docling: An open-source document processing framework that supports parsing and converting multiple document formats (e.g., PDF, DOCX, PPTX).
- Marker: High-quality PDF parsing library that enables accurate content extraction from research papers.
- python-pptx: Python library for creating and updating PowerPoint (.pptx) files.
- Paper2Poster: Multi-agent LLMs for creating PowerPoint (.pptx) poster files from scientific papers.



