GitHub - Y-Research-SBU/SlideGen: Official Repository for "SlideGen: Collaborative Multimodal Agents for Scientific Slide Generation"

SlideGen: Collaborative Multimodal Agents for Scientific Slide Generation

Xin Liang¹ Xiang Zhang² Yiwei Xu³ Siqi Sun⁴ Chenyu You¹

¹ Stony Brook University ² University of British Columbia
³ University of California, Los Angeles ⁴ Fudan University

Abstract

SlideGen is a collaborative, multimodal agent framework for automatically generating high-quality presentation slides from scientific papers.
Unlike prior approaches that reduce slide generation to text-only summarization, SlideGen treats slide generation as a design-aware multimodal reasoning problem, explicitly modeling structure planning, visual composition, and iterative refinement.

SlideGen orchestrates a set of specialized vision–language agents, each responsible for a distinct stage in a professional slide creation workflow:

Outliner Agent – analyzes the paper structure and constructs a coherent slide-level outline with ordered bullet points.

Mapper Agent – aligns figures and tables with their most relevant textual content.

Formulizer Agent – identifies and assigns equations to appropriate slides with contextual explanations.

Arranger Agent – selects layout templates and places multimodal elements to achieve balanced and diverse visual compositions.

Speaker Agent – generates concise presenter notes to support oral explanation.

Refiner Agent – performs slide merging, layout adjustment, and visual emphasis refinement for readability and consistency.

By integrating visual-in-the-loop reasoning with an extensible template library, SlideGen produces editable PPTX slides that exhibit strong logical flow, aesthetic balance, and faithful content coverage—without relying on reference decks.
Extensive evaluations across visual quality, content faithfulness, and communication effectiveness demonstrate that SlideGen consistently outperforms existing automated slide generation systems.

Quick start

1) Environment

Requirements:

Python 3.10+ (recommended)
An OpenAI API key

Create and activate an environment (example with conda), then install dependencies from requirements.txt:

conda create -n paper2pptx python=3.12 -y
conda activate paper2pptx

cd SlideGen

python -m pip install --no-build-isolation \
  "python-pptx @ https://codeload.github.com/Force1ess/python-pptx/zip/dc356685d4d210a10abe1ffab3c21315cdfae63d"

pip install -r requirements.txt

Set your API key:

export OPENAI_API_KEY=your_key

Install LibreOffice

LibreOffice is useful if your pipeline converts slide formats or needs headless office rendering.

Windows

Download and install LibreOffice from the official website.
Add LibreOffice to your system PATH:
- Default install: add C:\Program Files\LibreOffice\program to PATH
- Custom install: add <your_install_path>\LibreOffice\program to PATH

macOS

brew install --cask libreoffice

Ubuntu/Linux

sudo apt install libreoffice
# Or using snap:
sudo snap install libreoffice
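
Before running the pipeline, it can help to confirm that both prerequisites above are in place. The following is a minimal sketch, not part of the repository: the `preflight` helper is hypothetical, and it assumes the LibreOffice launcher is exposed on PATH as `soffice` (or `libreoffice` on some Linux packages).

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # The quick start expects the key in the OPENAI_API_KEY variable.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # LibreOffice's command-line launcher is usually named soffice;
    # some Linux packages also install a libreoffice wrapper.
    if not (which("soffice") or which("libreoffice")):
        problems.append("LibreOffice (soffice) not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.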

2) Run on one paper

Run the pipeline on a single paper:

conda activate paper2pptx
cd SlideGen
export OPENAI_API_KEY=your_key

python -m SlidesAgent.new_pipeline_logtime   \
    --paper_path=your_path   \
    --model_name_t="4o"  \
    --model_name_v="4o"

Notes:

To run on a specific GPU, prefix the python command with CUDA_VISIBLE_DEVICES=<gpu_id>. If you do not have CUDA, omit that prefix.
Replace your_path in --paper_path with the path to your PDF.
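
To process several papers, the same command can be repeated once per PDF. The sketch below is an illustration under assumptions: `build_command` and `run_batch` are hypothetical helpers, the flags are copied from the command above, and the parameter names `text_model` and `vision_model` reflect a guess that `--model_name_t` and `--model_name_v` select the text and vision models.

```python
import subprocess
import sys
from pathlib import Path


def build_command(pdf_path, text_model="4o", vision_model="4o"):
    """Build the argument list for one pipeline run (flags as in the quick start)."""
    return [
        sys.executable, "-m", "SlidesAgent.new_pipeline_logtime",
        f"--paper_path={pdf_path}",
        f"--model_name_t={text_model}",
        f"--model_name_v={vision_model}",
    ]


def run_batch(pdf_dir, dry_run=True):
    """Run the pipeline once per PDF in pdf_dir; with dry_run, only collect commands."""
    commands = [build_command(p) for p in sorted(Path(pdf_dir).glob("*.pdf"))]
    if not dry_run:
        for cmd in commands:
            # check=False so one failing paper does not stop the batch.
            subprocess.run(cmd, check=False)
    return commands
```

Call `run_batch("papers", dry_run=False)` from the SlideGen directory so that the `SlidesAgent` module can be resolved; the default `dry_run=True` only returns the commands for inspection.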

Output

By default, the pipeline writes a generated PPTX under contents/<paper_name>/ (the exact filename depends on your pipeline code and arguments).
The deck is a standard PPTX that you can open and edit in PowerPoint or Keynote.
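
Because the exact output filename is not fixed, a quick way to find and sanity-check generated decks is to scan the output folder. This is a rough sketch using only the standard library: it relies on the fact that a .pptx file is a ZIP archive and on the conventional part naming `ppt/slides/slideN.xml`. The helper names are hypothetical, and `contents` as the root folder is taken from the description above.

```python
import re
import zipfile
from pathlib import Path

# Conventional location of slide parts inside a .pptx archive.
SLIDE_PART = re.compile(r"ppt/slides/slide\d+\.xml")


def count_slides(pptx_path):
    """Count slide parts inside a .pptx, which is a ZIP archive of XML parts."""
    with zipfile.ZipFile(pptx_path) as archive:
        return sum(1 for name in archive.namelist() if SLIDE_PART.fullmatch(name))


def summarize_outputs(contents_dir="contents"):
    """Map every .pptx found under contents_dir to its slide count."""
    return {
        str(path): count_slides(path)
        for path in sorted(Path(contents_dir).rglob("*.pptx"))
    }
```

A deck that reports zero slides, or a paper folder with no .pptx at all, is a hint that the run did not finish.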

What the system does

Conceptually, SlideGen runs a sequence of agents:

Outliner builds the slide structure and bullet plan
Mapper assigns figures and tables to the most relevant slides
Formulizer assigns equations to slides
Arranger selects a layout template and places assets
Refiner merges sparse slides and applies a consistent theme color for readability
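
The sequence above can be pictured as a chain of functions that each transform a shared list of slides. The toy sketch below is not the repository's implementation: the real agents are vision-language models, whereas here every stage is a simple rule, and all names (`Slide`, `attach`, `run_pipeline`, the layout names, the default color) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Slide:
    title: str
    bullets: list
    assets: list = field(default_factory=list)  # ids of figures, tables, equations
    layout: str = ""
    theme_color: str = ""


def outliner(paper):
    # One slide per paper section, bullets kept in order.
    return [Slide(s["title"], list(s["bullets"])) for s in paper["sections"]]


def attach(slides, asset_to_section):
    # Stand-in for both Mapper (figures/tables) and Formulizer (equations):
    # put each asset on the slide whose title matches its section.
    by_title = {slide.title: slide for slide in slides}
    for asset_id, section in asset_to_section.items():
        if section in by_title:
            by_title[section].assets.append(asset_id)
    return slides


def arranger(slides):
    # Pick a (hypothetical) layout template name based on what the slide holds.
    for slide in slides:
        slide.layout = "text_with_visual" if slide.assets else "text_only"
    return slides


def refiner(slides, theme_color="#1F4E79", min_bullets=2):
    # Merge sparse, asset-free slides into the previous one, then apply one theme color.
    merged = []
    for slide in slides:
        sparse = len(slide.bullets) < min_bullets and not slide.assets
        if sparse and merged:
            merged[-1].bullets.extend(slide.bullets)
        else:
            merged.append(slide)
    for slide in merged:
        slide.theme_color = theme_color
    return merged


def run_pipeline(paper):
    slides = outliner(paper)
    slides = attach(slides, paper.get("figures", {}))    # Mapper
    slides = attach(slides, paper.get("equations", {}))  # Formulizer
    slides = arranger(slides)
    return refiner(slides)
```

Passing one list of slides from stage to stage keeps each agent independent, so a stage can be replaced or skipped without changing the others.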

WebUI (Slides Generator) — Quick Start

1) Terminal A — Start Backend

cd webui/backend
python -m uvicorn main:app --host 0.0.0.0 --port 8000 --reload

2) Terminal B — Start Frontend

cd webui/frontend
npm install
npm run dev
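
If the frontend cannot reach the backend, a first check is whether anything is listening on the backend port. The snippet below is a small sketch with a hypothetical helper name; it only tests that a TCP connection to port 8000 (the port used in the uvicorn command above) succeeds and makes no assumption about the backend's routes.

```python
import socket


def is_listening(host="127.0.0.1", port=8000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("backend reachable:", is_listening())
```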

📊 Example Results

Our system generates professional academic decks with high visual quality. Here are some examples of generated decks:

Citation

@article{liang2025slidegen,
  title={SlideGen: Collaborative Multimodal Agents for Scientific Slide Generation},
  author={Liang, Xin and Zhang, Xiang and Xu, Yiwei and Sun, Siqi and You, Chenyu},
  journal={arXiv preprint arXiv:2512.04529},
  year={2025}
}

Acknowledgments

This codebase is built upon the following open-source projects. We express our sincere gratitude to:

Docling: An open-source document processing framework that supports parsing and converting multiple document formats (e.g., PDF, DOCX, PPTX).
Marker: High-quality PDF parsing library that enables accurate content extraction from research papers.
python-pptx: Python library for creating and updating PowerPoint (.pptx) files.
Paper2Poster: Multi-agent LLM framework for creating PowerPoint (.pptx) poster files.
