Multimodal Fine-Tuning Data Generator

Upload images to generate JSONL training data for multimodal fine‑tuning

中文 | English

Overview

This project generates JSONL training data from images, with a simple frontend UI.

Demo

output.compress-video-online.com.2.mp4

Quick Start

One‑click scripts

Linux/macOS:

cd backend
bash start.sh

Windows:

cd backend
start.bat

FAQ

Q: Which image formats are supported?
- A: JPG, PNG, WebP. Recommended size < 10MB per image.
Q: How to process a large batch of images?
- A: You can upload multiple images at once; recommend ≤ 50 per batch.
Q: Where are the generated files stored?
- A: In backend/outputs/ on the server, and they are auto‑downloaded to your computer.
Q: Can I customize the output data format?
- A: Yes. Modify create_training_data in backend/app.py.

Contributing

Issues and PRs are welcome for feature improvements, bug fixes, and documentation.

Community

Explore our community (Chinese): 👉 Tech Community | Fufan Space

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
assets		assets
backend		backend
src		src
README.md		README.md
README_zh.md		README_zh.md
index.html		index.html
package-lock.json		package-lock.json
package.json		package.json
vite.config.ts		vite.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multimodal Fine-Tuning Data Generator

Overview

Demo

Quick Start

One‑click scripts

FAQ

Contributing

Community

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Multimodal Fine-Tuning Data Generator

Overview

Demo

Quick Start

One‑click scripts

FAQ

Contributing

Community

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages