AtomThink: A Slow Thinking Framework for Multimodal Mathematical Task

🎉Thank you for exploring AtomThink! We warmly invite you to ⭐️ star this repository, share your feedback via issues, and contribute to the project.

📝 Contents

News
Features
Case Study
Dataset
Usage
Citation
License
Acknowledgement

📣 News

[2024-12-13] We’ve released the AMATH-PRM dataset: AMATH-PRM!
[2024-12-13] We’ve released the AMATH-SFT dataset: AMATH-SFT!
[2024-12-13] We’ve released the pretrained weight for AtomThink-LLaVA: AtomThink-LLaVA!
[2024-12-13] We’ve released the pretrained weight for AtomThink-Emova: AtomThink-Emova!
[2024-12-13] We’ve released the pretrained weight for AtomThink-PRM: AtomThinkPRM!
[2024-12-13] We’ve released the inference code!
[2024-11-20] The paper AtomThink: A Slow Thinking Framework for Multimodal Mathematical Task is now available on arXiv!
[2024-11-20] Thank you for visiting this repository!

💡Features

Key Features

🧠 Introduces GPT-o1 style reasoning via long CoT for complex multimodal mathematical tasks.
🛠️ Combines a CoT annotation engine, atomic step fine-tuning, and policy search strategies to enhance reasoning.
📊 Provides an atomic capability evaluation metric that assesses the quality of each reasoning step.
⚡ Explores the test-time scaling law in MLLMs.
📈 State-of-the-art performance in multimodal mathematical reasoning tasks.

Abstract

In this paper, we address the challenging task of multimodal mathematical reasoning by incorporating the ability of “slow thinking” into multimodal large language models (MLLMs). Contrary to existing methods that rely on direct or fast thinking, our key idea is to construct long chains of thought (CoT) consisting of atomic actions in a step-by-step manner, guiding MLLMs to perform complex reasoning. To this end, we design a novel AtomThink framework composed of three key modules: (i) a CoT annotation engine that automatically generates high-quality CoT annotations to address the lack of high-quality visual mathematical data; (ii) an atomic step fine-tuning strategy that jointly optimizes an MLLM and a policy reward model (PRM) for step-wise reasoning; and (iii) four different search strategies that can be applied with the PRM to complete reasoning. Additionally, we propose AtomMATH, a large-scale multimodal dataset of long CoTs, and an atomic capability evaluation metric for mathematical tasks. Extensive experimental results show that the proposed AtomThink significantly improves the performance of baseline MLLMs, achieving approximately 50% relative accuracy gains on MathVista and 120% on MathVerse.
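To make the idea of searching over atomic steps with a reward model more concrete, here is a minimal sketch of one simple possibility: a greedy step-wise search that, at each step, keeps the candidate the reward model scores highest. It is an illustration under stated assumptions, not the repository's implementation and not necessarily one of the four strategies in the paper. The names `greedy_step_search`, `propose_steps`, and `score_step` are hypothetical, and the toy functions stand in for the MLLM and the PRM.

```python
from typing import Callable, List


def greedy_step_search(
    question: str,
    propose_steps: Callable[[str, List[str]], List[str]],
    score_step: Callable[[str, List[str], str], float],
    max_steps: int = 10,
    stop_marker: str = "Answer:",
) -> List[str]:
    """Build a chain of atomic steps, keeping the best-scored candidate each time."""
    chain: List[str] = []
    for _ in range(max_steps):
        candidates = propose_steps(question, chain)
        if not candidates:
            break
        # Pick the candidate step that the (hypothetical) reward model prefers.
        best = max(candidates, key=lambda s: score_step(question, chain, s))
        chain.append(best)
        if best.startswith(stop_marker):
            break
    return chain


# Toy stand-ins so the sketch runs without any model (assumptions, not real APIs).
def toy_propose(question: str, chain: List[str]) -> List[str]:
    if len(chain) == 0:
        return ["Read the lengths from the figure.", "Guess the result."]
    if len(chain) == 1:
        return ["Add the two lengths: 3 + 4 = 7.", "Multiply the lengths."]
    return ["Answer: 7", "Answer: 12"]


def toy_score(question: str, chain: List[str], step: str) -> float:
    # A real PRM would return a learned score; here preferences are hard-coded.
    preferred = {
        "Read the lengths from the figure.",
        "Add the two lengths: 3 + 4 = 7.",
        "Answer: 7",
    }
    return 1.0 if step in preferred else 0.0


result = greedy_step_search("What is the total length?", toy_propose, toy_score)
for step in result:
    print(step)
```

Keeping the generator and the scorer as plain callables means the same loop could, in principle, be reused with a different sampling or scoring backend.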

Read the full paper

🚀 Case Study

We present the atomic CoT outputs generated by LLaVA-Llama3-8B and EMOVA-8B models trained with AtomThink. Compared with the original models, the trained models produce a structured thinking process similar to that of OpenAI-o1.

Example1:

Example2:

Example3:

Example4:

🖼️ Dataset

Details

The details of our AtomMATH dataset are shown in the table below. AMATH-SFT is used for instruction fine-tuning, while AMATH-PRM is used to train the policy reward model for language process supervision.

Source	Meta Samples	AMATH-SFT	AMATH-PRM
CLEVR	1929	11.2k	34.4k
Geometry3K	1201	11.1k	15.6k
MAVIS	3654	17.7k	31.4k
TabMWP	2463	15.7k	25.7k
GeomVerse	1347	9.9k	17.0k
MathV360K	10157	53.5k	33.6k
MMMU	76	0.6k	1.3k
GeoQA+	2082	19.5k	0
IconQA	3199	18.1k	0
Total	26108	157k	159k
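As a quick sanity check, the per-source rows can be summed and compared with the Total row. The sketch below copies the values from the table above; the variable names are illustrative only, and the SFT and PRM columns are in thousands as rounded in the table.

```python
# (source, meta samples, AMATH-SFT in thousands, AMATH-PRM in thousands)
rows = [
    ("CLEVR", 1929, 11.2, 34.4),
    ("Geometry3K", 1201, 11.1, 15.6),
    ("MAVIS", 3654, 17.7, 31.4),
    ("TabMWP", 2463, 15.7, 25.7),
    ("GeomVerse", 1347, 9.9, 17.0),
    ("MathV360K", 10157, 53.5, 33.6),
    ("MMMU", 76, 0.6, 1.3),
    ("GeoQA+", 2082, 19.5, 0.0),
    ("IconQA", 3199, 18.1, 0.0),
]

meta_total = sum(r[1] for r in rows)
sft_total_k = round(sum(r[2] for r in rows), 1)
prm_total_k = round(sum(r[3] for r in rows), 1)

print(meta_total, sft_total_k, prm_total_k)
```

The sums agree with the Total row after rounding to the nearest thousand.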

Examples

Example1 of AMATH-SFT dataset

Example2 of AMATH-SFT dataset

Example1 of AMATH-PRM dataset

Example2 of AMATH-PRM dataset

⚙️ Usage

Model

You can download our models from AtomThink-LLaVA, AtomThink-Emova, and AtomThinkPRM.

Dataset

You can download our training data from AMATH-SFT and AMATH-PRM.

Quick Start

Install the environment as follows:

pip install -r requirements.txt

Set up your OpenAI API key:

import os
os.environ['OPENAI_API_KEY'] = 'YOUR KEY'

Modify the eval_configs/MathVerse.py file to set the config:

"mmcv_config": "./model_configs/atomthink-llava-llama3-8b.py",
"prm_model": "YOUR_PATH/AtomThinkPRM",
"image_dir": "./MathVerse/images",
"question_file": "./MathVerse/testmini.json",
"answers_file": "./results/",
"extraction_file": "./results/",
"score_file": "./results/",
"log": "./results/logs"
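Before launching a long evaluation, it can help to confirm that the input paths in the config actually exist. The following is a minimal sketch, assuming the settings are available as a plain Python dict. The helper `missing_inputs` and the `INPUT_KEYS` tuple are hypothetical and not part of this repository.

```python
import os
from typing import Dict, List

# Config keys that point at inputs which must exist before evaluation starts.
INPUT_KEYS = ("mmcv_config", "prm_model", "image_dir", "question_file")


def missing_inputs(config: Dict[str, str]) -> List[str]:
    """Return the input keys whose path is unset or does not exist on disk."""
    return [key for key in INPUT_KEYS if not os.path.exists(config.get(key, ""))]


# Example with placeholder values; replace them with your own paths.
example_config = {
    "mmcv_config": "./model_configs/atomthink-llava-llama3-8b.py",
    "prm_model": "YOUR_PATH/AtomThinkPRM",
}
for key in missing_inputs(example_config):
    print(f"Missing or unset input: {key}")
```

Output locations such as the results and log entries are left out of the check on purpose, since they may not exist before the first run.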

Modify the model_configs/atomthink-llava-llama3-8b.py file to set the model path:

training_args = dict(
    output_dir="YOUR_PRETRAINED_MODEL_DIR/atomthink-llava-llama3-8b",)

Perform inference:

CUDA_VISIBLE_DEVICES=0,1 python evaluation.py --config eval_configs/MathVerse.py

📖 Citation

If you find this project useful, please cite our paper:

@article{xiang2024atomthink,
  title={AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning},
  author={Kun Xiang},
  journal={arXiv preprint arXiv:2411.11930},
  year={2024},
  doi={https://doi.org/10.48550}
}

📄 License

This project is licensed under the MIT License.

🙏 Acknowledgement

We would like to thank the following repositories for their contributions:

bklieger-groq/g1: This library was used for data processing.
openreasoner/openr: This tool was helpful for deploying the process supervision model.

✨ Thank you for your interest in our work! ✨
