Building Math Agents with Multi-Turn Iterative Preference Learning

TL;DR: This is the repo for "Building Math Agents with Multi-Turn Iterative Preference Learning".

We consider math problem solving with a Python interpreter: the model can write Python code, ask the external environment to execute it, and receive the execution result before making its next decision.
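The interaction described above can be sketched as a simple loop. This is a minimal illustration, not the repo's inference code: the `<code>...</code>` tag convention, the `generate` callable, and the helper names `run_python` and `solve` are all assumptions made for the example, and `exec` is used without any sandboxing.

```python
import contextlib
import io
import re
from typing import Callable, List

# Assumed convention (not from the repo): the model wraps code in <code>...</code>.
CODE_TAG = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_python(code: str) -> str:
    """Execute code and return its captured stdout, or the error message."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # not sandboxed; for illustration only
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def solve(generate: Callable[[List[str]], str], question: str, max_turns: int = 4) -> List[str]:
    """Alternate model messages and interpreter observations.

    `generate` is a hypothetical stand-in for the LLM: it maps the
    history so far to the next model message.
    """
    history = [question]
    for _ in range(max_turns):
        message = generate(history)
        history.append(message)
        match = CODE_TAG.search(message)
        if match is None:  # no code: treat the message as the final answer
            break
        history.append(run_python(match.group(1)))  # observation for the next turn
    return history


def scripted_model(history: List[str]) -> str:
    """Toy model: writes code on the first turn, then reads the result."""
    if len(history) == 1:
        return "<code>print(sum(range(1, 11)))</code>"
    return f"The answer is {history[-1]}."


trajectory = solve(scripted_model, "What is 1 + 2 + ... + 10?")
```

Here `trajectory` holds the question, the model's code message, the interpreter output, and the final answer, in that order. A real setup would replace `scripted_model` with calls to the trained model and run the code in an isolated environment.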

Figure 1: Main evaluation results on the MATH and GSM8K datasets.

Structure

The main pipeline is divided into three steps:

SFT to train the SFT model.
Inference to generate new data and evaluate the model.
Multi-turn Alignment Algorithms to conduct the multi-turn DPO/KTO training.
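For the alignment step, the following is a rough sketch of how a DPO-style loss might be computed on a multi-turn trajectory. It assumes that tokens returned by the interpreter are excluded from the log-probability sums, so that only model-written tokens contribute. That masking is an assumption of this example; the code under alignment_algorithms is the authoritative reference. The function names and the `beta` default are hypothetical.

```python
import math
from typing import Sequence


def masked_logprob(token_logps: Sequence[float], is_model_token: Sequence[bool]) -> float:
    """Sum per-token log-probs over model-written tokens only.

    Tokens produced by the interpreter (is_model_token == False) are skipped.
    """
    return sum(lp for lp, keep in zip(token_logps, is_model_token) if keep)


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss on trajectory-level log-probs.

    Computes -log(sigmoid(beta * margin)), written as log(1 + exp(-beta * margin)).
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return math.log1p(math.exp(-beta * margin))


# Toy trajectory: the middle token is interpreter output and is masked out.
chosen_logp = masked_logprob([-1.0, -2.0, -3.0], [True, False, True])
```

In practice the per-token log-probs would come from the policy and reference models, and the mask would be built from the turn boundaries of each trajectory.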

We recommend three separate environments for sft, inference, and alignment_train. Please refer to the corresponding part of this project for detailed installation instructions.

Collection

SFT Dataset: RLHF4MATH/SFT_510K, which is a subset of nvidia/OpenMathInstruct-1
Prompt: RLHF4MATH/prompt_iter1, RLHF4MATH/prompt_iter2, RLHF4MATH/prompt_iter3
SFT Model: RLHF4MATH/Gemma-7B-it-SFT3epoch
Aligned Model: RLHF4MATH/Gemma-7B-it-M-DPO

Citation

If you find the content of this repo useful, please consider citing it as follows:

@article{xiong2024building,
  title={Building Math Agents with Multi-Turn Iterative Preference Learning},
  author={Xiong, Wei and Shi, Chengshuai and Shen, Jiaming and Rosenberg, Aviv and Qin, Zhen and Calandriello, Daniele and Khalman, Misha and Joshi, Rishabh and Piot, Bilal and Saleh, Mohammad and others},
  journal={arXiv preprint arXiv:2409.02392},
  year={2024}
}

