Skip to content

WayXG/RLHF4MATH_Dev

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation


Building Math Agents with Multi-Turn Iterative Preference Learning

TL;DL: this is the repo for "Building Math Agents with Multi-Turn Iterative Preference Learning"

We consider the math problem solving with python interpreter, which means that the model can write a python code and ask the external environmnet to execute and receive the excutaion result, before the LLM makes its next decision.


Figure 1: Main evaluation results on the MATH and GSK8K datasets.

Structure

The main pipeline is divided into three steps:

It is recommended to have three separate environments for sft, inference, and alignment_train. Please refer to the corresponding part of this project for the detailed installation instruction.

Collection

Citation

If you find the content of this repo useful, please consider cite it as follows:

@article{xiong2024building,
  title={Building Math Agents with Multi-Turn Iterative Preference Learning},
  author={Xiong, Wei and Shi, Chengshuai and Shen, Jiaming and Rosenberg, Aviv and Qin, Zhen and Calandriello, Daniele and Khalman, Misha and Joshi, Rishabh and Piot, Bilal and Saleh, Mohammad and others},
  journal={arXiv preprint arXiv:2409.02392},
  year={2024}
}

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published