THU-KEG/PairwiseRM

PairwiseRM

This repo is the official implementation of the paper "Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament".

Repository Structure

  • data/: contains the datasets used in the experiments.
  • pairwise/: contains the source code of PairwiseRM.
  • pairwise/compare_resp.py: contains the implementation of PairwiseRM.
  • pairwise/knockout.py: contains the implementation of Knockout Tournament.
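
As a rough illustration of the knockout idea, here is a minimal sketch of a single-elimination tournament over N candidate responses. It is not the code in pairwise/knockout.py: the function name `knockout`, the `compare` callback, and the `seed` parameter are all hypothetical, and the sketch assumes one judge call per match, with a bye for the leftover candidate when a round has an odd number of entries.

```python
import random
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def knockout(
    candidates: List[T],
    compare: Callable[[T, T], bool],  # hypothetical judge: True if the first argument wins
    seed: Optional[int] = None,
) -> T:
    """Return the winner of a single-elimination tournament over `candidates`."""
    if not candidates:
        raise ValueError("need at least one candidate")

    # Shuffle a copy so the initial pairing does not depend on input order.
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)

    while len(pool) > 1:
        next_round = []
        # Pair neighbours (0,1), (2,3), ... and keep the winner of each match.
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            next_round.append(a if compare(a, b) else b)
        # With an odd pool size, the last candidate advances without a match.
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])
        pool = next_round

    return pool[0]


if __name__ == "__main__":
    # Toy judge that prefers the larger number; a real judge would compare
    # two model responses to the same question.
    print(knockout(list(range(8)), lambda a, b: a > b, seed=0))
```

Under these assumptions every match eliminates exactly one candidate, so selecting a winner from N candidates takes N - 1 judge calls (63 for N = 64).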

The checkpoint of our PairwiseRM model is coming soon. Stay tuned!

Until then, you can run the code with an online LLM API such as GPT-4o, Claude 3.5 Sonnet, or Gemini 1.5 Flash.

For example:

export PYTHONPATH=$PYTHONPATH:$(pwd)

# Define the input file
input_file=data/math-500/LLaMA-3.1-8B-Instruction_64.json

# Define the prompt template
prompt_template=prompts/compare_0_ex.md

# Define the judge model, base URL, and API key
judge_model=gpt-4o
base_url="https://api.openai.com/v1"
api_key="YOUR_API_KEY"

# Run the Python script with the appropriate arguments
python pairwise/knockout.py \
    --model "$judge_model" \
    --input "$input_file" \
    --prompt_template "$prompt_template" \
    --base_url "$base_url" \
    --api_key "$api_key" \
    -n 64

Citation

If you find our work useful, please consider citing our paper:

@article{liu2025pairwise,
  title={Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament},
  author={Liu, Yantao and Yao, Zijun and Min, Rui and Cao, Yixin and Hou, Lei and Li, Juanzi},
  journal={arXiv preprint arXiv:2501.13007},
  year={2025},
  note={in progress work},
  url={https://doi.org/10.48550/arXiv.2501.13007}
}
