This repo is the official implementation of the paper "Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament".
data/
: contains the datasets used in the experiments.pairwise/
: contains the source code of PairwiseRM.pairwise/compare_resp.py
: contains the implementation of PairwiseRM.pairwise/knockout.py
: contains the implementation of Knockout Tournament.
The checkpoint of our PairwiseRM model is coming soon. Stay tuned!
Before that you can run the code will online llm api like gpt4o
,claude-3.5-sonnet
or gemini-1.5-flash
for example:
export PYTHONPATH=$PYTHONPATH:$(pwd)
# Define the input file
input_file=data/math-500/LLaMA-3.1-8B-Instruction_64.json
# Define the prompt template
prompt_template=prompts/compare_0_ex.md
# Define the base URL and API key
judge_model=gpt-4o
base_url="https://api.openai.com/v1"
api_key="YOUR_API_KEY"
# Run the Python script with the appropriate arguments
python pairwise/knockout.py \
--model $judge_model \
--input $input_file \
--prompt_template $prompt_template \
--base_url $base_url \
--api_key $api_key \
-n 64
If you find our work useful, please consider citing our paper:
@article{liu2025pairwise,
title={Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament},
author={Liu, Yantao and Yao, Zijun and Min, Rui and Cao, Yixin and Hou, Lei and Li, Juanzi},
journal={arXiv preprint arXiv:2501.13007},
year={2025},
note={in progress work},
url={https://doi.org/10.48550/arXiv.2501.13007}
}