
DSL-Lab/aops


Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation

Project Website | 📝 Paper | 📐 LiveAoPSBench | 📊 AoPS-Ins (third party)

This repository contains the code and instructions for reproducing the data collection and processing pipeline described in our paper "Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation".

Environment Setup

To install the required dependencies:

pip install -r requirements.txt

By default, we use 4× A100 GPUs with 80 GB of memory each.
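Before running the pipeline, it can help to confirm that the machine matches this hardware assumption. The following is a minimal sketch of a hypothetical helper, not part of this repository: `count_80gb_gpus` and the `MIN_MIB` threshold are our own names and assumptions. It counts the GPUs whose reported total memory is at least `MIN_MIB`, reading the CSV that `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits` prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): count GPUs whose total
# memory is at least MIN_MIB. Input is "name, memory.total" CSV in MiB, the
# format printed by:
#   nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits
# MIN_MIB is an assumed threshold for 80 GB-class cards. The exact MiB total
# varies by card model, so adjust it if your cards report less.
MIN_MIB=80000

count_80gb_gpus() {
  local count=0 name mem
  # The "|| [[ -n $name ]]" clause keeps a final line that lacks a newline.
  while IFS=',' read -r name mem || [[ -n "$name" ]]; do
    mem="${mem//[[:space:]]/}"            # drop spaces around the number
    [[ "$mem" =~ ^[0-9]+$ ]] || continue  # skip blank or malformed lines
    if (( mem >= MIN_MIB )); then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

# Example on a real machine (requires the NVIDIA driver):
#   n=$(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits | count_80gb_gpus)
#   echo "80 GB-class GPUs available: $n"
```

Parsing the CSV in a separate function keeps the check testable on machines without GPUs: you can pipe in sample lines instead of live `nvidia-smi` output.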

Data Collection

To construct the AoPS-Instruct dataset:

  1. To crawl the data, run the following script:

    bash scripts/crawl_raw.sh

    This produces a raw JSON Lines file, out/items_raw.jl, which is processed in the next steps.

    Note: To do a test run that checks the whole pipeline works, pass the test_mode option to the crawling script to crawl only 1,000 data points:

    bash scripts/crawl_raw.sh --test_mode True

    Then run the rest of the pipeline to confirm that everything works before starting a full run. Delete the entire test-run folder ./out before rerunning, because the pipeline resumes from existing outputs unless the whole folder is removed.

  2. Set the following variables in the reproduction script, scripts/reproduce.sh:

    • WORKDIR: The working directory for the script.
    • NUM_GPUS: The number of GPUs to use for the parsing models. It must be an even number, and CUDA_VISIBLE_DEVICES must be set accordingly. We assume 80 GB A100/H100 GPUs, and at least 2 GPUs are required.
    • ITEMS_RAW_PATH: The path to the raw crawled data (from step 1).
    • Optionally, change the parsing and rewriting models (the defaults are Qwen2.5 models).

    Then run

    bash scripts/reproduce.sh

    This processes the raw crawled data and creates the final training dataset in the specified format. The script supports resuming: if interrupted, it picks up where it left off.
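Because a misconfigured run only fails after the models start loading, a quick check of the variables from step 2 can save time. The following is a minimal sketch under stated assumptions: `check_reproduce_config` is a hypothetical helper, not shipped with the repository. It encodes the constraints listed above (NUM_GPUS even and at least 2, ITEMS_RAW_PATH pointing to a non-empty crawl file). Requiring CUDA_VISIBLE_DEVICES to list at least NUM_GPUS device IDs is our own interpretation of "set accordingly".

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of this repository) for the
# variables set in scripts/reproduce.sh. It encodes the constraints stated in
# this README: NUM_GPUS is even and at least 2, and ITEMS_RAW_PATH points to
# the crawl output from step 1. Requiring CUDA_VISIBLE_DEVICES to list at
# least NUM_GPUS device IDs is our own assumption.
check_reproduce_config() {
  local num_gpus="$1" visible="$2" raw_path="$3"

  # NUM_GPUS must be a non-negative integer before any arithmetic.
  if ! [[ "$num_gpus" =~ ^[0-9]+$ ]]; then
    echo "NUM_GPUS must be an integer, got '$num_gpus'" >&2
    return 1
  fi
  num_gpus=$((10#$num_gpus))  # force base 10 so "08" is not read as octal
  if (( num_gpus < 2 || num_gpus % 2 != 0 )); then
    echo "NUM_GPUS must be an even number >= 2, got $num_gpus" >&2
    return 1
  fi

  # Split the comma-separated device IDs in CUDA_VISIBLE_DEVICES.
  local -a devices=()
  IFS=',' read -r -a devices <<< "$visible"
  if (( ${#devices[@]} < num_gpus )); then
    echo "CUDA_VISIBLE_DEVICES lists ${#devices[@]} device(s); need $num_gpus" >&2
    return 1
  fi

  # The raw crawl (e.g. out/items_raw.jl) must exist and be non-empty.
  if [[ ! -s "$raw_path" ]]; then
    echo "ITEMS_RAW_PATH '$raw_path' is missing or empty" >&2
    return 1
  fi
}

# Example:
#   check_reproduce_config "$NUM_GPUS" "$CUDA_VISIBLE_DEVICES" "$ITEMS_RAW_PATH" \
#     && bash scripts/reproduce.sh
```

The check returns a non-zero status and prints the reason on stderr, so it composes with `&&` in front of the reproduction script without changing the script itself.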

Processed Data

We provide the full code for reproducing AoPS-Ins and LiveAoPSBench, so you can explore and experiment with both datasets. Processed data is available through a community reproduction effort: Hugging Face Dataset. While we encourage the use of this third-party dataset, we disclaim any liability for its use and for any issues that may arise from it.

Evaluation Code

To run evaluation on LiveAoPSBench, refer to the eval directory.

Questions/Issues

Please submit a GitHub issue if you have any questions or find any bugs.

Citation

If you use this code or dataset in your research, please cite our paper:

@misc{aopsdataset,
      title={Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation}, 
      author={Sadegh Mahdavi and Muchen Li and Kaiwen Liu and Christos Thrampoulidis and Leonid Sigal and Renjie Liao},
      year={2025},
      eprint={2501.14275},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.14275}, 
}
