WayXG

1 follower · 2 following

Popular repositories

Online-DPO-R1 Public

Forked from RLHFlow/Online-DPO-R1

Codebase for Iterative DPO Using Rule-based Rewards

Python 1
Online-RLHF Public

Forked from RLHFlow/Online-RLHF

A recipe for online RLHF.

Python
RLHF-Reward-Modeling Public

Forked from RLHFlow/RLHF-Reward-Modeling

Recipes to train reward model for RLHF.

Python
ToRA Public

Forked from WeiXiongUST/ToRA

ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting with tools [ICLR'24].

Python
RLHF4MATH_Dev Public

Python
preference-construction Public

Python