Online RLHF - a RLHFlow Collection

RLHFlow 's Collections

RLHFlow MATH Process Reward Model

Standard-format-preference-dataset

Mixture-of-preference-reward-modeling

RM-Bradley-Terry

PM-pair

RLHFLow Reward Models

Online RLHF

updated Jun 12, 2024

Datasets, code, and models for online RLHF (i.e., iterative DPO)

RLHFlow/prompt-collection-v0.1

Viewer • Updated May 8, 2024 • 179k • 38 • 9
RLHFlow/pair-preference-model-LLaMA3-8B

Text Generation • Updated Oct 14, 2024 • 2.75k • 38
sfairXC/FsfairX-LLaMA3-RM-v0.1

Text Classification • Updated Oct 14, 2024 • 5.99k • 52
RLHFlow/SFT-OpenHermes-2.5-Standard

Viewer • Updated Apr 24, 2024 • 1M • 45 • 2
RLHFlow/iterative-prompt-v1-iter2-20K

Viewer • Updated May 3, 2024 • 20k • 259 • 2
RLHFlow/iterative-prompt-v1-iter3-20K

Viewer • Updated May 3, 2024 • 20k • 246 • 3
RLHFlow/iterative-prompt-v1-iter1-20K

Viewer • Updated May 3, 2024 • 20k • 261 • 2
Salesforce/LLaMA-3-8B-SFR-Iterative-DPO-R

Text Generation • Updated Jun 12, 2024 • 195 • 77
RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13, 2024 • 66
Salesforce/LLaMA-3-8B-SFR-SFT-R

Text Generation • Updated May 31, 2024 • 20 • 7
RLHFlow/LLaMA3-SFT

Text Generation • Updated Nov 3, 2024 • 6.46k • 9
RLHFlow/LLaMA3-iterative-DPO-final

Text Generation • Updated Oct 14, 2024 • 7.19k • 40
RLHFlow/iterative-prompt-v1-iter4-20K

Viewer • Updated Jun 12, 2024 • 20k • 253
RLHFlow/iterative-prompt-v1-iter5-20K

Viewer • Updated Jun 12, 2024 • 20k • 39
RLHFlow/iterative-prompt-v1-iter6-20K

Viewer • Updated Jun 12, 2024 • 20k • 106
RLHFlow/iterative-prompt-v1-iter7-20K

Viewer • Updated Jun 12, 2024 • 20k • 109
RLHFlow/iterative-prompt-v1-iter8-20K

Viewer • Updated Jun 12, 2024 • 20k • 110
RLHFlow/iterative-prompt-v1-iter9-20K

Viewer • Updated Jun 12, 2024 • 19.9k • 106 • 1