RLHFlow's Collections
SFT Models
Updated Nov 3, 2024
We train a series of SFT models on RLHFlow's high-quality SFT dataset for research purposes.
RLHFlow/LLaMA3-SFT • Text Generation • Updated Nov 3, 2024 • 6.46k • 9
RLHFlow/RLHFlow-SFT-Dataset-ver2 • Viewer • Updated Nov 2, 2024 • 2.32M • 66 • 4
RLHFlow/LLaMA3-SFT-v2 • Text Generation • Updated Nov 3, 2024 • 850 • 1
RLHFlow/Llama3-SFT-v2.0-epoch1 • Text Generation • Updated Nov 3, 2024 • 20
RLHFlow/Llama3-SFT-v2.0-epoch2 • Text Generation • Updated Nov 3, 2024 • 10
RLHFlow/Llama3-SFT-v2.0-epoch3 • Text Generation • Updated Nov 3, 2024 • 1.48k