meta-llama-3.1-instruct-8b-segment-rm-700k Model Card

Method

The segment reward model assigns rewards to semantically meaningful text segments, which are delimited dynamically using an entropy-based threshold. It is trained on binary human preference labels, optimizing a Bradley-Terry loss in which per-segment rewards are aggregated by averaging.
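
As a sketch of that objective (the notation below is ours, not taken from the card): a response is split into segments, the response-level reward is the average of the per-segment rewards, and training minimizes the standard Bradley-Terry negative log-likelihood over chosen/rejected pairs.

```latex
% Sketch of the average-aggregated, segment-level Bradley-Terry loss (notation ours).
% r_phi(x, s_i): reward of segment s_i of response y given prompt x; sigma: logistic function;
% y^w / y^l: chosen (preferred) and rejected responses from the preference pair.
\[
R_\phi(x, y) = \frac{1}{K}\sum_{i=1}^{K} r_\phi(x, s_i)
\]
\[
\mathcal{L}(\phi) = -\,\mathbb{E}_{(x,\, y^{w},\, y^{l})}\!\left[\log \sigma\!\big(R_\phi(x, y^{w}) - R_\phi(x, y^{l})\big)\right]
\]
```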

Architecture

[Architecture diagram of the segment-level reward model]
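
As a rough, hypothetical illustration of the entropy-based segmentation mentioned in the Method section (the function name, threshold value, and exact cut rule below are our assumptions, not the paper's released code), a generated response can be split into segments at tokens whose next-token predictive entropy exceeds a threshold:

```python
# Hypothetical sketch of entropy-based dynamic segmentation (not the released implementation).
import torch

def segment_by_entropy(token_ids, logits, threshold=2.0):
    """Split a generated response into segments at high-entropy tokens.

    token_ids: 1-D tensor of generated token ids, shape (T,)
    logits:    pre-softmax scores for each generated position, shape (T, V)
    Returns a list of segments, each a list of token ids.
    """
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum(dim=-1)  # (T,)

    segments, current = [], []
    for t in range(token_ids.shape[0]):
        # Start a new segment whenever the predictive entropy crosses the threshold.
        if current and entropy[t] > threshold:
            segments.append(current)
            current = []
        current.append(int(token_ids[t]))
    if current:
        segments.append(current)
    return segments
```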

Training

This segment reward model (yyqoni/meta-llama-3.1-instruct-8b-segment-rm-700k) is fine-tuned from meta-llama/Llama-3.1-8B-Instruct on the hendrydong/preference_700K dataset.
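
A minimal, hedged loading sketch is given below. It assumes the checkpoint can be loaded with transformers' AutoModelForSequenceClassification and returns a scalar preference score; the segment-level scoring interface from the paper's repository may differ.

```python
# Hedged sketch: score a prompt/response pair with the reward model, assuming a
# standard sequence-classification head. The paper's segment-level scoring code
# may expose a different interface.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "yyqoni/meta-llama-3.1-instruct-8b-segment-rm-700k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Explain RLHF in one sentence."},
    {"role": "assistant", "content": "RLHF fine-tunes a language model using human preference feedback."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

with torch.no_grad():
    score = model(input_ids).logits[0].item()  # scalar preference score (higher = preferred)
print(score)
```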

Citation

If you find this model or our research useful, please consider citing our paper:

@misc{yin2025segmentingtextlearningrewards,
      title={Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model}, 
      author={Yueqin Yin and Shentao Yang and Yujia Xie and Ziyi Yang and Yuting Sun and Hany Awadalla and Weizhu Chen and Mingyuan Zhou},
      year={2025},
      eprint={2501.02790},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.02790},
}