Active reward modeling with Fisher Information

YunyiShen/ARM-FI

Codebase for Preprint

"Reviving The Classics: Active Reward Modeling in Large Language Model Alignment"

Authors: Yunyi Shen*, Hao Sun*, Jean-Francois Ton. (* The first two authors contributed equally.)

[Preprint] | [Embeddings (to be released here)]

This repository is part of a series of works on reward models in RLHF:

  • Part I. Reward Model Foundation preprint, repo
  • Part II. Active Reward Modeling (This repo)
  • Part III. Accelerating Reward Model Research with our Infra. (SOON)

Structure of the repo

The algorithms we tested are implemented in model. Two of them come from other authors: coreset (Huggins et al. 2016) in lrcoresets and batchBALD (Kirsch et al. 2019) in batchbald_redux. We made only minimal modifications to these so that they are compatible with our computation environment.
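For readers unfamiliar with Fisher-information-based selection, here is a minimal, dependency-free sketch of the general idea. It is not the code in model: it assumes a Bradley-Terry preference model with a linear reward on fixed embeddings, and the function name greedy_d_optimal, its arguments, and the ridge parameter are all hypothetical. Each candidate comparison has an embedding difference x and contributes p(1-p) x x^T to the Fisher information. The sketch greedily picks the comparisons that most increase the log-determinant of the accumulated information matrix.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mat_vec(mat, vec):
    return [dot(row, vec) for row in mat]

def greedy_d_optimal(diffs, theta, batch_size, ridge=1e-3):
    """Greedily select comparisons that maximize log det of the Fisher information.

    diffs: list of embedding differences, one per candidate pair (assumed input).
    theta: current linear reward weights (assumed Bradley-Terry model).
    ridge: small regularizer so the information matrix starts invertible.
    """
    d = len(theta)
    # Bernoulli variance p(1-p) of each comparison under the current model.
    weights = []
    for x in diffs:
        p = sigmoid(dot(x, theta))
        weights.append(p * (1.0 - p))
    # Inverse of the regularized information matrix, initially (ridge * I)^-1.
    info_inv = [[1.0 / ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    chosen = []
    for _ in range(min(batch_size, len(diffs))):
        best, best_gain, best_v, best_quad = None, -math.inf, None, 0.0
        for i, x in enumerate(diffs):
            if i in chosen:
                continue
            v = mat_vec(info_inv, x)
            quad = dot(x, v)
            # Matrix determinant lemma: log det gain is log(1 + w * x^T A^-1 x).
            gain = math.log1p(weights[i] * quad)
            if gain > best_gain:
                best, best_gain, best_v, best_quad = i, gain, v, quad
        chosen.append(best)
        # Sherman-Morrison rank-one update of the inverse information matrix.
        scale = weights[best] / (1.0 + weights[best] * best_quad)
        for r in range(d):
            for c in range(d):
                info_inv[r][c] -= scale * best_v[r] * best_v[c]
    return chosen
```

As a toy usage example, calling greedy_d_optimal([[2.0, 0.0], [0.0, 1.0], [2.0, 0.0]], [0.0, 0.0], 2) selects the first and second candidates and skips the duplicate third one, because a repeated direction adds little new information.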

The experiment code will be released soon, once we have removed the parts that are specific to our computation environment.