Relaxed-System-Lab/multi-actor-data-selection

Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining

This is the repository for the paper Efficient Pretraining Data Selection for Language Models via Multi-Actor Collaboration (ACL 2025).

Figure: Illustration of the multi-actor collaborative framework.

Updates

Release plan

TODOs:

  • Model Checkpoints
  • BERT Topic Model Checkpoint
  • Labeled SlimPajama-670B datasets
  • Code for methods ......
