Will Brooks

TornButter

AI & ML interests

None yet

Recent Activity

reacted to MoritzLaurer's post with 🔥 2 days ago

The TRL v0.13 release is 🔥! My highlights are the new process reward trainer, for training models similar to o1, and tool call support:

🧠 Process reward trainer: Enables training of Process-supervised Reward Models (PRMs), which reward the quality of intermediate steps, promoting structured reasoning. Perfect for tasks like stepwise reasoning.

🔀 Model merging: A new callback leverages mergekit to merge models during training, improving performance by blending reference and policy models, optionally pushing merged models to the Hugging Face Hub.

🛠️ Tool call support: TRL preprocessing now supports tool integration, laying the groundwork for agent fine-tuning with examples like dynamic temperature fetching in prompts.

⚖️ Mixture of judges: The new AllTrueJudge combines decisions from multiple binary judges for more nuanced evaluation.

Read the release notes and other resources here 👇
Release: https://github.com/huggingface/trl/releases/tag/v0.13.0
Mergekit: https://github.com/arcee-ai/mergekit
Mixture of judges paper: https://huggingface.co/papers/2409.20370

liked a model 3 days ago

hexgrad/Kokoro-82M

liked a model 4 days ago

kudzueye/boreal-flux-dev-v2

Organizations

None yet

Models

None public yet

Datasets

None public yet