Tim Bula

timrbula

AI & ML interests

LLMs for language and code

Recent Activity

updated a model 2 days ago

timrbula/SmolLM2-FT-SmolTalk

reacted to MoritzLaurer's post with 🔥 2 days ago

The TRL v0.13 release is 🔥! My highlights are the new process reward trainer, for training models similar to o1, and tool call support:

🧠 Process reward trainer: Enables training of Process-supervised Reward Models (PRMs), which reward the quality of intermediate steps, promoting structured reasoning. Perfect for tasks like stepwise reasoning.

🔀 Model merging: A new callback leverages mergekit to merge models during training, improving performance by blending reference and policy models, and can optionally push merged models to the Hugging Face Hub.

🛠️ Tool call support: TRL preprocessing now supports tool integration, laying the groundwork for agent fine-tuning with examples like dynamic temperature fetching in prompts.

⚖️ Mixture of judges: The new AllTrueJudge combines decisions from multiple binary judges for more nuanced evaluation.

Read the release notes and other resources here 👇
Release: https://github.com/huggingface/trl/releases/tag/v0.13.0
Mergekit: https://github.com/arcee-ai/mergekit
Mixture of judges paper: https://huggingface.co/papers/2409.20370

liked a model 2 days ago

ibm/materials.smi-ted

timrbula's activity

New activity in google/gemma-2b 3 months ago

Unable to reproduce the score of gemma_2b at pass@1 in humaneval.

#53 opened 9 months ago by

New activity in ibm-granite/granite-8b-code-instruct-128k 6 months ago

Fix: link to 128k paper

#1 opened 6 months ago by

New activity in Mozilla/granite-34b-code-instruct-llamafile 6 months ago

Update README.md

#1 opened 6 months ago by