Will Brooks

TornButter

TornButter's activity

reacted to MoritzLaurer's post with 🔥 2 days ago

Post

1472

The TRL v0.13 release is 🔥! My highlights are the new process reward trainer, which trains models similar to o1, and tool call support:

🧠 Process reward trainer: Enables training of Process-supervised Reward Models (PRMs), which reward the quality of intermediate steps, promoting structured reasoning. Perfect for tasks like stepwise reasoning.
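To make "rewarding the quality of intermediate steps" concrete, here is a minimal sketch in plain Python. It does not use TRL; `toy_step_scorer` is a hypothetical stand-in for a trained PRM, and aggregating step scores with `min` is just one assumed choice, not something the release prescribes.

```python
from typing import Callable, List, Tuple

def score_solution(
    steps: List[str],
    step_scorer: Callable[[List[str]], float],
    aggregate: Callable[[List[float]], float] = min,
) -> Tuple[List[float], float]:
    """Score every intermediate step given the steps before it, then aggregate."""
    step_scores = [step_scorer(steps[: i + 1]) for i in range(len(steps))]
    return step_scores, aggregate(step_scores)

def toy_step_scorer(prefix: List[str]) -> float:
    """Hypothetical scorer: 1.0 if the latest step's addition is correct, else 0.0."""
    left, right = prefix[-1].split("=")
    a, b = (int(x) for x in left.split("+"))
    return 1.0 if a + b == int(right) else 0.0

# Made-up reasoning trace with a mistake in the second step.
steps = ["2 + 3 = 5", "5 + 4 = 10", "10 + 1 = 11"]
step_scores, overall = score_solution(steps, toy_step_scorer)
print(step_scores, overall)
```

Per-step scores point at the faulty second step, which a single score for the whole answer would not do.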

🔀 Model merging: A new callback leverages mergekit to merge models during training, improving performance by blending reference and policy models - optionally pushing merged models to the Hugging Face Hub.

🛠️ Tool call support: TRL preprocessing now supports tool integration, laying the groundwork for agent fine-tuning with examples like dynamic temperature fetching in prompts.

⚖️ Mixture of judges: The new AllTrueJudge combines decisions from multiple binary judges for more nuanced evaluation.
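As a rough sketch of the idea behind combining binary judges (not TRL's actual implementation), the function below returns 1 only when every judge votes 1. The judge signature, the helper names `is_short` and `is_nonempty`, and the convention that -1 signals a judge error are assumptions made for illustration.

```python
from typing import Callable, List

# Assumed judge type: (prompt, completion) -> 1 (pass), 0 (fail), or -1 (judge error).
BinaryJudge = Callable[[str, str], int]

def all_true(judges: List[BinaryJudge], prompt: str, completion: str) -> int:
    """Return 1 only if all judges pass; propagate -1 if any judge errored."""
    votes = [judge(prompt, completion) for judge in judges]
    if any(v == -1 for v in votes):
        return -1
    return 1 if all(v == 1 for v in votes) else 0

# Hypothetical toy judges.
def is_short(prompt: str, completion: str) -> int:
    return 1 if len(completion) <= 40 else 0

def is_nonempty(prompt: str, completion: str) -> int:
    return 1 if completion.strip() else 0

print(all_true([is_short, is_nonempty], "Say hi", "Hello!"))
print(all_true([is_short, is_nonempty], "Say hi", "   "))
```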

Read the release notes and other resources here 👇
Release: https://github.com/huggingface/trl/releases/tag/v0.13.0
Mergekit: https://github.com/arcee-ai/mergekit
Mixture of judges paper: The Perfect Blend: Redefining RLHF with Mixture of Judges (https://huggingface.co/papers/2409.20370)

liked a model 3 days ago

hexgrad/Kokoro-82M

Text-to-Speech • Updated 5 days ago • 8.1k • 673

liked a model 4 days ago

kudzueye/boreal-flux-dev-v2

Text-to-Image • Updated Sep 5, 2024 • 23.8k • 103

liked a model 9 days ago

deepseek-ai/DeepSeek-V3

Updated 13 days ago • 110k • 1.66k

liked 3 models 16 days ago

reacted to singhsidhukuldeep's post with 🔥 18 days ago

Post

2182

Exciting News in AI: JinaAI Releases JINA-CLIP-v2!

The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal:

🚀 Technical Highlights:
- Dual encoder architecture combining a 561M parameter Jina XLM-RoBERTa text encoder and a 304M parameter EVA02-L14 vision encoder
- Supports 89 languages with 8,192 token context length
- Processes images up to 512×512 pixels with 14×14 patch size
- Implements FlashAttention2 for text and xFormers for vision processing
- Uses Matryoshka Representation Learning for efficient vector storage

⚡️ Under The Hood:
- Multi-stage training process with progressive resolution scaling (224→384→512)
- Contrastive learning using InfoNCE loss in both directions
- Trained on a massive multilingual dataset including 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample
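For readers unfamiliar with "InfoNCE loss in both directions", here is a minimal pure-Python sketch of a symmetric contrastive loss over a batch of matched text and image embeddings. It is an illustration of the general technique, not Jina AI's training code; the temperature of 0.07 and the toy 2D embeddings are assumptions.

```python
import math
from typing import List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean of -log softmax(row)[i], treating the i-th column as row i's positive."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def symmetric_info_nce(texts: List[Vector], images: List[Vector],
                       temperature: float = 0.07) -> float:
    """Average of text-to-image and image-to-text InfoNCE losses."""
    t = [normalize(v) for v in texts]
    im = [normalize(v) for v in images]
    logits = [[dot(a, b) / temperature for b in im] for a in t]
    logits_t = [list(col) for col in zip(*logits)]  # transpose for the other direction
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))

# Toy batch: pair i of texts matches pair i of images.
aligned = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is low when each text is closest to its own image and high when the pairs are swapped. Hard negatives, as mentioned in the list above, would add extra non-matching columns to each row of `logits`.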

📊 Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on the CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains performance even with 75% dimension reduction (256D vs 1024D)
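The 256D vs 1024D comparison relies on Matryoshka-style embeddings, where a prefix of the vector is itself usable. A minimal sketch of that truncation step, assuming the common practice of keeping the leading dimensions and re-normalizing (the input vector here is made up, not a real model output):

```python
import math
from typing import List

def truncate_embedding(vec: List[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

# Made-up 1024-dimensional vector standing in for a model embedding.
full = [math.sin(i) for i in range(1024)]
small = truncate_embedding(full, 256)

reduction = 1 - len(small) / len(full)
print(len(small), reduction)
```

Keeping 256 of 1024 dimensions stores a quarter of the values per vector, which is the 75% reduction quoted above.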

🎯 Key Innovation:
The model solves the long-standing challenge of unifying text-only and multi-modal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems!

Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!