Commit 7d2d64c

Merge branch 'main' into main
2 parents 63899dd + 1635abe commit 7d2d64c

6 files changed

+280
-5
lines changed

_blog.yml

Lines changed: 19 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -4906,13 +4906,28 @@
49064906
- community
49074907
- open-source
49084908

4909+
- local: open-asr-leaderboard
4910+
date: Nov 21, 2025
4911+
tags:
4912+
- audio
4913+
- speech
4914+
- leaderboard
4915+
4916+
- local: rapidfireai
4917+
date: Nov 21, 2025
4918+
tags:
4919+
- llm
4920+
- experimentation
4921+
- fine-tuning
4922+
- post-training
4923+
- trl
4924+
- rapidfireai
4925+
49094926
- local: intel-deepmath
4910-
date: Nov 20, 2025
4927+
date: Nov 24, 2025
49114928
tags:
49124929
- llm
49134930
- reasoning
49144931
- agents
49154932
- math
4916-
- grpo
4917-
4918-
4933+
- grpo

anylanguagemodel.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
---
22
title: "Introducing AnyLanguageModel: One API for Local and Remote LLMs on Apple Platforms"
3-
thumbnail: /assets/anylanguagemodel/banner.png
3+
thumbnail: /blog/assets/anylanguagemodel/banner.png
44
authors:
55
- user: mattt
66
guest: true
112 KB

assets/rapidfireai/thumbnail.png

450 KB

open-asr-leaderboard.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
1+
---
2+
title: "Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks"
3+
thumbnail: /blog/assets/open-asr-leaderboard/thumbnail.png
4+
authors:
5+
- user: bezzam
6+
- user: Steveeeeeeen
7+
- user: eustlb
8+
- user: reach-vb
9+
---
10+
11+
12+
# Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks
13+
14+
While everyone (and their grandma 👵) is spinning up new ASR models, picking the right one for your use case can feel more overwhelming than choosing your next Netflix show. As of 21 Nov 2025, there are **150 [Audio-Text-to-Text](https://huggingface.co/models?pipeline_tag=audio-text-to-text&sort=trending)** and **27K [ASR models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=trending)** on the Hub 🤯
15+
16+
Most benchmarks focus on **short-form English transcription (<30s)** and overlook other important tasks, such as (1) multilingual performance and (2) model throughput, which can be a deciding factor for long-form audio like meetings and podcasts.
17+
18+
Over the past two years, the [**Open ASR Leaderboard**](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard) has become a standard for comparing open and closed-source models on both **accuracy** and **efficiency**. Recently, **multilingual** and **long-form transcription** tracks have been added to the leaderboard 🎉
19+
20+
### TL;DR - [Open ASR Leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard)
21+
22+
- 📝 **New preprint** on ASR trends from the leaderboard: https://hf.co/papers/2510.06961
23+
- 🧠 **Best accuracy:** Conformer encoder + LLM decoders (open-source ftw 🥳)
24+
- **Fastest:** CTC / TDT decoders
25+
- 🌍 **Multilingual:** Comes at the cost of single-language performance
26+
- **Long-form:** Closed-source systems still lead (for now 😉)
27+
- 🧑‍💻 **Fine-tuning guides** ([Parakeet](https://github.com/Deep-unlearning/Finetune-Parakeet), [Voxtral](https://github.com/Deep-unlearning/Finetune-Voxtral-ASR), [Whisper](https://huggingface.co/learn/audio-course/chapter5/fine-tuning)): to continue pushing performance
28+
29+
30+
# Takeaways from 60+ models
31+
32+
As of 21 Nov 2025, the *Open ASR Leaderboard* compares **60+ open and closed-source models** from **18 organizations**, across **11 datasets**.
33+
34+
In a recent [preprint](https://hf.co/papers/2510.06961), we dive into the technical setup and highlight some key trends in modern ASR. Here are the big takeaways 👇
35+
36+
## 1. Conformer encoder 🤝 LLM decoder tops the charts 📈
37+
38+
<div align="center">
39+
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/open_asr_leaderboard/leaderboard_WER.png" width="1024px" alt="thumbnail" />
40+
</div>
41+
42+
Models combining [**Conformer encoders**](https://huggingface.co/papers/2005.08100) with **large language model (LLM) decoders** currently lead in English transcription accuracy. For example, **NVIDIA’s [Canary-Qwen-2.5B](https://huggingface.co/nvidia/canary-qwen-2.5b)**, **IBM’s [Granite-Speech-3.3-8B](https://huggingface.co/ibm-granite/granite-speech-3.3-8b)**, and **Microsoft’s [Phi-4-Multimodal-Instruct](https://huggingface.co/microsoft/Phi-4-multimodal-instruct)** achieve the lowest word error rates ([WER](https://huggingface.co/learn/audio-course/en/chapter5/evaluation#word-error-rate)), showing that integrating LLM reasoning can significantly boost ASR accuracy.
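
For readers new to the metric, here is a minimal sketch of how a word error rate can be computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the sketch assumes whitespace tokenization with no text normalization (casing, punctuation), so it illustrates the formula rather than reproducing the leaderboard's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer:.3f}")  # WER: 0.333
```

Because the denominator is the reference length, a hypothesis with many insertions can push this value above 1.0.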
43+
44+
💡 *Pro-tip: NVIDIA introduced [Fast Conformer](https://huggingface.co/papers/2305.05084), a 2x faster variant of the Conformer that is used in their Canary and Parakeet suite of models.*
45+
46+
## 2. Speed–accuracy tradeoffs ⚖️
47+
48+
<div align="center">
49+
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/open_asr_leaderboard/leaderboard_RTX.png" width="1024px" alt="thumbnail" />
50+
</div>
51+
52+
While highly accurate, these LLM decoders tend to be **slower** than simpler approaches. On the *Open ASR Leaderboard*, efficiency is measured using *inverse real-time factor* (RTFx), where higher is better.
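
As a rough illustration of the metric, the inverse real-time factor is the duration of the audio divided by the time spent transcribing it. The sketch below uses a hypothetical helper name (`rtfx`) and made-up timings; it shows the ratio only and says nothing about how the leaderboard measures wall-clock time.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio transcribed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Illustrative numbers only: one hour of audio transcribed in 36 seconds.
print(rtfx(audio_seconds=3600, processing_seconds=36))  # 100.0
```

An RTFx of 1 means transcription takes as long as the audio itself; higher values mean faster-than-real-time processing.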
53+
54+
For faster inference, [**CTC**](https://huggingface.co/learn/audio-course/en/chapter3/ctc#ctc-architectures) and [**TDT**](https://huggingface.co/papers/2304.06795) decoders deliver **10–100× faster throughput**, albeit with slightly higher error rates. This makes them ideal for **real-time**, **offline**, or **batch transcription** tasks (such as meetings, lectures, or podcasts).
55+
56+
## 3. Multilingual 🌍
57+
58+
<div align="center">
59+
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/open_asr_leaderboard/multilingual.png" width="1024px" alt="thumbnail" />
60+
</div>
61+
62+
OpenAI’s [**Whisper Large v3**](https://huggingface.co/openai/whisper-large-v3) remains a strong multilingual baseline, supporting **99 languages**. However, **fine-tuned or distilled variants** like [**Distil-Whisper**](https://huggingface.co/distil-whisper/distil-large-v3.5) and [**CrisperWhisper**](https://huggingface.co/nyrahealth/CrisperWhisper) often outperform the original on **English-only** tasks, showing how targeted fine-tuning can improve specialization (*how to fine-tune? Check out guides for [Whisper](https://huggingface.co/learn/audio-course/chapter5/fine-tuning), [Parakeet](https://github.com/Deep-unlearning/Finetune-Parakeet), and [Voxtral](https://github.com/Deep-unlearning/Finetune-Voxtral-ASR)*).
63+
64+
That said, focusing on English tends to **reduce multilingual coverage** 👉 a classic case of the tradeoff between specialization and generalization. Similarly, while **self-supervised** systems like Meta’s [**Massively Multilingual Speech (MMS)**](https://huggingface.co/facebook/mms-1b-all) and [**Omnilingual ASR**](https://github.com/facebookresearch/omnilingual-asr) can support 1K+ languages, they trail behind language-specific encoders in accuracy.
65+
66+
⭐ *While just five languages are currently benchmarked, we’re planning to expand to more languages and are excited for new dataset and model contributions to multilingual ASR through GitHub [pull requests](https://github.com/huggingface/open_asr_leaderboard).*
67+
68+
🎯 Alongside multilingual benchmarks, several **community-driven leaderboards** focus on individual languages. For example, the [**Open Universal Arabic ASR Leaderboard**](https://huggingface.co/spaces/elmresearchcenter/open_universal_arabic_asr_leaderboard) compares models across **Modern Standard Arabic and regional dialects**, highlighting how speech variation and diglossia challenge current systems. Similarly, the [**Russian ASR Leaderboard**](https://huggingface.co/spaces/Vikhrmodels/Russian_ASR_Leaderboard) provides a growing hub for evaluating encoder-decoder and CTC models on **Russian-specific phonology and morphology**. These localized efforts mirror the broader multilingual leaderboard’s mission to encourage **dataset sharing, fine-tuned checkpoints, and transparent model comparisons**, especially in languages with fewer established ASR resources.
69+
70+
## 4. Long-form transcription is a different game ⏳
71+
72+
<div align="center">
73+
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/open_asr_leaderboard/long_form.png" width="1024px" alt="thumbnail" />
74+
</div>
75+
76+
For **long-form audio** (e.g., podcasts, lectures, meetings), **closed-source systems** still edge out open ones. This could be due to domain tuning, custom chunking, or production-grade optimization.
77+
78+
Among open models, **OpenAI’s Whisper Large v3** performs the best. But for throughput, **CTC-based Conformers** shine 👉 for example, **NVIDIA’s [Parakeet CTC 1.1B](https://huggingface.co/nvidia/parakeet-ctc-1.1b)** achieves an **RTFx of 2793.75**, compared to **68.56** for Whisper Large v3, with only a moderate WER degradation (**6.68** and **6.43** respectively).
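
To put those throughput figures in perspective, the short calculation below converts each RTFx value into the time needed to transcribe one hour of audio. The helper name `transcription_seconds` is hypothetical, and the estimate assumes the reported RTFx holds uniformly for the whole recording on comparable hardware.

```python
def transcription_seconds(audio_seconds: float, rtfx: float) -> float:
    """Estimated processing time for a recording, given an inverse real-time factor."""
    return audio_seconds / rtfx


ONE_HOUR = 3600.0
parakeet_rtfx = 2793.75  # Parakeet CTC 1.1B, as quoted above
whisper_rtfx = 68.56     # Whisper Large v3, as quoted above

print(f"Parakeet CTC 1.1B: {transcription_seconds(ONE_HOUR, parakeet_rtfx):.2f} s per hour of audio")
print(f"Whisper Large v3:  {transcription_seconds(ONE_HOUR, whisper_rtfx):.2f} s per hour of audio")
print(f"Throughput ratio:  {parakeet_rtfx / whisper_rtfx:.1f}x")
```

Under these assumptions, an hour of audio takes a little over one second with the CTC model versus just under a minute with Whisper Large v3, a throughput gap of roughly 40×.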
79+
80+
The tradeoff? Parakeet is **English-only,** again reminding us of that multilingual and specialization tradeoff 🫠.
81+
82+
*While closed systems still lead, there’s huge potential for open-source innovation here. Long-form ASR remains one of the most exciting frontiers for the community to tackle next!*
83+
84+
# 🎤 The Show Must Go On
85+
86+
Given how fast ASR is evolving, we’re excited to see what new architectures push performance and efficiency, and how the *Open ASR Leaderboard* continues to serve as a **transparent, community-driven benchmark** for the field, and as a reference for other leaderboards ([Russian](https://huggingface.co/spaces/Vikhrmodels/Russian_ASR_Leaderboard), [Arabic](https://huggingface.co/spaces/elmresearchcenter/open_universal_arabic_asr_leaderboard), and [Speech DeepFake Detection](https://huggingface.co/spaces/Speech-Arena-2025/Speech-DF-Arena)).
87+
88+
We’ll keep expanding the *Open ASR Leaderboard* with **more models, more languages, and more datasets**, so stay tuned 👀
89+
90+
👉 **Want to contribute?** Head on over to the [GitHub repo](https://github.com/huggingface/open_asr_leaderboard) to open a *pull request* 🚀

rapidfireai.md

Lines changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@
1+
---
2+
title: "20x Faster TRL Fine-tuning with RapidFire AI"
3+
thumbnail: /blog/assets/rapidfireai/thumbnail.png
4+
authors:
5+
- user: kbigdelysh
6+
guest: true
7+
org: rapidfire-ai-inc
8+
- user: arunkk09
9+
guest: true
10+
org: rapidfire-ai-inc
11+
- user: qgallouedec
12+
---
13+
14+
# 20x Faster TRL Fine-tuning with RapidFire AI
15+
16+
Hugging Face TRL now officially integrates with RapidFire AI to accelerate your fine-tuning and post-training experiments. TRL users can now discover, install, and run RapidFire AI as the fastest way to compare multiple fine-tuning/post-training configurations to customize LLMs without major code changes and without bloating GPU requirements.
17+
18+
## Why this matters
19+
20+
When fine-tuning or post-training LLMs, teams often do not have the time and/or budget to compare multiple configs even though that can significantly boost eval metrics. RapidFire AI lets you launch multiple TRL configs concurrently--even on a single GPU--and compare them in near real time via a new adaptive, chunk-based scheduling and execution scheme. In internal benchmarks referenced in the TRL page, this delivers ~16–24× higher experimentation throughput than sequentially comparing configs one after another, enabling you to reach much better metrics much faster.
21+
22+
![RapidFire AI Architecture](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/rapidfireai_intro/rf-usage.png)
23+
*RapidFire AI establishes live three-way communication between your IDE, a metrics dashboard, and a multi-GPU execution backend*
24+
25+
## What you get, out of the box
26+
27+
- **Drop-in TRL wrappers** — Use `RFSFTConfig`, `RFDPOConfig`, and `RFGRPOConfig` as near-zero-code replacements for TRL's SFT/DPO/GRPO configs.
28+
29+
- **Adaptive chunk-based concurrent training** — RapidFire AI shards the dataset into a given number of chunks and cycles configs at chunk boundaries to enable earlier apples-to-apples comparisons and also maximize GPU utilization.
30+
31+
- **Interactive Control Ops (IC Ops)** — From the dashboard itself, you can Stop, Resume, Delete, and Clone-Modify, possibly with Warm-Start, any runs in flight to avoid wasting resources on underperforming configs and double-down on better performing configs--no job restarts, no juggling separate GPUs or clusters, no resource bloat.
32+
33+
![Interactive Control Operations](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/rapidfireai_intro/icop-clone.png)
34+
*Clone promising configurations with modified hyperparameters, optionally warm-starting from the parent's weights, all from the live dashboard*
35+
36+
- **Multi-GPU orchestration** — The RapidFire AI scheduler automatically places and orchestrates configs across available GPUs on chunks of data via efficient shared-memory mechanisms. You focus on your models and eval metrics, not plumbing.
37+
38+
- **MLflow-based dashboard** — Real-time metrics, logs, and IC Ops in one place as soon as you start your experiment. Support for more dashboards such as Trackio, W&B, and TensorBoard coming soon.
39+
40+
## How it works
41+
42+
RapidFire AI splits your dataset randomly into "chunks" and cycles LLM configurations through the GPUs at chunk boundaries. You get incremental signal on eval metrics across all configs much more quickly. The automatic checkpointing via an efficient shared-memory-based adapter/model spilling/loading mechanism keeps training smooth, stable, and consistent. Use IC Ops to adapt mid-flight to stop low-performers earlier and clone promising ones with tweaked config knobs, optionally warm-starting from the parent's weights.
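
The idea of cycling configs at chunk boundaries can be illustrated with a toy schedule. The sketch below is not RapidFire AI's scheduler: every function name in it is hypothetical, it assumes a single GPU that runs one (chunk, config) unit at a time, and it ignores checkpointing and IC Ops. It only shows why interleaving by chunk yields a first comparable signal for every config sooner than running configs one after another.

```python
import random


def split_into_chunks(num_examples: int, num_chunks: int, seed: int = 42) -> list[list[int]]:
    """Randomly partition example indices into chunks whose sizes differ by at most one."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    return [indices[i::num_chunks] for i in range(num_chunks)]


def chunk_interleaved_schedule(config_names: list[str], num_chunks: int) -> list[tuple[int, str]]:
    """Every config processes chunk k before any config moves on to chunk k + 1."""
    return [(chunk, name) for chunk in range(num_chunks) for name in config_names]


def sequential_schedule(config_names: list[str], num_chunks: int) -> list[tuple[int, str]]:
    """Each config processes the whole dataset before the next config starts."""
    return [(chunk, name) for name in config_names for chunk in range(num_chunks)]


def steps_until_all_seen(schedule: list[tuple[int, str]], config_names: list[str]) -> int:
    """Number of (chunk, config) units executed before every config has produced a metric."""
    seen: set[str] = set()
    for step, (_, name) in enumerate(schedule, start=1):
        seen.add(name)
        if seen == set(config_names):
            return step
    raise ValueError("schedule never covers every config")


configs = ["lora-r8", "lora-r32"]  # hypothetical config labels
chunks = split_into_chunks(num_examples=128, num_chunks=4)
print([len(chunk) for chunk in chunks])  # [32, 32, 32, 32]
print(steps_until_all_seen(chunk_interleaved_schedule(configs, 4), configs))  # 2
print(steps_until_all_seen(sequential_schedule(configs, 4), configs))         # 5
```

In this toy setup, both configs have a metric on the first chunk after 2 units of work with the interleaved schedule, while the sequential schedule needs 5 units before the second config reports anything.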
43+
44+
![GPU Scheduling Comparison](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/rapidfireai_intro/gantt-2gpu.png)
45+
*Sequential vs. Task Parallel vs. RapidFire AI: The adaptive scheduler maximizes GPU utilization across multiple configs and GPUs. The bottom row shows IC Ops in action—stopping, cloning, and modifying runs mid-flight.*
46+
47+
## Getting Started
48+
49+
Install RapidFire AI and get running in under a minute:
50+
51+
```bash
pip install rapidfireai

# Authenticate with Hugging Face
huggingface-cli login --token YOUR_TOKEN

# Workaround for current issue
pip uninstall -y hf-xet

# Initialize and start RapidFire AI
rapidfireai init
rapidfireai start
```
64+
65+
The dashboard launches at `http://localhost:3000` where you can monitor and control all your experiments.
66+
67+
## Supported TRL trainers
68+
69+
- SFT with `RFSFTConfig`
70+
- DPO with `RFDPOConfig`
71+
- GRPO with `RFGRPOConfig`
72+
73+
These are designed as drop-in replacements so that you can keep your TRL mental model while gaining far more concurrency and control for your fine-tuning/post-training applications.
74+
75+
## Minimal TRL SFT example
76+
77+
Here's what it looks like to train **multiple configurations concurrently** even on a single GPU:
78+
79+
```python
from rapidfireai import Experiment
from rapidfireai.automl import List, RFGridSearch, RFModelConfig, RFLoraConfig, RFSFTConfig
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Setup: load your dataset and define formatting
dataset = load_dataset("bitext/Bitext-customer-support-llm-chatbot-training-dataset")
train_dataset = dataset["train"].select(range(128)).shuffle(seed=42)

def formatting_function(row):
    return {
        "prompt": [
            {"role": "system", "content": "You are a helpful customer support assistant."},
            {"role": "user", "content": row["instruction"]},
        ],
        "completion": [{"role": "assistant", "content": row["response"]}]
    }

dataset = dataset.map(formatting_function)

# Define multiple configs to compare
config_set = List([
    RFModelConfig(
        model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        peft_config=RFLoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"]),
        training_args=RFSFTConfig(learning_rate=1e-3, max_steps=128, fp16=True),
    ),
    RFModelConfig(
        model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        peft_config=RFLoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"]),
        training_args=RFSFTConfig(learning_rate=1e-4, max_steps=128, fp16=True),
        formatting_func=formatting_function,
    )
])

# Run all configs concurrently with chunk-based scheduling
experiment = Experiment(experiment_name="sft-comparison")
config_group = RFGridSearch(configs=config_set, trainer_type="SFT")

def create_model(model_config):
    model = AutoModelForCausalLM.from_pretrained(
        model_config["model_name"],
        device_map="auto", torch_dtype="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_config["model_name"])
    return (model, tokenizer)

experiment.run_fit(config_group, create_model, train_dataset, num_chunks=4, seed=42)
experiment.end()
```
130+
131+
**What happens when you run this?**
132+
133+
Suppose you run the above on a 2-GPU machine. Instead of training sequentially (Config 1 → wait → Config 2 → wait), both configs train concurrently:
134+
135+
| Approach | Time till Comparative Decision | GPU utilization |
136+
|----------|-----------------|-----------------|
137+
| Sequential (traditional) | ~15 minutes | 60% utilization |
138+
| RapidFire AI (concurrent) | ~5 minutes | 95%+ utilization |
139+
140+
You can get to a comparative decision **3× sooner** on the same resources after both configs finish processing the first data chunk instead of waiting for them to see the whole dataset one after another. Open `http://localhost:3000` to watch live metrics and use IC Ops to stop, clone, or tweak runs in real-time based on what you're seeing.
141+
142+
## Benchmarks: Real-World Speedups
143+
144+
Here is what teams see on time to reach a comparable overall best training loss (across all tried configs) when switching from sequential comparisons to RapidFire AI-enabled hyperparallel experimentation:
145+
146+
| Scenario | Sequential Time | RapidFire AI Time | Speedup |
147+
|----------|----------------|-------------------|---------|
148+
| 4 configs, 1 GPU | 120 min | 7.5 min | **16×** |
149+
| 8 configs, 1 GPU | 240 min | 12 min | **20×** |
150+
| 4 configs, 2 GPUs | 60 min | 4 min | **15×** |
151+
152+
*Benchmarks on NVIDIA A100 40GB with TinyLlama-1.1B and Llama-3.2-1B models*
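
The speedup column is simply the sequential time divided by the RapidFire AI time. The snippet below recomputes it from the table's own numbers as a sanity check; the variable names are illustrative.

```python
# (scenario, sequential minutes, RapidFire AI minutes), copied from the table above
benchmarks = [
    ("4 configs, 1 GPU", 120, 7.5),
    ("8 configs, 1 GPU", 240, 12),
    ("4 configs, 2 GPUs", 60, 4),
]

speedups = [sequential / rapidfire for _, sequential, rapidfire in benchmarks]

for (scenario, _, _), speedup in zip(benchmarks, speedups):
    print(f"{scenario}: {speedup:.0f}x")
```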
153+
154+
## Get Started Today
155+
156+
**🚀 Try it hands-on**: [Interactive Colab Notebook](http://tinyurl.com/rapidfireai-colab) — Zero setup, runs in your browser
157+
158+
**📚 Full Documentation**: [oss-docs.rapidfire.ai](https://oss-docs.rapidfire.ai) — Complete guides, examples, and API reference
159+
160+
**💻 GitHub**: [RapidFireAI/rapidfireai](https://github.com/RapidFireAI/rapidfireai) — Open source, production-ready
161+
162+
**📦 Install via PyPI**: [pypi.org/project/rapidfireai](https://pypi.org/project/rapidfireai): `pip install rapidfireai`
163+
164+
**💬 Join the Community**: [Discord](https://discord.gg/6vSTtncKNN) — Get help, share results, request features
165+
166+
---
167+
168+
RapidFire AI was built because the common status quo of trying one config at a time wastes both time and GPU cycles. With this official integration, every TRL user can fine-tune/post-train smarter, iterate faster, and ship better models.
169+
170+
**Try the integration and let us know**: How much faster is your experimentation loop? What should we build next? We're just getting started, and your feedback shapes where we go from here.
