Benchmark for WhisperAX & CLI #28

Closed
ZachNagengast opened this issue Feb 16, 2024 · 1 comment
Labels
feature New feature or request triaged This issue has been looked at and prioritized by a maintainer

Comments

Contributor

ZachNagengast commented Feb 16, 2024

It would be great to start collecting reproducible performance benchmarks for supported hardware (e.g. A14+ and M1+). This should be a self-contained function that uses openai/whisper-base by default, plus any other model versions the benchmark submitter selects. Benchmarks should run on a standard set of audio files, and reports should be in a digestible, shareable format.

Pseudo-code may look like this (a rough harness sketch follows the list):

  1. Detect current hardware and load the models that the user has chosen to benchmark (single, multiple, or all available models)
  2. Download standard audio files from Hugging Face (jfk.wav for short-form; ted_60.wav and a sample clip from earnings22 for long-form transcription)
  3. Generate transcriptions over several iterations and tabulate runtime statistics.
    • Runs in both streaming and file-based "offline" modes - streaming will require emulation
    • Completes the short-form benchmark and presents results before moving to the long-form benchmark, which can take several minutes to complete
    • Metrics to track: time to first token, RTF (real-time factor: processing time divided by audio duration), inference timings for the encoder and decoder, and total pipeline timing (model load -> transcription result)
  4. Export the results into a markdown table with relevant device info and the current commit hash, which can be posted to GitHub for public tracking
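
As a rough illustration of steps 3 and 4, here is a minimal Swift sketch of the harness shape. The `RunTiming` type and the `transcribe` closure are hypothetical stand-ins for whatever the real WhisperKit call reports back (its actual API may differ); only the iteration/averaging loop, the RTF math, and the markdown export are the point here.

```swift
import Foundation

// Hypothetical per-run result; a stand-in for what the real transcription call reports.
struct RunTiming {
    let timeToFirstToken: TimeInterval // seconds until the first token was emitted
    let totalTime: TimeInterval        // full pipeline: model load -> transcription result
}

struct BenchmarkReport {
    let model: String
    let audioFile: String
    let audioDuration: TimeInterval
    let meanTTFT: TimeInterval
    let meanTotal: TimeInterval
    // Real-time factor: processing time divided by audio duration (< 1 means faster than real time).
    var rtf: Double { meanTotal / audioDuration }
}

// Run `iterations` transcriptions of one file and average the timings.
// `transcribe` is a placeholder for the actual WhisperKit invocation.
func benchmark(
    model: String,
    audioFile: String,
    audioDuration: TimeInterval,
    iterations: Int = 5,
    transcribe: (String) async throws -> RunTiming
) async throws -> BenchmarkReport {
    var ttfts: [TimeInterval] = []
    var totals: [TimeInterval] = []
    for _ in 0..<iterations {
        let timing = try await transcribe(audioFile)
        ttfts.append(timing.timeToFirstToken)
        totals.append(timing.totalTime)
    }
    return BenchmarkReport(
        model: model,
        audioFile: audioFile,
        audioDuration: audioDuration,
        meanTTFT: ttfts.reduce(0, +) / Double(iterations),
        meanTotal: totals.reduce(0, +) / Double(iterations)
    )
}

// Render the collected reports as a markdown table for posting to GitHub (step 4).
func markdownTable(device: String, commit: String, reports: [BenchmarkReport]) -> String {
    var lines = [
        "| Device | Commit | Model | File | TTFT (s) | Total (s) | RTF |",
        "|---|---|---|---|---|---|---|"
    ]
    for r in reports {
        lines.append(String(
            format: "| %@ | %@ | %@ | %@ | %.2f | %.2f | %.3f |",
            device, commit, r.model, r.audioFile, r.meanTTFT, r.meanTotal, r.rtf
        ))
    }
    return lines.joined(separator: "\n")
}
```

Emitting plain markdown keeps each report copy-pasteable straight into a GitHub comment, and averaging over several iterations smooths out first-run model compilation and cache effects.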

References

Open ASR leaderboard benchmarks: https://github.com/huggingface/open_asr_leaderboard
Nice script for collecting environment info: https://github.com/pytorch/pytorch/blob/main/torch/utils/collect_env.py

Related Issue

#5

@ZachNagengast ZachNagengast converted this from a draft issue Feb 16, 2024
@ZachNagengast ZachNagengast added feature New feature or request triaged This issue has been looked at and prioritized by a maintainer labels Feb 16, 2024
@ZachNagengast ZachNagengast moved this from TODO: Features to TODO in WhisperKit Mar 28, 2024
@ZachNagengast ZachNagengast linked a pull request Jun 25, 2024 that will close this issue
@ZachNagengast ZachNagengast removed a link to a pull request Jun 25, 2024
@atiorh atiorh closed this as completed Nov 2, 2024
@github-project-automation github-project-automation bot moved this from TODO to Done in WhisperKit Nov 2, 2024