Allow protocol defined types for model inputs and outputs #281

Merged: 3 commits, Dec 19, 2024

ZachNagengast
Contributor

This change will allow arbitrary input and output types as part of the model protocols, supporting full MLX or MLTensor pipelines without the need to convert between types during inference.
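
As a rough illustration of the idea, the sketch below shows how associated types on a stage protocol let each stage declare its own input and output types, so that two stages can be chained without an intermediate conversion. All names here (`ModelIO`, `PipelineStage`, `FloatBuffer`, `Scale`, `Offset`, `chain`) are hypothetical and are not the protocols introduced by this PR; `FloatBuffer` merely stands in for a framework tensor type.

```swift
// Hypothetical names throughout; not WhisperKit's actual protocol definitions.

/// Marker protocol for anything a pipeline stage can consume or produce.
protocol ModelIO {}

/// Stand-in for a framework tensor type (for example an MLX array or an MLTensor).
struct FloatBuffer: ModelIO {
    var values: [Float]
}

/// A pipeline stage that declares its own input and output types.
protocol PipelineStage {
    associatedtype Input: ModelIO
    associatedtype Output: ModelIO
    func run(_ input: Input) throws -> Output
}

/// Toy first stage: multiplies every value by a constant.
struct Scale: PipelineStage {
    let factor: Float
    func run(_ input: FloatBuffer) throws -> FloatBuffer {
        FloatBuffer(values: input.values.map { $0 * factor })
    }
}

/// Toy second stage: adds a constant to every value.
struct Offset: PipelineStage {
    let bias: Float
    func run(_ input: FloatBuffer) throws -> FloatBuffer {
        FloatBuffer(values: input.values.map { $0 + bias })
    }
}

/// Chains two stages. The `where` clause makes the compiler check that the
/// first stage's output type is exactly the second stage's input type.
func chain<A: PipelineStage, B: PipelineStage>(
    _ a: A, _ b: B, input: A.Input
) throws -> B.Output where A.Output == B.Input {
    try b.run(a.run(input))
}

let result = try chain(Scale(factor: 2), Offset(bias: 1),
                       input: FloatBuffer(values: [1, 2, 3]))
print(result.values) // [3.0, 5.0, 7.0]
```

Because the type match is checked at compile time, a stage that emits one tensor type cannot be wired to a stage expecting another without an explicit adapter.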

This PR also contains some general fixes and cleanup:

  • Derives the audio input length from the MelSpectrogram model's input shape
    • Breaking change: WhisperKit.windowSamples is now Constants.defaultWindowSamples
  • Fixed the timestamp token filter rules
    • Transcripts will now have more timestamp tokens (segments) within each 30s window
  • Uses MLTensor operations for sampling on iOS 18+ and macOS 15+ for a 2x speedup vs BNNS
  • CI and QoL upgrades
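
To make the first item above concrete, here is a minimal sketch of deriving the window length from a model's reported input shape instead of a fixed constant. The helper `windowSamples(fromInputShape:)` and the assumption that the last dimension holds the sample count are illustrative only, not the PR's implementation. The fallback of 480,000 samples corresponds to a 30s window at Whisper's 16 kHz sample rate.

```swift
// Hypothetical helper; the names and the shape layout are assumptions for illustration.
let sampleRate = 16_000                      // Whisper models expect 16 kHz audio
let defaultWindowSamples = 30 * sampleRate   // 480_000 samples for a 30s window

/// Returns the number of audio samples per window, preferring the model's own shape.
func windowSamples(fromInputShape shape: [Int]?) -> Int {
    // Assumes the last dimension holds the sample count.
    // Falls back to the default when the shape is missing, empty, or non-positive.
    guard let count = shape?.last, count > 0 else { return defaultWindowSamples }
    return count
}

print(windowSamples(fromInputShape: [1, 160_000])) // 160000
print(windowSamples(fromInputShape: nil))          // 480000
```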

a2they and others added 3 commits December 18, 2024 16:23
Add arbitrary length audio
* Support generic io for model inputs and outputs

* Add speed factor to timing report

* Use actor for early stop checks for better concurrency safety

* Add io type protocol handling and tests

* Formatting

* Fix timestamp token filter logic and tests

* Run unit tests on any branch in PR

* Upload test failure results
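
For readers unfamiliar with the actor approach mentioned in the commit list above, the following is a minimal sketch of how an actor can guard early-stop flags that are read by a decoding loop and written from elsewhere. `EarlyStopRegistry` and `decode(id:registry:maxSteps:)` are hypothetical names, and the loop is a simulation rather than the project's decoder.

```swift
import Foundation

/// Hypothetical actor: all reads and writes of the flags are serialized by the actor.
actor EarlyStopRegistry {
    private var shouldStop: [UUID: Bool] = [:]

    func register(_ id: UUID) { shouldStop[id] = false }
    func requestStop(_ id: UUID) { shouldStop[id] = true }
    func isStopRequested(_ id: UUID) -> Bool { shouldStop[id] ?? false }
    func remove(_ id: UUID) { shouldStop.removeValue(forKey: id) }
}

/// Simulated decoding loop that checks the flag before each step.
func decode(id: UUID, registry: EarlyStopRegistry, maxSteps: Int) async -> Int {
    var steps = 0
    while steps < maxSteps {
        if await registry.isStopRequested(id) { break }
        steps += 1
        // Stand-in for a progress callback asking the decoder to stop after 3 steps.
        if steps == 3 { await registry.requestStop(id) }
    }
    return steps
}

let registry = EarlyStopRegistry()
let taskID = UUID()
await registry.register(taskID)
let steps = await decode(id: taskID, registry: registry, maxSteps: 10)
await registry.remove(taskID)
print(steps) // 3
```

Compared with a lock around a shared dictionary, the actor moves the synchronization into the type itself, so callers cannot forget to take the lock.
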
@ZachNagengast ZachNagengast merged commit d191654 into main Dec 19, 2024
33 checks passed
4 participants