ORTOptimizer for wav2vec2-bert #2232

Draft: aconeil wants to merge 3 commits into base: main

aconeil commented Apr 16, 2025

What does this PR do?

Add wav2vec2-bert to the list of possible models for optimization

Fixes #2221

Who can review?

@IlyasMoutawwakil

aconeil added 3 commits April 14, 2025 14:40
Add wav2vec2-bert to the list of possible models for optimization
ORTOptimizer for wav2vec2-bert
aconeil marked this pull request as draft April 16, 2025 18:26
IlyasMoutawwakil (Member) commented

Please add it to the tests as well.

eingrid commented Apr 20, 2025

Shouldn't we also add "wav2vec2-bert": NormalizedTextConfig to the mapping here?

# Contribution note: Please add new models in alphabetical order
_conf = {
    "albert": NormalizedTextConfig,
    "bart": BartLikeNormalizedTextConfig,
    "bert": NormalizedTextConfig,

IlyasMoutawwakil (Member) commented

@eingrid Yes, that as well.

aconeil (Author) commented Apr 21, 2025

Wav2Vec2-Bert actually takes mel-spectrograms as input, which is closer to the SpeechT5 input than to the wav2vec inputs.
I tried the suggested changes with both NormalizedTextConfig and T5LikeNormalizedTextConfig, without success. I then tried adapting optimum/exporters/onnx/model_configs.py to add a model-specific class, but I get an error with both DummyAudioInputGenerator (since the input is different) and DummySpeechT5InputGenerator (since there are no speaker embeddings).
Do you have any suggestions?

aconeil (Author) commented Apr 23, 2025

I also tried Speech2TextDummyAudioInputGenerator today, without success. The input for Wav2Vec2Bert is input_features with a shape of [1, x, 160].
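
The shape described above can be illustrated with a small, dependency-free sketch. It is not optimum's API and not a fix for the errors reported here: the helper names (dummy_input_features, dynamic_axes, shape_of), the axis labels, and the default sequence length of 50 are all hypothetical. Only the input name input_features and the trailing dimension of 160 come from the comment.

```python
import random
from typing import Dict, List, Tuple

# Last dimension of input_features, as reported in the comment above.
FEATURE_SIZE = 160


def dummy_input_features(
    batch_size: int = 1,
    sequence_length: int = 50,  # the variable "x" axis; 50 is an arbitrary choice
    feature_size: int = FEATURE_SIZE,
    seed: int = 0,
) -> List[List[List[float]]]:
    """Build a nested list shaped [batch_size, sequence_length, feature_size]."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(feature_size)] for _ in range(sequence_length)]
        for _ in range(batch_size)
    ]


def dynamic_axes() -> Dict[str, Dict[int, str]]:
    """Name the axes that vary between calls (hypothetical labels); axis 2 is fixed."""
    return {"input_features": {0: "batch_size", 1: "sequence_length"}}


def shape_of(nested: list) -> Tuple[int, ...]:
    """Return the shape of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(dims)


features = dummy_input_features(sequence_length=7)
print(shape_of(features))  # (1, 7, 160)
print(dynamic_axes())
```

A real dummy input generator would presumably return a framework tensor rather than nested lists. The sketch only pins down the shape contract: a 3-D float input whose first two axes vary and whose last axis is fixed at 160, unlike a raw-waveform input of shape [batch, samples].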
