For sure. But torchaudio also comes with other standard augmentations, so people may not wish to switch between torchaudio and AugLy.
🚀 Feature
Add SpecAugment as a form of audio augmentation.
Motivation
SpecAugment (https://arxiv.org/abs/1904.08779) has led to substantial improvements in speech recognition performance over the last few years.
Pitch
Any serious audio augmentation toolkit should include SpecAugment. It has become so popular in speech recognition that a research paper which does not use it as standard processing invites questions about its quality. Combined with speed and frequency perturbation, it has become de rigueur in the field. AugLy should offer it as an additional augmentation, accompanied by best practices for applying the technique, since there are many variations.
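To make the request concrete, here is a minimal sketch of the two masking steps described in the SpecAugment paper: frequency masking and time masking on a spectrogram. It is not AugLy's or torchaudio's API. The function name `spec_augment`, its parameter names, and the default values are hypothetical and chosen only for illustration, and the paper's time-warping step is left out. It uses plain Python lists so it runs without dependencies; a real implementation would operate on arrays or tensors.

```python
import random


def spec_augment(spec, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=20,
                 mask_value=0.0, rng=None):
    """Return a masked copy of `spec`.

    `spec` is a list of frames (time axis), each a list of frequency bins.
    All names and defaults here are illustrative assumptions.
    """
    rng = rng or random.Random()
    out = [list(frame) for frame in spec]  # copy, so the input is untouched
    n_time = len(out)
    n_freq = len(out[0]) if out else 0

    # Frequency masking: blank a random band of consecutive bins in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, n_freq))
        start = rng.randint(0, n_freq - width)
        for frame in out:
            frame[start:start + width] = [mask_value] * width

    # Time masking: blank a random run of consecutive frames.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, n_time))
        start = rng.randint(0, n_time - width)
        for t in range(start, start + width):
            out[t] = [mask_value] * n_freq

    return out


# Usage: a dummy 100-frame, 80-bin spectrogram of ones.
spec = [[1.0] * 80 for _ in range(100)]
augmented = spec_augment(spec, rng=random.Random(0))
```

The number and maximum width of the masks are the main knobs that differ between published recipes, which is where best-practice guidance would help.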
Alternatives
People use time and frequency perturbation on their own, but with a lot of training data the gains from these tend to wash out. SpecAugment improves results even with a lot of training data (at the expense of bigger models).
Additional context
You might also wish to include suggestions for how to integrate AugLy into popular speech recognition toolkits like Kaldi.