Hybrid autoregressive transducer (HAT) #1244
Conversation
Great!
@csukuangfj could you also check this when you have some time? Thanks!
Thanks! Left a minor comment. Otherwise, it looks good to me.
Sorry it took a while; I was on vacation for the last two weeks. I have made the change.
Thanks!
This is an implementation of the HAT loss proposed in https://arxiv.org/abs/2003.07705.
The test produces reasonable-looking losses. I am working on a LibriSpeech zipformer recipe using this loss. In general, HAT is not expected to improve upon the RNN-T loss by itself, but it may be useful for things like integrating external LMs. I am planning to use it in multi-talker ASR for speaker attribution (e.g., https://arxiv.org/abs/2309.08489).
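For readers unfamiliar with HAT: the paper's key idea is to factor the transducer output distribution so that the blank probability comes from a sigmoid over a dedicated blank logit, and label probabilities come from a softmax over the remaining logits, scaled by (1 - p_blank). This separation is what makes the internal LM easy to estimate and subtract when combining with an external LM. Below is a minimal, self-contained sketch of that factorization at a single (t, u) joiner position; the function and argument names are hypothetical illustrations, not the API added in this PR.

```python
import math


def hat_distribution(blank_logit, label_logits):
    """Sketch of the HAT output factorization (hypothetical helper, not icefall's API).

    blank_logit:  scalar logit for the blank symbol.
    label_logits: list of logits for the non-blank labels.
    Returns (p_blank, p_labels), which together form a valid distribution.
    """
    # Blank probability from a sigmoid over its own logit.
    p_blank = 1.0 / (1.0 + math.exp(-blank_logit))

    # Label probabilities from a softmax, scaled by the non-blank mass.
    m = max(label_logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in label_logits]
    z = sum(exps)
    p_labels = [(1.0 - p_blank) * e / z for e in exps]
    return p_blank, p_labels


p_blank, p_labels = hat_distribution(0.5, [1.0, 2.0, 0.5])
total = p_blank + sum(p_labels)
print(round(total, 6))  # blank mass plus label mass sums to 1
```

In the full loss, these per-position probabilities feed the standard transducer forward-backward recursion; only the parameterization of the output distribution changes relative to RNN-T, which is why HAT alone is not expected to change accuracy much.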