A comparative analysis between Conformer Transducer, Whisper and Wav2vec2 for improving the child speech recognition

C3Imaging/child_asr_conformer

Automatic Speech Recognition (ASR) systems have progressed significantly in their performance on adult speech data; however, transcribing child speech remains challenging because of the acoustic differences between child and adult voices. This work explores the potential of adapting state-of-the-art Conformer-Transducer models to child speech to improve child speech recognition. The results are compared with those of self-supervised Wav2Vec2 models and semi-supervised multi-domain Whisper models that were previously finetuned on the same data. We demonstrate that finetuning Conformer-Transducer models on child speech yields significant improvements in ASR performance on child speech compared to the non-finetuned models. We also show Whisper and Wav2Vec2 adaptation on different child speech datasets. Our detailed comparative analysis shows that Wav2Vec2 provides the most consistent performance improvements among the three methods studied.
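The comparison between finetuned and non-finetuned models can be illustrated with a small scoring sketch. The text above does not name the metric, so this example assumes word error rate (WER), the usual measure for ASR. The function names `word_error_rate` and `relative_wer_reduction` and the sample transcripts are hypothetical and are not taken from the repository's code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, finetuned_wer: float) -> float:
    """Fraction of the baseline WER removed by finetuning."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - finetuned_wer) / baseline_wer


if __name__ == "__main__":
    # Made-up transcripts, for illustration only.
    reference = "the cat sat on the mat"
    baseline_output = "the cat sat on mat"       # one word dropped
    finetuned_output = "the cat sat on the mat"  # exact match

    baseline = word_error_rate(reference, baseline_output)
    finetuned = word_error_rate(reference, finetuned_output)
    print(f"baseline WER:  {baseline:.3f}")
    print(f"finetuned WER: {finetuned:.3f}")
    print(f"relative reduction: {relative_wer_reduction(baseline, finetuned):.1%}")
```

In practice the transcripts would come from each model's output on a held-out child speech test set, and text normalization (punctuation, casing, numerals) would need to be applied identically to every system before scoring.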

This is the GitHub repository for the paper. All code and research-related material will be uploaded over time. Feel free to send an email with any questions about this research.

[WORK IN PROGRESS]
