We present an end-to-end Automatic Speech Recognition (ASR) system in the context of the recent challenge tasks for Bhojpuri and Bengali. Our implementation follows the currently popular wav2vec2 models while we investigate ways to leverage the dialect-categorised data in order to improve ASR performance. We report overall improvements in word error rate with dialect-specific language models for each of the languages. We present an analysis that provides insights into some of the factors underlying the success of dialect-specific language models.

Paper, Oral Presentation
Install the repository's Python dependencies, then download and build KenLM (used for the n-gram language models):
pip install .
wget -O - https://kheafield.com/code/kenlm.tar.gz | tar xz
mkdir kenlm/build && cd kenlm/build && cmake .. && make -j2
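After the build, the KenLM n-gram tools (lmplz, build_binary) live under kenlm/build/bin. A minimal sanity check, assuming KenLM was unpacked and built in ./kenlm as above:

```python
# Sanity check only: confirm the KenLM n-gram tools were built.
# Paths assume KenLM was unpacked and built in ./kenlm as shown above.
from pathlib import Path

for tool in ("lmplz", "build_binary"):
    binary = Path("kenlm/build/bin") / tool
    print(f"{tool}: {'found' if binary.exists() else 'missing'} ({binary})")
```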
The MADASR dataset can be found here.
To extract the speech and corresponding transcripts from the dataset and convert them into the Hugging Face datasets format:
python pre_processing.py \
--repo_name "Trained_Model/wav2vec2-bh" \
--path "Dataset/bh" \
--dev_path "Dataset/bh/bh_dev/dev" \
--train_text_path "MADASR-Competition/RESPIN_ASRU_Challenge_2023/corpus/bh/train/text" \
--dev_text_path "MADASR-Competition/RESPIN_ASRU_Challenge_2023/corpus/bh/dev/text" \
--dataset_name "bh_processed" \
--vocab_path "Vocab/vocab_bh.json"
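After pre-processing you can inspect the prepared data like any other Hugging Face dataset. A minimal sketch, assuming pre_processing.py saves the dataset to disk under the --dataset_name path and the processor under the --repo_name path:

```python
# Illustrative sketch only; the save locations are assumptions based on the
# CLI arguments passed to pre_processing.py above.
from datasets import load_from_disk
from transformers import Wav2Vec2Processor

dataset = load_from_disk("bh_processed")                                    # --dataset_name
processor = Wav2Vec2Processor.from_pretrained("Trained_Model/wav2vec2-bh")  # --repo_name

print(dataset)                            # splits, sizes and column names
print(processor.tokenizer.get_vocab())    # character vocabulary from vocab_bh.json
```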
To create the dialect-ID-based dataset for evaluation:
python eval_dialect_dataset.py \
--dev_path "Lab/Dataset/bh/bh_dev/dev" \
--utt2lang_path "RESPIN_ASRU_Challenge_2023/corpus/bh/dev/utt2lang" \
--dev_text_path "RESPIN_ASRU_Challenge_2023/corpus/bh/dev/text" \
--processor_name "Trained_Model/wav2vec2-bh" \
--dataset_name "Dataset/bh_dev_dialect"
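The evaluation set can then be split per dialect before decoding. A minimal sketch, assuming the saved dataset carries a dialect label column derived from the utt2lang file (the column name "dialect" is an assumption):

```python
# Illustrative sketch only; "dialect" is an assumed column name.
from datasets import load_from_disk

eval_ds = load_from_disk("Dataset/bh_dev_dialect")
for dialect in sorted(set(eval_ds["dialect"])):
    subset = eval_ds.filter(lambda ex, d=dialect: ex["dialect"] == d)
    print(dialect, len(subset), "utterances")
```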
To train the acoustic model (AM):
python train_AM.py --config_path="Config/train_AM.yaml"
To train the language models (LM-All on the combined training text, LM-Dialect on the per-dialect text):
python train_LM-All.py --config_path="Config/train_LM-All.yaml"
python train_LM_Dialect.py --config_path="Config/train_LM-Dialect.yaml"
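The language models are KenLM n-grams (5-gram and 6-gram variants are reported below). A hedged sketch for sanity-checking a trained LM with the kenlm Python bindings; the model filename is an assumption, use whatever path the train_LM-*.py scripts write:

```python
# Illustrative sketch only: score text with a trained KenLM n-gram model.
import kenlm

lm = kenlm.Model("LM/lm_all_5gram.binary")   # assumed output path of train_LM-All.py
sentence = "a sample transcript in the target language"
print("log10 probability:", lm.score(sentence, bos=True, eos=True))
print("perplexity:", lm.perplexity(sentence))
```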
To evaluate the acoustic model alone, with the overall LM, and with the dialect-specific LMs:
python test_AM.py --config_path="Config/test_AM.yaml"
python test_AM_LM-All.py --config_path="Config/test_AM_LM-All.yaml"
python test_AM_LM-Dialect.py --config_path="Config/test_AM_LM-Dialect.yaml"
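At test time the CTC logits of the acoustic model are combined with the chosen n-gram LM through beam-search (shallow fusion) decoding. A minimal sketch using pyctcdecode; the audio file and LM path are assumptions, and the actual test scripts are driven by the YAML configs:

```python
# Illustrative sketch only: beam-search CTC decoding with a KenLM language model.
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from pyctcdecode import build_ctcdecoder

processor = Wav2Vec2Processor.from_pretrained("Trained_Model/wav2vec2-bh")
model = Wav2Vec2ForCTC.from_pretrained("Trained_Model/wav2vec2-bh").eval()

# pyctcdecode needs the vocabulary ordered by token id; wav2vec2 uses "|" as the
# word delimiter, which the decoder expects as a literal space.
vocab = processor.tokenizer.get_vocab()
labels = [tok.replace("|", " ") for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]
decoder = build_ctcdecoder(labels, kenlm_model_path="LM/lm_dialect_d1.arpa")  # assumed LM path

speech, _ = librosa.load("example.wav", sr=16000)        # assumed example utterance
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits[0].cpu().numpy()   # (time, vocab)

print(decoder.decode(logits))
```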
To run the model on a single speech audio file, you can use the notebook below.
All results below are reported as word error rate (WER, %).
Model | Bengali | Bhojpuri |
---|---|---|
AM | 21.8 | 21.21 |
AM + 5 gram LM-All | 16.04 | 16.76 |
AM + 5 gram LM-Dialect | 15.62 | 16.48 |
AM + 6 gram LM-All | 16.06 | 16.67 |
AM + 6 gram LM-Dialect | 15.68 | 16.26 |
Each dialect benefits from its dialect-specific LM compared with the overall LM (LM-All), as the per-dialect breakdowns below show.
Bhojpuri (per-dialect WER):

Dialect/Model | AM | AM + 6 gram LM-All | AM + 6 gram LM-Dialect |
---|---|---|---|
D1 | 20.56 | 14.90 | 14.70 |
D2 | 20.89 | 16.00 | 15.80 |
D3 | 21.97 | 18.59 | 17.88 |
All | 21.21 | 16.67 | 16.26 |
Bengali (per-dialect WER):

Dialect/Model | AM | AM + 5 gram LM-All | AM + 5 gram LM-Dialect |
---|---|---|---|
D1 | 19.76 | 15.02 | 14.02 |
D2 | 21.44 | 16.08 | 15.60 |
D3 | 18.40 | 14.44 | 13.93 |
D4 | 20.71 | 16.21 | 15.73 |
D5 | 27.90 | 18.19 | 18.15 |
All | 21.21 | 16.04 | 15.62 |
If you have any questions, please feel free to contact me: LinkedIn, Email
If you are using this code, please cite:
@InProceedings{10.1007/978-3-031-48309-7_5,
author="Gothi, Raj
and Rao, Preeti",
title="Improving Automatic Speech Recognition with Dialect-Specific Language Models",
booktitle="Speech and Computer",
year="2023",
publisher="Springer Nature Switzerland",
address="Cham",
pages="57--67",
}