An acoustic-based method that can be used to calculate the distance between pronunciations.
- Python 3.6
- PRAAT 6.1.08
- Hidden Markov Toolkit (HTK) 3.4
- FAVE
- R
git clone https://github.com/Bartelds/acoustic-distance-measure.git
The identifiers of the audio samples used are listed in Audio.
Data source: http://accent.gmu.edu/browse_language.php
Before the distances can be computed, the input data must be preprocessed once (steps 1-4), following the procedure below:
Input: audio files
Output: aligned .TextGrid files
Forced-alignment
Forced alignment is used to locate the individual words in the audio files. This is done with the Penn Phonetics Lab Forced Aligner.
- Resample all audio files to 16 kHz mono PCM.
- Create a transcript file that contains all the words spoken in the audio samples.
- Run alignment:
fa.sh
- Extract start and end of words:
extract_fa.praat
- Segment paragraphs into words:
wavsplitter.py
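The segmentation step can be sketched in Python as follows. This is a minimal illustration, not the repository's wavsplitter.py: the function name `split_wav_by_words`, the `(label, start, end)` interval format, and the output naming pattern are all assumptions, and the word boundaries are assumed to have already been extracted from the aligned TextGrid files.

```python
import wave


def split_wav_by_words(wav_path, intervals, out_pattern):
    """Write one WAV file per (label, start_sec, end_sec) interval.

    Hypothetical helper for illustration only; it is not the
    repository's wavsplitter.py.
    """
    written = []
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        for index, (label, start, end) in enumerate(intervals):
            # Convert seconds to frame offsets, clamped to the file length.
            first = min(max(int(round(start * rate)), 0), total)
            last = min(max(int(round(end * rate)), first), total)
            src.setpos(first)
            frames = src.readframes(last - first)
            out_path = out_pattern.format(index=index, label=label)
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
    return written
```

A call such as `split_wav_by_words("speaker1.wav", [("please", 0.31, 0.74)], "speaker1_{index}_{label}.wav")` would write a single word-level file; the file name and timings here are made up.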
Generate MFCCs.
MFCC
- Generate .scp listing that suits your data:
example_hcopy.scp
- Use config.txt with HTK parameters.
- Generate MFCCs:
HCopy -T 1 -C config -S example_hcopy.scp
- Export the HTK compressed format:
./exporthtk.sh
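An HCopy script file lists one source file and one target file per line, separated by whitespace. The sketch below shows one way to generate such a listing for a directory of word-level audio files. The function name `build_hcopy_script`, the directory layout, and the `.mfc` output extension are assumptions for illustration, not taken from the repository; paths containing spaces are not handled.

```python
from pathlib import Path


def build_hcopy_script(wav_dir, mfcc_dir, wav_ext=".wav", mfcc_ext=".mfc"):
    """Return the text of an HCopy script file.

    Each line has the form "<source audio> <target feature file>".
    Hypothetical helper; the extensions are assumptions.
    """
    lines = []
    for wav in sorted(Path(wav_dir).glob("*" + wav_ext)):
        # Keep the file stem so each feature file maps back to its word.
        target = Path(mfcc_dir) / (wav.stem + mfcc_ext)
        lines.append(f"{wav} {target}")
    return "\n".join(lines) + ("\n" if lines else "")
```

The returned text can be written to a file and passed to HCopy with `-S`, in the same way as example_hcopy.scp.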
Distances are calculated using Dynamic Time Warping.
DTW
dtw.R computes the distances (includes normalization).
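For readers unfamiliar with Dynamic Time Warping, the idea can be sketched in a few lines of Python. This is not a port of dtw.R: the Euclidean local cost, the symmetric step pattern, and the normalization by the summed sequence lengths are assumptions chosen for illustration, and the function name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(a, b):
    """Length-normalized DTW distance between two sequences of
    feature vectors (for example, MFCC frames of two pronunciations).

    Assumptions: Euclidean local cost, steps (i-1, j), (i, j-1) and
    (i-1, j-1), and normalization by len(a) + len(b).
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # prev[j] holds the best accumulated cost of aligning the frames
    # processed so far in a with the first j frames of b.
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = math.sqrt(
                sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            )
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m] / (n + m)
```

Two recordings of the same word with different durations can still obtain a small distance, because frames are allowed to align one-to-many along the warping path.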