This repository contains the work I did on the Diphone Alignment and Acoustic Scores project as part of the Google Summer of Code 2017 program. Please also see the pull requests to the upstream PocketSphinx and CMUSphinx website repositories.
Initial target list of diphones provided by the project mentor, James Salsman. Used by the words.py and difficulties.py scripts.
Text transcriptions imported from the LibriSpeech ASR corpus and the TED-LIUM corpus (release 2), stored in the same format.
Dictionaries with phonetic transcriptions copied from the corpora without changes.
Merged dictionary combining the two dictionaries above. Produced by the merge_dictionaries.py script, used by the state_align.c program.
Phoneme dictionary. Copied from the Pocketsphinx for Pronunciation Evaluation page, used by the align.sh script.
Lists of unigrams and bigrams sorted by frequency, extracted from the contents of the corpora directory. Produced by the ngrams.py script, used by the diphones.py script.
The same lists, with diphone transcriptions built using the dictionaries. Produced by the diphones.py script, used by the words.py and am.py scripts.
List of sample words for the diphones. "Words" can be unigrams or bigrams (the latter for diphones containing a salience phone). Produced by the words.py script, used by the utts.py and diphone_audio.py scripts.
List of utterances for which the forced phonetic aligner (align.sh) produced results. Produced by running grep on the aligner's logs, used by the utts.py script.
List of utterances containing sample words. Produced by the utts.py script, used by the praat.py script.
Audio recordings of sample words, extracted from the audio prepared by the audio.py script. Produced manually with the help of the praat.py and samples.praat scripts, used by the diphone_audio.py script.
Audio samples of diphones extracted from the audio recordings of sample words. Produced by the diphone_audio.py script, used by the spectrogram.py and pitches.praat scripts.
Spectrogram images of the diphone audio samples. Produced by the spectrogram.py script.
Pitch contour images of the diphone audio samples. Produced by the pitches.praat script.
SVG files representing different places and manners of articulation. The SVG files themselves are not included; they are listed in the svg.lst file and can be downloaded from the Interactive Sagittal Section site using the download.sh script in the same directory. Used by the difficulties.py script.
List of phones and the corresponding values of the articulation switches on the Interactive Sagittal Section site. Produced manually by self-observation, used by the difficulties.py script.
List of diphones with their difficulty values for human pronunciation. Produced by the difficulties.py script.
List of utterances that caused problems during training of the diphone acoustic model. Produced by running grep on the training logs, used by the am.py script.
Database structure for training the diphone acoustic model. Produced by the am.py script, used by sphinxtrain.
Diphone acoustic model. Produced by sphinxtrain.
Runs the pocketsphinx_continuous program with certain parameters.
Converts a dictionary from phone to diphone representation for use with a diphone acoustic model.
Adds diphone-level phonetic transcriptions to a list of ngrams.
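A minimal sketch of the phone-to-diphone conversion described in the two entries above, not the repository's code. It assumes CMUdict-style entries (WORD PH1 PH2 ...) and a hypothetical "-" separator for diphone names; the real naming may differ. For a bigram the phones of both words are concatenated before pairing, so the diphone spanning the word boundary is included.

```python
from typing import Dict, List


def to_diphones(phones: List[str], sep: str = "-") -> List[str]:
    """Pair each phone with its successor: [A, B, C] -> [A-B, B-C]."""
    # The "-" separator is a hypothetical naming convention.
    return [f"{a}{sep}{b}" for a, b in zip(phones, phones[1:])]


def parse_dictionary(lines: List[str]) -> Dict[str, List[str]]:
    """Parse lines of the form 'WORD PH1 PH2 ...' into a word -> phones map."""
    pron: Dict[str, List[str]] = {}
    for line in lines:
        parts = line.split()
        if parts:
            # Keep the first entry if a word is listed twice.
            pron.setdefault(parts[0], parts[1:])
    return pron


def dictionary_to_diphones(lines: List[str]) -> List[str]:
    """Rewrite each dictionary entry with diphones instead of phones."""
    return [" ".join([word] + to_diphones(phones))
            for word, phones in parse_dictionary(lines).items()]


def ngram_diphones(ngram: str, pron: Dict[str, List[str]]) -> List[str]:
    """Diphones of a unigram or bigram, including any cross-word diphone."""
    phones: List[str] = []
    for word in ngram.split():
        phones.extend(pron[word])  # raises KeyError for out-of-dictionary words
    return to_diphones(phones)
```

For example, with HELLO as HH AH L OW and WORLD as W ER L D, the bigram "HELLO WORLD" yields seven diphones, including the cross-word OW-W.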
Merges two dictionaries into a single one.
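A minimal sketch of such a merge, assuming both inputs use the CMUdict-style layout WORD PH1 PH2 ... with alternate pronunciations marked as WORD(2), WORD(3) and so on. Duplicate pronunciations are dropped and alternates are renumbered; case normalisation and comment lines are not handled, and merge_dictionaries.py may behave differently.

```python
import re
from typing import Dict, Iterable, List

# Matches the "(N)" suffix of alternate-pronunciation entries, e.g. "READ(2)".
ALT_SUFFIX = re.compile(r"\(\d+\)$")


def merge_dictionaries(*dictionaries: Iterable[str]) -> List[str]:
    """Merge CMUdict-style dictionaries, dropping duplicate pronunciations.

    Each input is an iterable of lines "WORD PH1 PH2 ...". Alternates are
    renumbered as WORD, WORD(2), WORD(3), ... in order of first appearance.
    """
    prons: Dict[str, List[str]] = {}
    for lines in dictionaries:
        for line in lines:
            parts = line.split()
            if not parts:
                continue
            word = ALT_SUFFIX.sub("", parts[0])
            pron = " ".join(parts[1:])
            variants = prons.setdefault(word, [])
            if pron not in variants:
                variants.append(pron)
    merged: List[str] = []
    for word in sorted(prons):
        for i, pron in enumerate(prons[word], start=1):
            key = word if i == 1 else f"{word}({i})"
            merged.append(f"{key} {pron}")
    return merged
```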
Prepares a list of utterances and TextGrid label files for the samples.praat script.
Creates a list of utterances that both contain sample words for diphones and were successfully processed by the forced phonetic aligner.
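The selection above combines two conditions. A minimal sketch, assuming transcription lines of the form UTTERANCE_ID WORD WORD ... and a set of utterance IDs taken from the aligner logs; the function name is hypothetical. Sample words are matched as whole-word sequences so that a bigram only matches when both words are adjacent.

```python
from typing import Dict, Iterable, List, Set


def select_utterances(
    transcripts: Iterable[str],
    sample_words: Iterable[str],
    aligned_ids: Set[str],
) -> Dict[str, List[str]]:
    """Map each successfully aligned utterance ID to the sample words it contains.

    transcripts: lines of the assumed form "UTTERANCE_ID WORD WORD ...".
    sample_words: unigrams or bigrams, e.g. "CAT" or "BLACK CAT".
    aligned_ids: IDs of utterances the forced aligner produced output for.
    """
    samples = [tuple(w.split()) for w in sample_words]
    selected: Dict[str, List[str]] = {}
    for line in transcripts:
        parts = line.split()
        if not parts or parts[0] not in aligned_ids:
            continue  # skip empty lines and utterances the aligner failed on
        utt_id, words = parts[0], parts[1:]
        found = []
        for sample in samples:
            n = len(sample)
            # Compare whole-word windows so "CAT" does not match "CATALOG".
            if any(tuple(words[i:i + n]) == sample
                   for i in range(len(words) - n + 1)):
                found.append(" ".join(sample))
        if found:
            selected[utt_id] = found
    return selected
```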
Generates the database structure for training the diphone acoustic model.
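sphinxtrain reads its training lists from the etc directory of the database: a fileids file with one audio path per line (relative to the wav directory, without extension) and a transcription file whose lines look like "<s> WORDS </s> (fileid)". A minimal sketch of writing those two files while skipping the utterances listed as problematic; the database name and the input mappings are assumptions, and the dictionary, phone list and filler files that sphinxtrain also needs are not covered.

```python
from pathlib import Path
from typing import Dict, Set


def write_train_files(db_dir: Path, db_name: str,
                      utterances: Dict[str, str], problematic: Set[str]) -> int:
    """Write <db_name>_train.fileids and <db_name>_train.transcription in db_dir/etc.

    utterances maps a file ID (path relative to the wav directory, no
    extension) to its transcription; IDs in `problematic` are skipped.
    Returns the number of utterances written.
    """
    etc = db_dir / "etc"
    etc.mkdir(parents=True, exist_ok=True)
    kept = sorted(fid for fid in utterances if fid not in problematic)
    (etc / f"{db_name}_train.fileids").write_text(
        "".join(f"{fid}\n" for fid in kept))
    # Each transcription line ends with the file's base name in parentheses.
    (etc / f"{db_name}_train.transcription").write_text(
        "".join(f"<s> {utterances[fid]} </s> ({Path(fid).name})\n" for fid in kept))
    return len(kept)
```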
Calculates the human pronunciation difficulty of the given diphones, based on the distance between the SVG paths representing the articulatory configurations of the two phones that make up each diphone.
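One plausible way to turn "distance between SVG paths" into a number, shown only as a sketch: read the coordinates in each path's d attribute as (x, y) pairs, average the Euclidean distance between corresponding points, and sum over articulators. This assumes the two paths for an articulator have the same number of points and use absolute coordinates; the articulator names in the test are hypothetical, and difficulties.py may measure distance differently.

```python
import math
import re
from typing import Dict, List, Tuple

Point = Tuple[float, float]
# SVG numbers: optional sign, digits with optional fraction or a bare fraction, optional exponent.
NUMBER = re.compile(r"-?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def path_points(d: str) -> List[Point]:
    """Read (x, y) pairs from an SVG path 'd' string, ignoring command letters."""
    nums = [float(n) for n in NUMBER.findall(d)]
    return list(zip(nums[0::2], nums[1::2]))


def path_distance(d1: str, d2: str) -> float:
    """Mean Euclidean distance between corresponding points of two paths."""
    p1, p2 = path_points(d1), path_points(d2)
    if len(p1) != len(p2) or not p1:
        raise ValueError("paths must have the same, non-zero number of points")
    return sum(math.dist(a, b) for a, b in zip(p1, p2)) / len(p1)


def diphone_difficulty(config1: Dict[str, str], config2: Dict[str, str]) -> float:
    """Sum path distances over the articulators of two phone configurations.

    Each config maps a (hypothetical) articulator name to its SVG path.
    """
    return sum(path_distance(config1[k], config2[k]) for k in config1)
```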
Imports text transcriptions from the TED-LIUM corpus directory structure into this repository.
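TED-LIUM release 2 stores its transcriptions as STM files. A minimal sketch of turning STM segments into one-utterance-per-line transcriptions, assuming the layout "talk channel speaker start end <label> text" and skipping segments marked ignore_time_segment_in_scoring. The utterance ID scheme is hypothetical and is not necessarily the one used in this repository.

```python
from typing import Iterable, List


def stm_to_transcripts(stm_lines: Iterable[str]) -> List[str]:
    """Convert STM segment lines to 'UTTERANCE_ID WORD WORD ...' lines.

    The utterance ID (talk name plus start/end times in centiseconds)
    is a hypothetical scheme.
    """
    out: List[str] = []
    for line in stm_lines:
        if line.startswith(";;"):
            continue  # STM comment line
        parts = line.split(maxsplit=6)
        if len(parts) < 7:
            continue  # segment without text
        talk, _channel, _speaker, start, end, _label, text = parts
        words = text.split()
        if words == ["ignore_time_segment_in_scoring"]:
            continue  # segment marked as unusable
        start_cs = round(float(start) * 100)
        end_cs = round(float(end) * 100)
        out.append(f"{talk}-{start_cs:07d}-{end_cs:07d} {' '.join(words)}")
    return out
```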
Extracts all ngrams from the text transcriptions of the corpora and saves them sorted by frequency.
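A minimal sketch of the counting step with collections.Counter, assuming transcription lines of the form UTTERANCE_ID WORD WORD ... (the LibriSpeech layout). Bigrams are counted within an utterance only, never across utterance boundaries.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Counts = List[Tuple[str, int]]


def count_ngrams(transcripts: Iterable[str]) -> Tuple[Counts, Counts]:
    """Count unigrams and bigrams, returning both sorted by descending frequency."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for line in transcripts:
        words = line.split()[1:]  # drop the assumed leading utterance ID
        unigrams.update(words)
        bigrams.update(f"{a} {b}" for a, b in zip(words, words[1:]))
    return unigrams.most_common(), bigrams.most_common()
```

Writing each list as "COUNT NGRAM" lines would give a frequency-sorted file; the exact output layout of ngrams.py is not assumed here.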
Iterates over sample words and utterances, asks the user to select the audio interval containing the sample word, and saves it to the word audio samples directory.
Selects sample words for diphones using rules described here.
Selects utterances containing sample words and converts their audio to the format suitable for PocketSphinx processing.
Extracts diphone audio samples from word audio samples.
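Cutting a diphone out of a word recording amounts to copying a range of frames. A minimal sketch using Python's standard wave module, assuming uncompressed PCM WAV input and a start time and duration in seconds, such as the timestamp and duration obtained from state_align.c; the function name is hypothetical.

```python
import wave


def extract_segment(src: str, dst: str, start: float, duration: float) -> None:
    """Copy `duration` seconds of audio starting at `start` seconds from src to dst."""
    with wave.open(src, "rb") as win:
        rate = win.getframerate()
        first = int(round(start * rate))
        count = int(round(duration * rate))
        win.setpos(first)  # raises wave.Error if start is past the end
        frames = win.readframes(count)
        params = win.getparams()
    with wave.open(dst, "wb") as wout:
        # The frame count in params is patched to the real value on close.
        wout.setparams(params)
        wout.writeframes(frames)
```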
Converts timestamped labels from output of PocketSphinx to Praat TextGrid label file.
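A minimal sketch of such a conversion, assuming each label line has the form LABEL START END [SCORE] with times in seconds (the layout assumed here for pocketsphinx_continuous output with -time yes). It writes a single interval tier in Praat's long text TextGrid format; lines that do not parse are skipped, and gaps between labels become empty intervals because an interval tier has to cover the whole file.

```python
from typing import List, Tuple

Interval = Tuple[float, float, str]


def parse_labels(lines: List[str]) -> List[Interval]:
    """Parse lines assumed to look like 'LABEL START END [SCORE]' (seconds)."""
    intervals: List[Interval] = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            intervals.append((float(parts[1]), float(parts[2]), parts[0]))
        except ValueError:
            continue  # not a timestamped label line
    return sorted(intervals)


def to_textgrid(intervals: List[Interval], xmax: float, tier: str = "words") -> str:
    """Render one IntervalTier in Praat's long TextGrid text format."""
    filled: List[Interval] = []
    t = 0.0
    for start, end, text in intervals:
        start = max(start, t)  # avoid overlapping the previous interval
        if start > t:
            filled.append((t, start, ""))  # fill the gap with an empty interval
        filled.append((start, end, text))
        t = end
    if t < xmax:
        filled.append((t, xmax, ""))
    out = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier}"',
        "        xmin = 0",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(filled)}",
    ]
    for i, (start, end, text) in enumerate(filled, start=1):
        text = text.replace('"', '""')  # Praat escapes quotes by doubling them
        out += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f'            text = "{text}"',
        ]
    return "\n".join(out) + "\n"
```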
Produces pitch contour images for diphone audio samples.
Produces spectrogram images for diphone audio samples.
Produces a phonetic alignment of the given words for a given audio file using the state search mode of PocketSphinx. Used by the diphone_audio.py script to extract each diphone's timestamp and duration.