Assignment 3: Hidden Markov Models
In this assignment, we'll be revisiting word recognition, this time using hidden Markov models.
As with assignment 1 (part 4), we'll be using the free spoken digit dataset.
We will be using the hmmlearn library (which depends on numpy).
Please use `hmms.py` for the whole assignment. When submitting your work, please do not include the dataset.
As you can learn from the tutorial, hmmlearn provides us with the base implementation of hidden Markov models; we'll be using `hmm.GaussianHMM`, which implements HMMs with a single Gaussian emission probability per state.
To start, build a basic isolated word recognizer that uses a separate model for each digit.
- Compute the MFCC features for the complete dataset (3000 recordings; use `n_mfcc=13`).
- Implement a 6-fold cross-validation (c/v) loop to (later) figure out which test speaker performs best/worst.
- Inside the c/v loop, train an individual HMM with linear topology for each digit (see the sketch after this list).
  - The `fit` method expects features to be sequential in a single array; see `np.concatenate(..., axis=0)`.
  - How many states (`n_components`) do you choose, and why?
  - How can you enforce a linear topology?
- You might find that certain digits perform particularly badly; what could be a reason, and how could you mitigate it?
- Compute a confusion matrix for each speaker and for the overall dataset.
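A minimal sketch of what the per-digit training could look like, assuming the features for the current fold's training speakers are already grouped by digit; the variable names, the six states, and the 0.5 self-loop probability are illustrative assumptions, not part of the assignment:

```python
import numpy as np
from hmmlearn import hmm

def make_linear_hmm(n_states):
    # init_params="mc": fit only (re-)initializes means and covariances;
    # we set the start and transition probabilities ourselves. EM preserves
    # zeros in the transition matrix, so the linear topology is kept.
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=20, init_params="mc")
    transmat = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        transmat[i, i] = 0.5       # stay in the current state...
        transmat[i, i + 1] = 0.5   # ...or advance to the next one
    transmat[-1, -1] = 1.0         # the last state loops until the recording ends
    startprob = np.zeros(n_states)
    startprob[0] = 1.0             # always start in the first state
    model.startprob_ = startprob
    model.transmat_ = transmat
    return model

# features_by_digit (hypothetical name): dict mapping each digit to a list
# of (T_i, 13) MFCC arrays from the training speakers of the current fold.
models = {}
for digit, seqs in features_by_digit.items():
    X = np.concatenate(seqs, axis=0)   # fit wants one stacked array...
    lengths = [len(s) for s in seqs]   # ...plus the per-sequence lengths
    models[digit] = make_linear_hmm(n_states=6).fit(X, lengths)

def classify(seq):
    # Isolated-word recognition: pick the digit whose HMM scores highest.
    return max(models, key=lambda d: models[d].score(seq))
```

Running `classify` over the held-out speaker's recordings gives you the entries of that speaker's confusion matrix (e.g. via `sklearn.metrics.confusion_matrix` if you have scikit-learn at hand).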
The recognizer above can't handle sequences of spoken digits.
In this part of the assignment, you'll build a basic decoder that is able to decode arbitrary sequences of digits (without a prior, though).
The `decode` method in hmmlearn only works for a single HMM.
There are two ways to solve this part of the assignment:
- Construct a "meta" HMM from the previously trained digit HMMs, by allowing state transitions from one digit to another; the resulting HMM can be decoded using the existing
decodemethod (don't forget to re-map the state ids to the originating digit). - (Optional) Implement a real (time-synchronous) decoder using beam search. The straight-forward way is to maintain a (sorted) list of active hypotheses (ie. state history and current log-likelihood) that is first expanded and then pruned in each time step. The tricky part is at the "end" of a model: do you loop or expand new words?
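A possible sketch of the first approach, reusing the `models` dict from the training sketch above; the uniform word-entry probability and the 0.5 exit mass are assumptions you may want to tune:

```python
import numpy as np
from hmmlearn import hmm

digits = sorted(models)
sizes = [models[d].n_components for d in digits]
offsets = np.cumsum([0] + sizes[:-1])
N = sum(sizes)

transmat = np.zeros((N, N))
startprob = np.zeros(N)
state2digit = np.empty(N, dtype=int)

for d, off, n in zip(digits, offsets, sizes):
    transmat[off:off + n, off:off + n] = models[d].transmat_
    startprob[off] = 1.0 / len(digits)   # any word may start the sequence
    state2digit[off:off + n] = d
    # Move mass off the exit state's self-loop and spread it over all entry
    # states, turning the isolated word models into one circular graph.
    last = off + n - 1
    transmat[last, :] = 0.0
    transmat[last, last] = 0.5
    transmat[last, offsets] += 0.5 / len(digits)

meta = hmm.GaussianHMM(n_components=N, covariance_type="diag")
meta.startprob_ = startprob
meta.transmat_ = transmat
meta.means_ = np.concatenate([models[d].means_ for d in digits], axis=0)
# covars_ is exposed as full matrices, but the "diag" setter expects
# only the diagonals, so we extract them.
full = np.concatenate([models[d].covars_ for d in digits], axis=0)
meta.covars_ = full.diagonal(axis1=1, axis2=2)

def states_to_digits(states):
    # A new word begins whenever the path enters an entry state coming from
    # a different state (self-loops on the entry state don't count).
    entries = set(offsets.tolist())
    out, prev = [], -1
    for s in states:
        if s in entries and s != prev:
            out.append(int(state2digit[s]))
        prev = s
    return out

logprob, states = meta.decode(X_test)   # X_test: MFCCs of one digit sequence
print(states_to_digits(states))
```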
Now for this assignment:
- Generate a few test sequences of random length between 3 and 6 digits; use `numpy.random.randint`, and be sure to also retain the digit sequence, since we need to compute the edit distance between reference and hypothesis later (see the sketch at the end of this list).
- Combine the previously trained HMMs into a single "meta" HMM, altering the transition probabilities to make a circular graph that allows each word to follow another.
- Implement a method that converts a state sequence of the meta HMM into a sequence of actual digits (cf. `states_to_digits` in the sketch above).
- Decode your test sequences and compute the word error rate (WER).
- Compute an overall (cross-validated) WER.
- (Optional) Implement a basic time-synchronous beam search; how do the results compare to the above Viterbi decoding in terms of accuracy and time?
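For generating the test sequences and scoring them, a sketch along these lines might help; `test_recordings` (held-out MFCC arrays per digit) is a hypothetical name:

```python
import numpy as np

def make_test_sequence(test_recordings):
    # Random length between 3 and 6 digits (randint's upper bound is exclusive).
    n = np.random.randint(3, 7)
    ref = list(np.random.randint(0, 10, size=n))
    feats = [test_recordings[d][np.random.randint(len(test_recordings[d]))]
             for d in ref]
    # Keep the reference digits around; we need them for the edit distance.
    return ref, np.concatenate(feats, axis=0)

def edit_distance(ref, hyp):
    # Plain Levenshtein distance via dynamic programming.
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)
    d[0, :] = np.arange(len(hyp) + 1)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i, j] = min(d[i - 1, j] + 1,                         # deletion
                          d[i, j - 1] + 1,                         # insertion
                          d[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1]))
    return d[-1, -1]

def wer(ref, hyp):
    # Word error rate: edit operations per reference word.
    return edit_distance(ref, hyp) / len(ref)
```

With the meta HMM from above, `wer(ref, states_to_digits(meta.decode(X)[1]))` scores a single sequence; averaging over all test sequences of all folds gives the cross-validated WER.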