seqlrn/3-hmm

Assignment 3: Hidden Markov Models

In this assignment, we'll be revisiting word recognition, this time using hidden Markov models. As in assignment 1 (part 4), we'll be using the free spoken digit dataset. We will be using the hmmlearn library (which depends on numpy).

Please use hmms.py for the whole assignment. When submitting your work, please do not include the dataset.

Basic Setup

As you can learn from the tutorial, hmmlearn provides a base implementation of hidden Markov models; we'll be using hmm.GaussianHMM, which implements HMMs with a single Gaussian emission distribution per state. To start, build a basic isolated word recognizer that uses a separate model for each digit.

  1. Compute the MFCC features for the complete data set (3000 recordings; use n_mfcc=13).
  2. Implement a 6-fold cross-validation (c/v) loop to (later) figure out which test speaker performs best/worst (the dataset has six speakers, so use one fold per speaker).
  3. Inside the c/v loop, train an individual HMM with linear topology for each digit.
    • The fit expects the features of all training sequences concatenated into a single array (plus a lengths argument); see np.concatenate(..., axis=0)
    • How many states (n_components) do you choose, and why?
    • How can you enforce a linear topology?
    • You might find that certain digits perform particularly badly; what could be the reason, and how could you mitigate it?
  4. Compute a confusion matrix for each speaker and for the overall dataset.

Decoding

The recognizer above can't handle sequences of spoken digits. In this part of the assignment, you'll build a basic decoder that is able to decode arbitrary sequences of digits (without a prior, though). The decode method in hmmlearn only works for a single HMM. There are two ways to solve this assignment:

  1. Construct a "meta" HMM from the previously trained digit HMMs, by allowing state transitions from one digit to another; the resulting HMM can be decoded using the existing decode method (don't forget to re-map the state ids to the originating digit).
  2. (Optional) Implement a real (time-synchronous) decoder using beam search. The straightforward way is to maintain a (sorted) list of active hypotheses (i.e., state history and current log-likelihood) that is first expanded and then pruned in each time step. The tricky part is at the "end" of a model: do you loop or expand new words?

Now, for this part of the assignment:

  1. Generate a few test sequences of random length between 3 and 6 digits; use numpy.random.randint (mind the exclusive upper bound) and be sure to also retain the digit sequences, since we need to compute the edit distance between reference and hypothesis later.
  2. Combine the previously trained HMMs into a single "meta" HMM, altering the transition probabilities to make a circular graph that allows each word to follow any other.
  3. Implement a method that converts a state sequence of the meta HMM into a sequence of actual digits.
  4. Decode your test sequences and compute the word error rate (WER).
  5. Compute an overall (cross-validated) WER.
  6. (Optional) Implement a basic time-synchronous beam search; how do the results compare to the Viterbi decoding above in terms of accuracy and time?
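The WER is the edit (Levenshtein) distance between reference and hypothesis, divided by the reference length. A minimal dynamic-programming sketch, with assumed helper names:

```python
def edit_distance(ref, hyp):
    # Levenshtein distance between two digit sequences via the classic
    # DP table: d[i][j] = distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference symbols
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis symbols
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub,               # substitution (or match)
                          d[i - 1][j] + 1,   # deletion
                          d[i][j - 1] + 1)   # insertion
    return d[len(ref)][len(hyp)]

def wer(ref, hyp):
    # word error rate: edit distance normalized by the reference length
    return edit_distance(ref, hyp) / len(ref)
```

For the cross-validated WER, sum the edit distances and reference lengths over all test sequences of all folds before dividing, rather than averaging per-sequence WERs.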
