
How do I Read Things in BrainScript

Chris Basoglu edited this page Apr 12, 2017 · 5 revisions

Specify multiple label streams with the HTKMLFReader

The HTKMLFReader, which reads Master Label Files (MLF) in the format of the Hidden Markov Model Toolkit (HTK), can be configured to read multiple label streams. The example below, taken from TIMIT_TrainMultiTask_ndl_deprecated.cntk in the Examples directory, defines two label streams, labels and regions:

reader = {
    readerType = "HTKMLFReader"
    ...
    labels = {
        mlfFile = "$MlfDir$/TIMIT.train.align_cistate.mlf.cntk"
        labelMappingFile = "$MlfDir$/TIMIT.statelist"
        labelDim = 183
        labelType = "category"
    }
    regions = {
        mlfFile = "$MlfDir$/TIMIT.train.align_dr.mlf.cntk"
        labelDim = 8
        labelType = "category"
    }
}

Use built-in readers with multiple inputs

See the description in Understanding and Extending Readers, in particular the section on how to "compose several data deserializers".

Put labels and features in separate files with CNTKTextFormatReader

Use the composite reader to specify the two files, one for labels and one for features. Make sure the sequence IDs in the labels file match those in the features file.

reader = [
  …
  deserializers = (
  [
      type = "CNTKTextFormatDeserializer" ; module = "CNTKTextFormatReader"
      file = "$RootDir$/features.txt"
      input = [ features = [...]]
  ]:[
      type = "CNTKTextFormatDeserializer" ; module = "CNTKTextFormatReader"
      file = "$RootDir$/labels.txt"
      input = [ labels = [...]]
  ]
  )
]
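Because the two files are only tied together by their sequence IDs, it can help to check the IDs before training. The following is a minimal Python sketch, not part of CNTK: the helper names sequence_ids and mismatched_ids are hypothetical, and it assumes every non-empty line starts with an explicit numeric sequence ID followed by the "|"-prefixed streams, as in "0 |features 1 2 3".

```python
def sequence_ids(lines):
    """Collect the set of sequence IDs found in CTF-style lines.

    Assumption: each non-empty line begins with a numeric sequence ID,
    followed by one or more '|name values' streams.
    """
    ids = set()
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Everything before the first '|' should be the sequence ID.
        head = line.split("|", 1)[0].strip()
        if not head.isdigit():
            raise ValueError(f"line {line_no}: no explicit sequence ID")
        ids.add(int(head))
    return ids


def mismatched_ids(feature_lines, label_lines):
    """Return (only_in_features, only_in_labels) as sorted lists."""
    feature_ids = sequence_ids(feature_lines)
    label_ids = sequence_ids(label_lines)
    return sorted(feature_ids - label_ids), sorted(label_ids - feature_ids)
```

Both arguments can be any iterable of lines, so open file objects for the features and labels files can be passed directly. Two empty lists mean that every sequence ID appears in both files.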