How do I Read Things in BrainScript
Chris Basoglu edited this page Apr 12, 2017
- Specify multiple label streams with the HTKMLFReader?
- Use the built-in readers to train a network model using multiple input files?
- Put labels and features in separate files with CNTKTextFormatReader?
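The HTKMLFReader, which reads Master Label Files (MLF) of the Hidden Markov Model Toolkit (HTK), can be configured to read multiple label streams. Each named block in the reader section (labels and regions in the example) defines one label stream with its own MLF file and label dimension. The example below is taken from TIMIT_TrainMultiTask_ndl_deprecated.cntk in the Examples directory: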
reader = {
    readerType = "HTKMLFReader"
    ...
    labels = {
        mlfFile = "$MlfDir$/TIMIT.train.align_cistate.mlf.cntk"
        labelMappingFile = "$MlfDir$/TIMIT.statelist"
        labelDim = 183
        labelType = "category"
    }
    regions = {
        mlfFile = "$MlfDir$/TIMIT.train.align_dr.mlf.cntk"
        labelDim = 8
        labelType = "category"
    }
}
To train a network model from multiple input files with the built-in readers, see the description at Understanding and Extending Readers and look for the section describing how to "compose several data deserializers".
To put labels and features in separate files with CNTKTextFormatReader, use the composite reader to specify the two files, one for labels and one for features. Make sure the sequence IDs in the labels file match those in the features file.
reader = [
    …
    deserializers = (
        [
            type = "CNTKTextFormatDeserializer" ; module = "CNTKTextFormatReader"
            file = "$RootDir$/features.txt"
            input = [ features = [...]]
        ]:[
            type = "CNTKTextFormatDeserializer" ; module = "CNTKTextFormatReader"
            file = "$RootDir$/labels.txt"
            input = [ labels = [...]]
        ]
    )
]
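Because the two files are paired by sequence ID, it can help to check them before training. The following Python sketch is not part of CNTK. It assumes every line in both files begins with an explicit sequence ID followed by one or more |name fields, and it only compares the sets of IDs, not the reader's full matching rules. The function names sequence_ids and mismatched_ids are made up for this illustration.

```python
from typing import Iterable, List


def sequence_ids(lines: Iterable[str]) -> List[str]:
    """Collect sequence IDs, collapsing consecutive lines that share an ID.

    Assumption: each non-empty line looks like "<id> |name v1 v2 ...".
    """
    ids: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Everything before the first '|' is treated as the sequence ID.
        seq_id = line.split("|", 1)[0].strip()
        if not seq_id:
            raise ValueError(f"line has no sequence ID: {line!r}")
        if not ids or ids[-1] != seq_id:
            ids.append(seq_id)
    return ids


def mismatched_ids(feature_lines: Iterable[str],
                   label_lines: Iterable[str]) -> List[str]:
    """Return the IDs that appear in only one of the two inputs, sorted as strings."""
    feature_ids = set(sequence_ids(feature_lines))
    label_ids = set(sequence_ids(label_lines))
    return sorted(feature_ids ^ label_ids)
```

To check real files, pass two open file handles, for example mismatched_ids(open("features.txt"), open("labels.txt")). An empty list means every sequence ID found in one file also appears in the other.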