First I'd like to say a big thanks for sharing this. I came to this repo from the O'Reilly website, and both the article and the video are great. I've been reading the code and there are a couple of things I'd like to discuss:
- Oriole LSTM.ipynb, In [19]:
The code seems to use a rather roundabout method to get the last time-step output from the LSTM and generate the logits. It could be simplified to:
last = value[:, -1]
prediction = tf.layers.dense(last, numClasses)
Also, in the placeholder definitions, the first dimension can simply be written as None to accept any batch size.
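To illustrate why the slice is enough, here is a minimal NumPy sketch (not the notebook's code). It assumes the LSTM output `value` has shape [batch, time, units] and that the longer route amounts to making the tensor time-major and picking the last index; the sizes and the names `last_long` and `last_short` are made up for the example.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch_size, max_seq_length, lstm_units = 4, 10, 8

# Stand-in for the LSTM output `value`, shaped [batch, time, units].
rng = np.random.default_rng(0)
value = rng.standard_normal((batch_size, max_seq_length, lstm_units))

# Long route (assumed to mirror the notebook): make the array time-major,
# then pick the last time index.
time_major = np.transpose(value, (1, 0, 2))
last_long = time_major[time_major.shape[0] - 1]

# Short route: slice the last time step directly.
last_short = value[:, -1]

# Both routes select the same [batch, units] array.
same = np.array_equal(last_long, last_short)
print(last_short.shape, same)
```

Note that the slice never refers to the batch size, which is also why the placeholder's first dimension can be left as None.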
- In [19]
The definition of the dropout layer seems strange. As written, dropout is also applied during testing/prediction, so the inputs will be thrown away there too, whereas by definition dropout should only happen in the training phase.
Normally the dropout probability is defined as another input tensor, so that it can be changed through feed_dict for testing/prediction.
Was this done on purpose?
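To make the suggestion concrete, here is a small NumPy sketch of the idea (a stand-in, not TensorFlow code and not the notebook's implementation). The keep probability is an argument rather than a hard-coded constant, so the caller can pass something like 0.75 while training and 1.0 at test time. The function name `dropout` and the value 0.75 are only assumptions for the example.

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Inverted dropout: keep each element with probability keep_prob
    and rescale the survivors by 1 / keep_prob."""
    if keep_prob >= 1.0:
        # Nothing is dropped, so the input passes through unchanged.
        return x
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
activations = np.ones((2, 5))

# Training: some activations are zeroed, the rest are scaled up.
train_out = dropout(activations, keep_prob=0.75, rng=rng)

# Testing/prediction: keep_prob = 1.0 disables dropout entirely.
test_out = dropout(activations, keep_prob=1.0, rng=rng)
```

In the TF graph the same role would be played by a keep-probability placeholder that is fed a different value at training and at test time.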