This repository has been archived by the owner on Dec 29, 2022. It is now read-only.

Replicate GNMT architecture #26

Open
7 tasks
dennybritz opened this issue Mar 12, 2017 · 1 comment
@dennybritz
Contributor

dennybritz commented Mar 12, 2017

To replicate the GNMT architecture, the following needs to happen. This list is not exhaustive and other things may be required:

  • Implement a decoder that applies attention at the first layer of the cell stack. This should be straightforward and only require a few lines of code change.
  • Add an encoder that is bidirectional only in the first layer. This should also be straightforward and require < 100 lines of code for a new encoder.
  • Add "Residual connections start from the layer third from the bottom in the encoder and decoder." This may require a new cell type.
  • Optimizer switching: Add parameters to switch optimizers during training, e.g. start with Adam and switch to SGD with learning rate decay later on. Probably ~50 lines of code to add new hyperparameters for optimizer switching.
  • Use google/sentencepiece to pre-process data
  • Add pruning heuristics to beam search. This may or may not need significant changes.
  • There are a few gradient clipping tricks that are not mentioned in the paper; the details for those still need to be figured out.
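For the sentencepiece bullet, preprocessing would presumably look something like the tool's standard CLI usage (the vocab size and model type here are assumptions, not settings anyone has agreed on):

```shell
# train a subword model, then encode the corpus into pieces
spm_train --input=train.txt --model_prefix=gnmt --vocab_size=32000 --model_type=bpe
spm_encode --model=gnmt.model --output_format=piece < train.txt > train.sp.txt
```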
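To make the first bullet concrete, here is a rough NumPy sketch of the GNMT attention scheme: attention is computed from the bottom decoder layer's output, and the resulting context vector is fed to every layer above it. The `layers` callables are stand-ins for LSTM cells, and all names are illustrative, not code from this repo:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gnmt_decoder_step(x, encoder_states, layers):
    """One decoder step. `layers` is a list of per-layer transforms
    (stand-ins for LSTM cells); attention uses only the first layer."""
    h = layers[0](x)                            # bottom layer
    scores = encoder_states @ h                 # dot-product attention scores
    context = softmax(scores) @ encoder_states  # weighted sum of encoder states
    for layer in layers[1:]:                    # upper layers all see the context
        h = layer(np.concatenate([h, context]))
    return h, context
```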
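For the second bullet, the shape of the encoder would be roughly this (again a NumPy sketch with simple callables standing in for RNN cells, not the repo's API): only the first layer runs in both directions, its outputs are concatenated, and the remaining layers are unidirectional.

```python
import numpy as np

def encode(inputs, fwd, bwd, upper_layers):
    """`inputs`: (T, d) sequence. `fwd`/`bwd`/`upper_layers` are per-step
    transforms standing in for RNN cells."""
    forward = np.stack([fwd(x) for x in inputs])
    backward = np.stack([bwd(x) for x in inputs[::-1]])[::-1]
    h = np.concatenate([forward, backward], axis=-1)  # (T, 2d)
    for layer in upper_layers:                        # unidirectional stack
        h = np.stack([layer(x) for x in h])
    return h
```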
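The residual-connection bullet might not even need a new cell type; a wrapper over the layer stack like the following sketch (illustrative, not repo code) expresses "residuals start at the third layer from the bottom":

```python
def stack_with_residuals(x, layers, residual_start=2):
    """Apply `layers` (same-shape transforms) in sequence; layers at
    index >= `residual_start` (0-based, so 2 == third layer) add their
    input to their output."""
    for i, layer in enumerate(layers):
        out = layer(x)
        x = out + x if i >= residual_start else out
    return x
```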
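Optimizer switching could be expressed as a schedule that maps a global step to an optimizer name and learning rate, roughly like this (the thresholds and rates below are placeholders, not proposed hyperparameter values):

```python
def optimizer_for_step(step, switch_step=100_000, adam_lr=1e-3,
                       sgd_lr=0.5, decay=0.5, decay_every=50_000):
    """Return (optimizer_name, learning_rate) for a training step:
    Adam until `switch_step`, then SGD with stepwise decay."""
    if step < switch_step:
        return "adam", adam_lr
    n_decays = (step - switch_step) // decay_every
    return "sgd", sgd_lr * (decay ** n_decays)
```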
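One common pruning heuristic for the beam-search bullet is score-margin pruning: drop hypotheses that fall too far below the current best. This is an illustration of the general idea, not necessarily the exact rule from the paper:

```python
def prune_beam(hypotheses, margin=3.0, beam_size=4):
    """`hypotheses`: list of (score, tokens), higher score is better.
    Keep at most `beam_size` hypotheses within `margin` of the best."""
    hypotheses = sorted(hypotheses, key=lambda h: h[0], reverse=True)
    best = hypotheses[0][0]
    kept = [h for h in hypotheses if best - h[0] <= margin]
    return kept[:beam_size]
```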
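On the gradient clipping bullet: whatever extra tricks are involved, the baseline is clipping by global norm. For reference, this is the standard behaviour (equivalent in spirit to TensorFlow's `clip_by_global_norm`) restated in NumPy:

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm=5.0):
    """Scale all gradients so their joint L2 norm is at most `clip_norm`."""
    global_norm = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    if global_norm <= clip_norm:
        return grads, global_norm
    scale = clip_norm / global_norm
    return [g * scale for g in grads], global_norm
```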

This issue is only here to keep track of the high-level tasks. All of the points above should probably be done in separate issues.

@ghost

ghost commented Jun 7, 2017

I noticed that it's been about three months. Has anybody actually implemented GNMT on seq2seq? It doesn't seem so, but I thought it was worth asking.
