To replicate the GNMT architecture, the following needs to happen. This list is not exhaustive and other things may be required:
- Implement a decoder that applies attention to the first layer of the stacked cell only. This should be straightforward and require only a few lines of code change (see the first sketch after this list).
- Add an encoder that is bidirectional in the first layer only. This should also be straightforward and require < 100 lines of code for a new encoder (see the second sketch below).
Add "Residual connections start from the layer third from the bottom in the encoder and decoder." This may require a new cell type.
- Optimizer switching: add parameters to switch optimizers during training, e.g. start with Adam and switch to SGD with learning rate decay later on. Probably ~50 lines of code to add new hyperparameters for optimizer switching (see the fourth sketch below).
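A minimal sketch of the first point, assuming TensorFlow 1.x contrib APIs (`tf.contrib.seq2seq.AttentionWrapper`, `tf.contrib.rnn.MultiRNNCell`) rather than this repo's internal decoder classes; the function name and arguments are hypothetical:

```python
import tensorflow as tf

def gnmt_style_decoder_cell(encoder_outputs, source_lengths,
                            num_units, num_layers):
    """Stacked decoder cell where only the bottom layer computes attention."""
    attention = tf.contrib.seq2seq.LuongAttention(
        num_units, memory=encoder_outputs,
        memory_sequence_length=source_lengths)
    cells = [tf.contrib.rnn.BasicLSTMCell(num_units)
             for _ in range(num_layers)]
    # Wrap only the first (bottom) cell with attention; upper layers stay plain.
    cells[0] = tf.contrib.seq2seq.AttentionWrapper(
        cells[0], attention, attention_layer_size=num_units)
    return tf.contrib.rnn.MultiRNNCell(cells)
```

Note that full GNMT also feeds the attention context into the inputs of the upper layers, which would need a custom multi-cell; the sketch above only restricts where attention is computed.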
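For the bidirectional-first-layer encoder, a sketch along these lines might work (again TF 1.x APIs; names are illustrative, and it assumes num_layers >= 2):

```python
import tensorflow as tf

def gnmt_style_encoder(inputs, sequence_length, num_units, num_layers):
    """Encoder that is bidirectional in the first layer only."""
    fw = tf.contrib.rnn.BasicLSTMCell(num_units)
    bw = tf.contrib.rnn.BasicLSTMCell(num_units)
    (out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
        fw, bw, inputs, sequence_length=sequence_length,
        dtype=tf.float32, scope="bidi_layer")
    first_layer_out = tf.concat([out_fw, out_bw], axis=2)
    # Remaining layers are unidirectional.
    uni_cells = tf.contrib.rnn.MultiRNNCell(
        [tf.contrib.rnn.BasicLSTMCell(num_units)
         for _ in range(num_layers - 1)])
    outputs, state = tf.nn.dynamic_rnn(
        uni_cells, first_layer_out, sequence_length=sequence_length,
        dtype=tf.float32, scope="uni_layers")
    return outputs, state
```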
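For the residual connections, something like the following could work, assuming `tf.contrib.rnn.ResidualWrapper` is available in the TensorFlow version in use (otherwise a small custom cell that adds its input to its output would be the "new cell type" mentioned above):

```python
import tensorflow as tf

def stacked_cell_with_residuals(num_units, num_layers):
    """Stacked cell with residual connections starting at the third layer."""
    cells = []
    for layer in range(num_layers):
        cell = tf.contrib.rnn.BasicLSTMCell(num_units)
        if layer >= 2:  # bottom two layers stay plain, per the GNMT paper
            cell = tf.contrib.rnn.ResidualWrapper(cell)
        cells.append(cell)
    return tf.contrib.rnn.MultiRNNCell(cells)
```

ResidualWrapper adds the cell input to the cell output, so all layers here must share the same `num_units`.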
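For optimizer switching, one possible approach is to build both train ops up front and choose between them on the Python side of the training loop; `loss`, `sess`, and `switch_step` below are placeholders:

```python
import tensorflow as tf

global_step = tf.Variable(0, trainable=False, name="global_step")

# Phase 1: Adam with a fixed learning rate.
adam_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(
    loss, global_step=global_step)

# Phase 2: plain SGD with exponential learning rate decay.
decayed_lr = tf.train.exponential_decay(
    learning_rate=1.0, global_step=global_step,
    decay_steps=10000, decay_rate=0.5)
sgd_op = tf.train.GradientDescentOptimizer(decayed_lr).minimize(
    loss, global_step=global_step)

# In the training loop: switch from Adam to SGD after `switch_step` steps.
step = sess.run(global_step)
train_op = adam_op if step < switch_step else sgd_op
sess.run(train_op)
```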
I noticed that this issue is about three months old. Is there anybody who has actually implemented GNMT on seq2seq? It doesn't seem so, but I thought it was worth asking.
This issue is only here to keep track of the high-level tasks. All of the points above should probably be done in separate issues.