This repository has been archived by the owner on Dec 29, 2022. It is now read-only.

Replicate GNMT architecture #26

Open
7 tasks
dennybritz opened this issue Mar 12, 2017 · 1 comment
@dennybritz
Contributor

dennybritz commented Mar 12, 2017

To replicate the GNMT architecture, the following needs to happen. This list is not exhaustive and other things may be required:

  • Implement a decoder that applies attention at the first layer of the cell stack. This should be straightforward and only require a few lines of code change.
  • Add an encoder that is bidirectional only in the first layer. This should also be straightforward and require < 100 lines of code for a new encoder.
  • Add "Residual connections start from the layer third from the bottom in the encoder and decoder." This may require a new cell type.
  • Optimizer switching: Add parameters to switch optimizers during training, e.g. start with Adam and switch to SGD with learning rate decay later on. Probably ~50 lines of code to add new hyperparameters for optimizer switching.
  • Use google/sentencepiece to pre-process data
  • Add pruning heuristics to beam search. This may or may not need significant changes.
  • There are a few gradient clipping tricks that are not mentioned in the paper; the details for those still need to be figured out.
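For the sentencepiece bullet, preprocessing would presumably look something like the tool's standard CLI usage (the vocab size and model type here are assumptions, not settings anyone has agreed on):

```shell
# train a subword model, then encode the corpus into pieces
spm_train --input=train.txt --model_prefix=gnmt --vocab_size=32000 --model_type=bpe
spm_encode --model=gnmt.model --output_format=piece < train.txt > train.sp.txt
```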
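To make the first bullet concrete, here is a rough NumPy sketch of the GNMT attention scheme: attention is computed from the bottom decoder layer's output, and the resulting context vector is fed to every layer above it. The `layers` callables are stand-ins for LSTM cells, and all names are illustrative, not code from this repo:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gnmt_decoder_step(x, encoder_states, layers):
    """One decoder step. `layers` is a list of per-layer transforms
    (stand-ins for LSTM cells); attention uses only the first layer."""
    h = layers[0](x)                            # bottom layer
    scores = encoder_states @ h                 # dot-product attention scores
    context = softmax(scores) @ encoder_states  # weighted sum of encoder states
    for layer in layers[1:]:                    # upper layers all see the context
        h = layer(np.concatenate([h, context]))
    return h, context
```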
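For the second bullet, the shape of the encoder would be roughly this (again a NumPy sketch with simple callables standing in for RNN cells, not the repo's API): only the first layer runs in both directions, its outputs are concatenated, and the remaining layers are unidirectional.

```python
import numpy as np

def encode(inputs, fwd, bwd, upper_layers):
    """`inputs`: (T, d) sequence. `fwd`/`bwd`/`upper_layers` are per-step
    transforms standing in for RNN cells."""
    forward = np.stack([fwd(x) for x in inputs])
    backward = np.stack([bwd(x) for x in inputs[::-1]])[::-1]
    h = np.concatenate([forward, backward], axis=-1)  # (T, 2d)
    for layer in upper_layers:                        # unidirectional stack
        h = np.stack([layer(x) for x in h])
    return h
```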
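The residual-connection bullet might not even need a new cell type; a wrapper over the layer stack like the following sketch (illustrative, not repo code) expresses "residuals start at the third layer from the bottom":

```python
def stack_with_residuals(x, layers, residual_start=2):
    """Apply `layers` (same-shape transforms) in sequence; layers at
    index >= `residual_start` (0-based, so 2 == third layer) add their
    input to their output."""
    for i, layer in enumerate(layers):
        out = layer(x)
        x = out + x if i >= residual_start else out
    return x
```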
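Optimizer switching could be expressed as a schedule that maps a global step to an optimizer name and learning rate, roughly like this (the thresholds and rates below are placeholders, not proposed hyperparameter values):

```python
def optimizer_for_step(step, switch_step=100_000, adam_lr=1e-3,
                       sgd_lr=0.5, decay=0.5, decay_every=50_000):
    """Return (optimizer_name, learning_rate) for a training step:
    Adam until `switch_step`, then SGD with stepwise decay."""
    if step < switch_step:
        return "adam", adam_lr
    n_decays = (step - switch_step) // decay_every
    return "sgd", sgd_lr * (decay ** n_decays)
```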
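One common pruning heuristic for the beam-search bullet is score-margin pruning: drop hypotheses that fall too far below the current best. This is an illustration of the general idea, not necessarily the exact rule from the paper:

```python
def prune_beam(hypotheses, margin=3.0, beam_size=4):
    """`hypotheses`: list of (score, tokens), higher score is better.
    Keep at most `beam_size` hypotheses within `margin` of the best."""
    hypotheses = sorted(hypotheses, key=lambda h: h[0], reverse=True)
    best = hypotheses[0][0]
    kept = [h for h in hypotheses if best - h[0] <= margin]
    return kept[:beam_size]
```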
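On the gradient clipping bullet: whatever extra tricks are involved, the baseline is clipping by global norm. For reference, this is the standard behaviour (equivalent in spirit to TensorFlow's `clip_by_global_norm`) restated in NumPy:

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm=5.0):
    """Scale all gradients so their joint L2 norm is at most `clip_norm`."""
    global_norm = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    if global_norm <= clip_norm:
        return grads, global_norm
    scale = clip_norm / global_norm
    return [g * scale for g in grads], global_norm
```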

This issue is only here to keep track of the high-level tasks. All of the points above should probably be done in separate issues.

@ghost

ghost commented Jun 7, 2017

I noticed that it's been about three months. Has anybody actually implemented GNMT on seq2seq? It doesn't seem so, but I thought it was worth asking.
