This repo hosts a course project for DDA4220, taught by Prof. Zhang Ruimao during the AY22-23 Spring term @CUHKSZ.
We implemented the Bahdanau attention mechanism from scratch and a vanilla Transformer built on top of `nn.Transformer`. Both models are implemented in PyTorch.
For the Bahdanau attention model:
- mainly follows https://github.com/mhauskn/pytorch_attention/tree/main (a minimal sketch of the attention computation is given below)
- supports bidirectional encoding and multiple layers of encoders and decoders
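For readers unfamiliar with additive attention, here is a minimal sketch of the score computation the Bahdanau model is built around. The class and argument names are illustrative and do not mirror the actual modules in this repo.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style additive attention: score(s, h) = v^T tanh(W_s s + W_h h)."""

    def __init__(self, dec_dim, enc_dim, attn_dim):
        super().__init__()
        self.W_s = nn.Linear(dec_dim, attn_dim, bias=False)  # projects the decoder state
        self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)  # projects the encoder outputs
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_outputs, src_mask=None):
        # dec_state: (batch, dec_dim); enc_outputs: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(
            self.W_s(dec_state).unsqueeze(1) + self.W_h(enc_outputs)
        )).squeeze(-1)                                        # (batch, src_len)
        if src_mask is not None:                              # mask out padded source positions
            scores = scores.masked_fill(~src_mask, float("-inf"))
        weights = torch.softmax(scores, dim=-1)               # attention distribution over source
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)
        return context, weights
```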
For the vanilla Transformer:
- uses `nn.Transformer` as the code base of our model
- supports fast inference by overriding the `forward` methods of the decoder layers (see the caching sketch below). Although `model.generate(..., use_cache=True)` in Hugging Face's inference pipeline does the same thing, we found that they have not implemented the vanilla Transformer (please raise an issue if I am wrong!). Most importantly, we implemented it for practice.
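The idea behind the fast-inference override is standard incremental decoding: instead of re-running the whole target prefix through every decoder layer at each step, each layer keeps a cache of its previous inputs and only processes the newest position. The sketch below illustrates the pattern on top of `nn.TransformerDecoderLayer`; the method name `forward_step`, the choice to cache raw layer inputs, and the assumption of the default post-norm layout are all illustrative choices, not the exact code in this repo.

```python
import torch
import torch.nn as nn

class CachedDecoderLayer(nn.TransformerDecoderLayer):
    """Illustrative incremental-decoding layer: processes one new target
    position per call, attending over a cache of earlier layer inputs.
    Assumes the default post-norm layout (norm_first=False)."""

    def forward_step(self, tgt_step, memory, cache=None):
        # tgt_step: (1, batch, d_model)  -- the newest target position only
        # cache:    (t, batch, d_model)  -- layer inputs from the previous steps
        full = tgt_step if cache is None else torch.cat([cache, tgt_step], dim=0)
        # Self-attention: the query is the new step; keys/values are re-projected
        # from the cached prefix, so causal masking is implicit.
        sa = self.self_attn(tgt_step, full, full, need_weights=False)[0]
        x = self.norm1(tgt_step + self.dropout1(sa))
        # Cross-attention over the fixed encoder memory.
        ca = self.multihead_attn(x, memory, memory, need_weights=False)[0]
        x = self.norm2(x + self.dropout2(ca))
        # Feed-forward block, applied only to the newest position.
        ff = self.linear2(self.dropout(self.activation(self.linear1(x))))
        x = self.norm3(x + self.dropout3(ff))
        return x, full   # `full` becomes this layer's cache for the next step
```

Caching the raw layer inputs means the key/value projections are recomputed over the prefix each step; it is simpler than caching the projected K/V tensors while still avoiding a full quadratic re-decode of the prefix.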
For both models:
- support training on multiple GPUs via `pytorch_lightning`
- support fp16 training via `pytorch_lightning` (both are enabled through the `Trainer` flags sketched below)
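As a rough illustration of how both options are switched on, a `pytorch_lightning` `Trainer` configured along the following lines is all that is needed. The model/datamodule names and the exact flag values are assumptions, and the spelling of the precision argument varies across `pytorch_lightning` versions.

```python
import pytorch_lightning as pl

# Hypothetical model / datamodule; shown only to illustrate how multi-GPU
# and fp16 training are enabled through Trainer flags.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,          # number of GPUs to train on
    strategy="ddp",     # distributed data parallel across the GPUs
    precision=16,       # fp16 mixed-precision training ("16-mixed" in newer versions)
    max_epochs=20,
)
# trainer.fit(model, datamodule=dm)
```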
All code is integrated/written by the owner of this repo. Reach me via email should you have any suggestions or questions.
Please also check out LLMZoo and play around with Phoenix and Chimera!
The code for evaluation is taken from here.
We completed a report comparing the abilities of three (types of) attention-based models:
- Bahdanau RNN
- vanilla Transformer
- LLMs (Phoenix and Chimera)
Here are our major findings:
- Bahdanau RNN
  - Bahdanau RNN tends to be lazy during training (sec 5.2)
  - Under our settings, the best teacher forcing ratio during training is 0.5 (sec 5.3; see the teacher-forcing sketch after this list)
  - A network that is wide yet too shallow suffers a slight drop in performance (sec 5.4)
- Vanilla Transformer (sec 6.3)
  - Deeper ≠ better (Table 3)
  - The model benefits from a slightly unbalanced structure: given a total number of layers $N = E + D$, where $E$ is the number of encoder layers and $D$ the number of decoder layers, $D > E$ often yields better performance, indicating that the decoder is the more powerful component of the model (Table 4)
- LLMs (Phoenix and Chimera) (sec 7.2)
  - Phoenix outperforms Chimera. This is no surprise: the target language is Chinese, a non-Latin language, and Chimera is trained solely on Latin languages while Phoenix is trained on both Latin and non-Latin languages.
  - Instruction tuning is crucial: all instruction-tuned models outperform their non-instruction-tuned counterparts.
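For context on the teacher-forcing finding above, the ratio simply controls how often the decoder is fed the gold token rather than its own previous prediction during training. Below is a minimal sketch of the pattern; the `decoder.init_state`/`decoder.step` interface is hypothetical and does not reflect the repo's actual API.

```python
import random
import torch

def decode_with_teacher_forcing(decoder, enc_outputs, targets, bos_id, ratio=0.5):
    """Run the decoder over a gold target sequence, feeding the gold token with
    probability `ratio` and the model's own prediction otherwise."""
    batch, tgt_len = targets.shape
    inp = torch.full((batch,), bos_id, dtype=torch.long, device=targets.device)
    state = decoder.init_state(enc_outputs)      # hypothetical helper
    outputs = []
    for t in range(tgt_len):
        logits, state = decoder.step(inp, state, enc_outputs)  # one decoding step
        outputs.append(logits)
        use_teacher = random.random() < ratio
        # Feed the gold token with probability `ratio`, else the model's own prediction.
        inp = targets[:, t] if use_teacher else logits.argmax(dim=-1)
    return torch.stack(outputs, dim=1)           # (batch, tgt_len, vocab)
```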
- Install packages

      pip install -r requirements.txt
- Prepare training and validation data

      python utils/combine_parts.py   # combine the training file parts
      python utils/process.py         # does nothing but rename the keys in the json files
      python utils/build_vocab.py     # builds vocab files that can be reused
      python utils/make_ans_txt.py    # extracts the target side of the validation set and stores it in the format required by evaluation
  For Chinese (the target language), we split each character as a token. For English (the source language), we use the `bert-base-uncased` tokenizer, but only words that appear at least twice are added to the vocab file (a sketch of this vocab policy is given at the end of this section).
- Experiment: Bahdanau RNN

      bash train_bahdanau.sh      # train
      bash generate_bahdanau.sh   # inference
- Experiment: vanilla Transformer

      bash train_transformer.sh      # train
      bash generate_transformer.sh   # inference
- Experiment: Chimera and Phoenix

      python generate_llm.py --config_pth utils/llm_config.yaml --eval_models chimera-chat-7b chimera-inst-chat-7b chimera-chat-13b phoenix-chat-7b phoenix-inst-chat-7b chimera-inst-chat-13b
- Evaluation

      bash evaluate.sh
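Finally, here is a rough sketch of the vocab policy mentioned in the data-preparation step: character-level Chinese, `bert-base-uncased` tokenization for English with a minimum frequency of 2. It is illustrative only and does not reproduce `utils/build_vocab.py`; it assumes the Hugging Face `transformers` tokenizer is available.

```python
from collections import Counter
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def build_vocab(en_sentences, zh_sentences, min_freq=2):
    # English (source): bert-base-uncased tokens kept only if seen >= min_freq times.
    en_counts = Counter(tok for s in en_sentences for tok in tokenizer.tokenize(s))
    en_vocab = sorted(t for t, c in en_counts.items() if c >= min_freq)
    # Chinese (target): every character is its own token.
    zh_vocab = sorted({ch for s in zh_sentences for ch in s})
    return en_vocab, zh_vocab
```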