# NQG

This repository contains code for the paper "[Neural Question Generation from Text: A Preliminary Study](https://arxiv.org/abs/1704.01792)".

## About this code

The experiments in the paper were done with an in-house deep learning tool, so we provide this PyTorch re-implementation as a reference.

This code implements only the `NQG+` setting from the paper.
After about one hour of training on a Tesla P100, the `NQG+` model reaches the 12.35 BLEU-4 score on the dev set reported in the paper.

If you find this code useful in your research, please consider citing:

    @article{zhou2017neural,
      title={Neural Question Generation from Text: A Preliminary Study},
      author={Zhou, Qingyu and Yang, Nan and Wei, Furu and Tan, Chuanqi and Bao, Hangbo and Zhou, Ming},
      journal={arXiv preprint arXiv:1704.01792},
      year={2017}
    }

## How to run

### Prepare the dataset and code

Make an experiment home folder for the NQG data and code:
```bash
NQG_HOME=~/workspace/nqg
mkdir -p $NQG_HOME/code
mkdir -p $NQG_HOME/data
cd $NQG_HOME/code
git clone https://github.com/magic282/NQG.git
cd $NQG_HOME/data
wget https://res.qyzhou.me/redistribute.zip
unzip redistribute.zip
```
Unzipping creates the folder `$NQG_HOME/data/redistribute`; the files should be organized as:
```
nqg
├── code
│   └── NQG
│       └── seq2seq_pt
└── data
    └── redistribute
        ├── QG
        │   ├── dev
        │   ├── test
        │   ├── test_sample
        │   └── train
        └── raw
```
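Before training, it is easy to verify that the unpacked data matches this layout. The helper below is a hypothetical convenience, not part of the repository:

```python
from pathlib import Path

def check_layout(root):
    """Return the expected subfolders that are missing under root/data/redistribute.

    `root` is the NQG experiment home folder (e.g. ~/workspace/nqg).
    An empty result means the layout matches the tree shown above.
    """
    base = Path(root) / "data" / "redistribute"
    expected = ["QG/dev", "QG/test", "QG/test_sample", "QG/train", "raw"]
    return [d for d in expected if not (base / d).is_dir()]
```

For example, `check_layout(Path.home() / "workspace/nqg")` should return an empty list after `unzip` completes.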
Then collect vocabularies:
```bash
python $NQG_HOME/code/NQG/seq2seq_pt/CollectVocab.py \
       $NQG_HOME/data/redistribute/QG/train/train.txt.source.txt \
       $NQG_HOME/data/redistribute/QG/train/train.txt.target.txt \
       $NQG_HOME/data/redistribute/QG/train/vocab.txt
python $NQG_HOME/code/NQG/seq2seq_pt/CollectVocab.py \
       $NQG_HOME/data/redistribute/QG/train/train.txt.bio \
       $NQG_HOME/data/redistribute/QG/train/bio.vocab.txt
python $NQG_HOME/code/NQG/seq2seq_pt/CollectVocab.py \
       $NQG_HOME/data/redistribute/QG/train/train.txt.pos \
       $NQG_HOME/data/redistribute/QG/train/train.txt.ner \
       $NQG_HOME/data/redistribute/QG/train/train.txt.case \
       $NQG_HOME/data/redistribute/QG/train/feat.vocab.txt
head -n 20000 $NQG_HOME/data/redistribute/QG/train/vocab.txt > $NQG_HOME/data/redistribute/QG/train/vocab.txt.20k
```
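The actual logic lives in `CollectVocab.py` in the repository. As a rough sketch of what such a collector does (an assumption about its behavior, not a transcription of the script), it counts token frequencies across the listed input files and writes one token per line, most frequent first, which is why the `head -n 20000` step above keeps the 20k most frequent tokens:

```python
from collections import Counter

def collect_vocab(input_paths, output_path):
    """Count whitespace-separated tokens across the input files and write
    one token per line, sorted by descending frequency."""
    counts = Counter()
    for path in input_paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                counts.update(line.split())
    with open(output_path, "w", encoding="utf-8") as f:
        for token, _ in counts.most_common():
            f.write(token + "\n")
```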
| 68 | + |
### Set up the environment
#### Package requirements
```
nltk scipy numpy pytorch
```
**PyTorch version**: This code requires PyTorch v0.4.0.

**Python version**: This code requires Python 3.

**Warning**: Older versions of NLTK have a bug in `PorterStemmer`, so a fresh installation or an update of NLTK is recommended.

A Docker image is also provided.
#### Docker image
```bash
docker pull magic282/pytorch:0.4.0
```
### Run training
The script `run_squad_qg.sh` is an example; modify it to match your configuration.
#### Without Docker
```bash
bash $NQG_HOME/code/NQG/seq2seq_pt/run_squad_qg.sh $NQG_HOME/data/redistribute/QG $NQG_HOME/code/NQG/seq2seq_pt
```
#### With Docker
```bash
nvidia-docker run --rm -ti -v $NQG_HOME:/workspace magic282/pytorch:0.4.0
```
Then, inside the container:
```bash
bash code/NQG/seq2seq_pt/run_squad_qg.sh /workspace/data/redistribute/QG /workspace/code/NQG/seq2seq_pt
```
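The training script reports BLEU through its own evaluation code. As an illustration of the metric behind the 12.35 BLEU-4 figure, here is a minimal sentence-level BLEU-4 sketch with add-1 smoothing (one of several common smoothing schemes, and not the repository's official scorer):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, reference):
    """Smoothed BLEU-4 for one tokenized candidate/reference pair.

    Geometric mean of modified 1- to 4-gram precisions (add-1 smoothed),
    times the brevity penalty for candidates shorter than the reference.
    """
    precisions = []
    for n in range(1, 5):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum((cand & ref).values())          # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # add-1 smoothing
    log_avg = sum(math.log(p) for p in precisions) / 4
    bp = 1.0 if len(candidate) >= len(reference) \
        else math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(log_avg)
```

An identical candidate and reference score 1.0; unrelated sentences score close to 0.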