This is the code for my thesis, modified from https://github.com/huggingface/transformers
My code is in ./question_generation .
- python 3.7
- pytorch 1.3.1
- apex==0.1 (optional), see https://github.com/NVIDIA/apex
- spacy
[conda]
conda install -c conda-forge spacy
python -m spacy download en_core_web_sm
- pip install -r requirements.txt
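After installation, a quick check like the one below (not part of the repository) confirms that the versions above and the spaCy model are available:

```python
# Environment sanity check: verifies the PyTorch version and that the
# spaCy English model downloaded above can be loaded.
import torch
import spacy

print("torch", torch.__version__)        # expected: 1.3.1
print("cuda available:", torch.cuda.is_available())

nlp = spacy.load("en_core_web_sm")       # from `python -m spacy download en_core_web_sm`
print([ent.text for ent in nlp("Albert Einstein was born in Ulm.").ents])
```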
[training]
--balance, balance the number of start positions and end positions
[inference]
--sample_num, number of output answers [default: 10]
--start_sample_num, number of start positions [default: 10]
--end_sample_num, number of end positions [default: 2]
[training]
--data_mode, features to use [default: acp]
--answer_position_encoding, which answer encoding to use [default: zero_one]
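The `zero_one` answer position encoding presumably marks which context tokens fall inside the answer span. The sketch below shows that interpretation only; it is not the repository's feature code, and the token indices are made up.

```python
def zero_one_answer_encoding(num_context_tokens, answer_start, answer_end):
    """Mark answer tokens with 1 and every other context token with 0
    (one reading of the `zero_one` answer position encoding)."""
    return [1 if answer_start <= i <= answer_end else 0
            for i in range(num_context_tokens)]

# e.g. a 10-token context whose answer spans tokens 3..5
print(zero_one_answer_encoding(10, 3, 5))   # [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
```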
[inference]
--is_qg, whether the input file was generated by a model
--inference, choose the search algorithm [default: greedy]
--beam_width, beam width for beam search and diverse beam search [default: 10]
--group_beam_width, group beam width for diverse beam search [default: 1]
--ag, whether the input file contains only contexts and answers
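For reference, the sketch below illustrates plain beam search as selected by `--inference` and `--beam_width` (with `beam_width=1` it reduces to greedy search; diverse beam search additionally splits the beams into groups, controlled by `--group_beam_width`). It is a generic illustration with a hypothetical `step_log_probs` scoring function, not the decoding code in this repository.

```python
from typing import Callable, List, Tuple

def beam_search(step_log_probs: Callable[[List[int]], List[Tuple[int, float]]],
                eos_id: int,
                beam_width: int = 10,
                max_len: int = 30) -> List[int]:
    """Generic beam search sketch. `step_log_probs(prefix)` returns
    (token_id, log_prob) candidates for the next token and stands in
    for the QG model's decoder."""
    beams = [([], 0.0)]      # (token sequence, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in step_log_probs(seq):
                cand = (seq + [tok], score + lp)
                (finished if tok == eos_id else candidates).append(cand)
        if not candidates:
            break
        # keep only the `beam_width` highest-scoring partial sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])[0]
```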
[training]
--train_is_qg, whether the training data is model-generated
--topk, number of questions kept per answer when the training data is generated [default: 1]
--a_prob_threshold, keep only questions whose answer probability is higher than this threshold
--sort_mode, scoring method used to sort questions
--cluster, which clustering method is used to cluster questions
bash chapter3_prepare.sh
- step 1 : train question generation model
- step 2 : inference by beam search
- step 3 : evaluation
bash chapter3_exp.sh
See chapter3_exp.sh for more details.
- split the SQuAD data (see the sketch after the table)
dataset | split 1 | split 2
---|---|---
train.json | QG_train.json | QG_gen.json
dev.json | QG_dev.json | RC_dev.json
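The split script itself is not shown here; the sketch below only illustrates the file relationship in the table, cutting the articles of a SQuAD-format file into two new files. The article-level 50/50 split is an assumption, not necessarily the ratio used in the thesis.

```python
import json

def split_squad(in_path, out_a, out_b, ratio=0.5):
    """Split a SQuAD-format file into two files at the article level.
    The 50/50 ratio is an assumption; the thesis scripts may differ."""
    with open(in_path, encoding="utf-8") as f:
        squad = json.load(f)
    cut = int(len(squad["data"]) * ratio)
    for out_path, articles in ((out_a, squad["data"][:cut]),
                               (out_b, squad["data"][cut:])):
        with open(out_path, "w", encoding="utf-8") as f:
            json.dump({"version": squad.get("version", "1.1"), "data": articles}, f)

# split_squad("train.json", "QG_train.json", "QG_gen.json")
# split_squad("dev.json", "QG_dev.json", "RC_dev.json")
```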
- crawl Wikipedia
bash chapter4_prepare.sh
- step 1 : prepare scoring model
- step 2 : train QG models
- step 3 : generate QA pairs using golden answers
- step 4 : use the generated QA pairs to train the QA model
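Assuming the generated pairs are written back out in SQuAD format for step 4 (the file names above suggest this, but it is an assumption), a single generated (context, question, answer) entry would look like:

```python
import json

# One generated QA pair laid out in SQuAD v1.1 style (assumed output format).
generated_example = {
    "version": "1.1",
    "data": [{
        "title": "generated",
        "paragraphs": [{
            "context": "Albert Einstein was born in Ulm in 1879.",
            "qas": [{
                "id": "gen-0",
                "question": "Where was Albert Einstein born?",      # produced by the QG model
                "answers": [{"text": "Ulm", "answer_start": 28}],   # golden answer span
            }],
        }],
    }],
}
print(json.dumps(generated_example, indent=2))
```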
bash chapter4-1_exp.sh
See chapter4-1_exp.sh for more details.
- step 1 : prepare scoring model
- step 2 : train AG and QG models
- step 3 : generate QA pairs using the NER extractor (see the sketch below) or the AG model
- step 4 : use the generated QA pairs to train the QA model
bash chapter4-2_exp.sh
See chapter4-2_exp.sh for more details.
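As a reference for step 3, the sketch below shows how an NER-based answer extractor could pull candidate answers from a context with the spaCy model listed in the requirements. It is only an illustration of the idea, not the repository's extraction code, and the returned entity set depends on the model.

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # model from the requirements section

def extract_answer_candidates(context):
    """Return (text, start_char, label) for each named entity in the context;
    these spans can serve as candidate answers for question generation."""
    doc = nlp(context)
    return [(ent.text, ent.start_char, ent.label_) for ent in doc.ents]

print(extract_answer_candidates("Albert Einstein was born in Ulm in 1879."))
# e.g. [('Albert Einstein', 0, 'PERSON'), ('Ulm', 28, 'GPE'), ('1879', 35, 'DATE')]
```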
bash chapter5_prepare.sh
- step 1 : use the models trained in Chapter 4 to generate QA pairs
- step 2 : use the generated QA pairs to train the QA model
bash chapter5_exp.sh
See chapter5_exp.sh for more details.