willywsm1013/transformers-for-question-generation

Question Generation

This is the code for my thesis. It is modified from https://github.com/huggingface/transformers

My code is in ./question_generation.

Requirements

  • python 3.7
  • pytorch 1.3.1
  • apex==0.1 (optional)
see https://github.com/NVIDIA/apex
  • spacy
[conda]
conda install -c conda-forge spacy
python -m spacy download en_core_web_sm
  • pip install requirements

Main files and Usage

run_answer_generation.py

[training]
--balance, balance the number of start positions and end positions

[inference]
--sample_num, number of output answers [default:10]
--start_sample_num, number of start positions [default:10]
--end_sample_num, number of end positions [default:2]
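
As a rough illustration of how the three inference options above might interact, the sketch below keeps the start_sample_num most probable start positions, pairs each with its end_sample_num most probable end positions, and returns the sample_num best spans. This is an assumption about the sampling scheme, not the code in run_answer_generation.py. The function name, the max_answer_len limit, and scoring a span by the product of its start and end probabilities are all hypothetical.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_answer_spans(
    start_logits: List[float],
    end_logits: List[float],
    sample_num: int = 10,
    start_sample_num: int = 10,
    end_sample_num: int = 2,
    max_answer_len: int = 30,  # hypothetical limit, not a documented flag
) -> List[Tuple[int, int, float]]:
    """Return up to sample_num (start, end, score) spans, best first."""
    start_probs = softmax(start_logits)
    end_probs = softmax(end_logits)

    # Keep the most probable start positions.
    top_starts = sorted(
        range(len(start_probs)), key=lambda i: start_probs[i], reverse=True
    )[:start_sample_num]

    candidates = []
    for start in top_starts:
        # Only ends at or after the start, within the length limit, are valid.
        valid_ends = range(start, min(len(end_probs), start + max_answer_len))
        top_ends = sorted(valid_ends, key=lambda e: end_probs[e], reverse=True)
        for end in top_ends[:end_sample_num]:
            candidates.append((start, end, start_probs[start] * end_probs[end]))

    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates[:sample_num]


if __name__ == "__main__":
    spans = top_answer_spans([0.0, 5.0, 1.0, 0.0], [0.0, 1.0, 5.0, 0.0], sample_num=3)
    for start, end, score in spans:
        print(start, end, round(score, 4))
```

With the defaults listed above, at most 10 x 2 = 20 candidate spans are scored before the best 10 are kept.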

run_question_generation.py

[training]
--data_mode, features to use [default:acp]
--answer_position_encoding, which answer encoding to use [default:zero_one]
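
The default value zero_one suggests a binary mask over the context tokens. The sketch below shows that reading: tokens inside the answer span get 1 and all others get 0. This is an assumption based on the option name only. The function name and its arguments are hypothetical and do not come from run_question_generation.py.

```python
from typing import List


def zero_one_answer_positions(
    num_tokens: int, answer_start: int, answer_end: int
) -> List[int]:
    """Mark answer tokens with 1 and all other tokens with 0.

    answer_start and answer_end are inclusive token indices.
    """
    if not 0 <= answer_start <= answer_end < num_tokens:
        raise ValueError("answer span must lie inside the token sequence")
    return [1 if answer_start <= i <= answer_end else 0 for i in range(num_tokens)]


if __name__ == "__main__":
    tokens = ["the", "tower", "opened", "in", "1889", "."]
    # The answer "1889" is the single token at index 4.
    print(list(zip(tokens, zero_one_answer_positions(len(tokens), 4, 4))))
```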

[inference]
--is_qg, whether the input file was generated by a model
--inference, choose search algorithm [default:greedy]
--beam_width, beam width in beam search and diverse beam search [default:10]
--group_beam_width, group beam width in diverse beam search [default:1]
--ag, whether the input file contains only contexts and answers
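
To make the difference between greedy decoding and beam search concrete, here is a minimal, generic beam search over a toy next-token table. It is not the decoder in run_question_generation.py and it does not cover diverse beam search or --group_beam_width. The function name, the toy model, and the end-of-sequence token are all hypothetical. Greedy decoding corresponds to a beam width of 1.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens, summed log-probability)


def beam_search(
    next_log_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_width: int = 10,
    max_len: int = 20,
    eos: str = "</s>",
) -> List[Hypothesis]:
    """Return hypotheses sorted by summed log-probability, best first."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [
            (tokens + (tok,), score + lp)
            for tokens, score in beams
            for tok, lp in next_log_probs(tokens).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        # Keep the best beam_width candidates; set finished ones aside.
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Toy distribution where the locally best first token is globally worse."""
    table = {
        (): {"a": math.log(0.6), "b": math.log(0.4)},
        ("a",): {"x": math.log(0.5), "y": math.log(0.5)},
        ("b",): {"z": 0.0},
    }
    return table.get(prefix, {"</s>": 0.0})


if __name__ == "__main__":
    print(beam_search(toy_model, beam_width=1)[0])  # greedy choice
    print(beam_search(toy_model, beam_width=2)[0])  # beam search choice
```

In the toy table, greedy decoding commits to "a" (probability 0.6) and ends with a sequence of probability 0.3, while a beam of width 2 keeps "b" alive and finds the sequence of probability 0.4.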

run_squad.py

[training]
--train_is_qg, whether the training data is generated data
--topk, if the training data is generated data, number of questions per answer [default:1]
--a_prob_threshold, keep questions whose answer probability is higher than the threshold
--sort_mode, sort questions using different scoring methods
--cluster, which clustering method is used to cluster questions
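
The sketch below illustrates how --a_prob_threshold and --topk could combine when selecting generated questions for training: questions below the threshold are dropped, then the best topk questions per answer are kept. This is an assumption about the selection logic, not the code in run_squad.py. The record fields (context_id, answer, question, answer_prob) and ranking by answer probability are hypothetical; the actual ranking depends on --sort_mode.

```python
from collections import defaultdict
from typing import Dict, List


def select_generated_questions(
    examples: List[Dict],
    topk: int = 1,
    a_prob_threshold: float = 0.0,
) -> List[Dict]:
    """Filter by answer probability, then keep topk questions per answer."""
    # Step 1: drop questions whose answer probability is not above the threshold.
    kept = [ex for ex in examples if ex["answer_prob"] > a_prob_threshold]

    # Step 2: group the survivors by (context, answer).
    by_answer = defaultdict(list)
    for ex in kept:
        by_answer[(ex["context_id"], ex["answer"])].append(ex)

    # Step 3: keep the topk highest-scoring questions in each group.
    selected = []
    for group in by_answer.values():
        group.sort(key=lambda ex: ex["answer_prob"], reverse=True)
        selected.extend(group[:topk])
    return selected


if __name__ == "__main__":
    generated = [
        {"context_id": 0, "answer": "Paris", "question": "q1", "answer_prob": 0.9},
        {"context_id": 0, "answer": "Paris", "question": "q2", "answer_prob": 0.7},
        {"context_id": 0, "answer": "1889", "question": "q3", "answer_prob": 0.2},
    ]
    for ex in select_generated_questions(generated, topk=1, a_prob_threshold=0.5):
        print(ex["question"], ex["answer_prob"])
```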

Experiments

Chapter 3 --- Question generation

Prepare data and evaluation model

bash chapter3_prepare.sh

Experiment

  • step 1 : train question generation model
  • step 2 : inference by beam search
  • step 3 : evaluation

bash chapter3_exp.sh

See chapter3_exp.sh for more details.

Chapter 4 --- Training QA model with generated data

Preprocess Data

  • split SQuAD data
dataset       split into
train.json    QG_train.json, QG_gen.json
dev.json      QG_dev.json, RC_dev.json
  • crawl wikipedia

bash chapter4_prepare.sh
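
For orientation, the sketch below shows one way a SQuAD-format file such as train.json could be divided into two files such as QG_train.json and QG_gen.json. It is not taken from chapter4_prepare.sh. Splitting by article, the 50/50 fraction, the seed, and the function name are all hypothetical; only the standard SQuAD layout (a top-level "data" list of articles) is relied on.

```python
import json
import random
from typing import Dict, Tuple


def split_squad(
    squad: Dict, first_fraction: float = 0.5, seed: int = 0
) -> Tuple[Dict, Dict]:
    """Split a SQuAD-format dict into two disjoint parts, article by article."""
    articles = list(squad["data"])
    random.Random(seed).shuffle(articles)  # deterministic for a fixed seed
    cut = int(len(articles) * first_fraction)

    def wrap(part):
        # Preserve the version field so both halves stay valid SQuAD files.
        return {"version": squad.get("version", ""), "data": part}

    return wrap(articles[:cut]), wrap(articles[cut:])


if __name__ == "__main__":
    # Tiny in-memory stand-in for train.json.
    squad = {
        "version": "1.1",
        "data": [{"title": f"article_{i}", "paragraphs": []} for i in range(4)],
    }
    qg_train, qg_gen = split_squad(squad)
    print(json.dumps([a["title"] for a in qg_train["data"]]))
    print(json.dumps([a["title"] for a in qg_gen["data"]]))
```

Splitting by article rather than by question keeps every paragraph of an article in a single part, so no context is shared between the two files.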

Chapter 4-1 : generate QA pairs on golden answers

  • step 1 : prepare scoring model
  • step 2 : train QG models
  • step 3 : generate QA pairs using golden answers
  • step 4 : use generated QA pairs to train QA model.

bash chapter4-1_exp.sh

See chapter4-1_exp.sh for more details.

Chapter 4-2 : generate QA pairs on generated answers

  • step 1 : prepare scoring model
  • step 2 : train AG and QG models
  • step 3 : generate QA pairs using NER extractor or AG model
  • step 4 : use generated QA pairs to train QA model.

bash chapter4-2_exp.sh

See chapter4-2_exp.sh for more details.

Chapter 5 --- Transfer from SQuAD to NewsQA

Prepare data

bash chapter5_prepare.sh

Experiment

  • step 1 : use models trained in Chapter 4 to generate QA pairs
  • step 2 : use generated QA pairs to train QA model.

bash chapter5_exp.sh

See chapter5_exp.sh for more details.
