This repository contains the data and crowdsourcing instructions used in What Makes Reading Comprehension Questions Difficult? (Sugawara et al., ACL 2022).
- `data` contains all collected data in our study.
  - The questions written for passages taken from MCTest, RACE, and ReClor are missing their passages because of their licenses. Refer to the "Collect missing passages" section below to get the complete data.
- `crowdsourcing_templates` contains the HTML templates of the task and instructions used in crowdsourcing.
  - We used the crowdsourcing tool developed in our previous study.
  - To view the instructions etc. in a web browser, put these files under `web/templates` and modify a config file.
## Collect missing passages

First, download the raw datasets from the following links:
- MCTest: https://mattr1.github.io/mctest/data.html
- RACE: https://www.cs.cmu.edu/~glai1/data/race/
- ReClor: https://whyu.me/reclor/#download
Make sure to put the data such that:

- `data/mctest/mc{160,500}.train.tsv`
- `data/race/train/{middle,high}/*.txt`
- `data/reclor/train.json`
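Before running the script, you can optionally confirm that the files are in place with a quick check (a minimal sketch in Python; the patterns simply mirror the paths above):

```python
# Quick sanity check that the raw dataset files sit where the
# collection script expects them (paths mirror the layout above).
from glob import glob
from pathlib import Path

expected = {
    "MCTest": glob("data/mctest/mc*.train.tsv"),
    "RACE": glob("data/race/train/*/*.txt"),
    "ReClor": ["data/reclor/train.json"] if Path("data/reclor/train.json").exists() else [],
}
for name, files in expected.items():
    print(f"{name}: {len(files)} file(s) found" if files else f"{name}: MISSING")
```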
Then run:

    python data/collect_missing_passages.py
This produces `data/complete_data.json`, which contains the complete data.
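Once the file is generated, it can be inspected with a few lines of Python. This is a minimal sketch; it assumes the JSON deserializes into a collection of question entries carrying the fields described in the next section:

```python
import json

# Load the merged data produced by collect_missing_passages.py.
with open("data/complete_data.json") as f:
    data = json.load(f)

# Assumption: the file is either a list of question entries or a dict keyed
# by question_id; normalize to a plain list of entries in both cases.
questions = list(data.values()) if isinstance(data, dict) else data
print(f"Loaded {len(questions)} questions")
print(sorted(questions[0].keys()))  # fields available on each entry
```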
## Data format

Each question entry has the following fields:

- `passage`, `question`, `options`: question data
- `question_id`: `{source}_{plain,adv}`
  - `plain` is standard data collection and `adv` is adversarial data collection
- `passage_id`: unique id for identifying the source passage
- `gold_label`: zero-indexed answer index (0-3) among the four options
- `worker_id`: anonymized worker id
- `elapsed_time_second`: writing time
- `source`: passage source
  - MCTest
  - RACE
  - Project Gutenberg
  - Open ANC (Slate section)
  - ReClor
  - Wikipedia arts articles
  - Wikipedia science articles
- `validation_data`
  - `worker_answer_index`: zero-indexed answer index
  - `correct`: whether the validator's answer matches the gold label
  - `elapsed_time_second`: answering time for five questions (in a single HIT; not averaged)
  - `unanswerable`: whether the unanswerable option is flagged
  - `worker_id`: anonymized worker id
- `model_predictions` and `model_predictions-partial`
  - a list of pairs of [`model_name`, `if_model_gets_correct_or_not`]
  - `A`: only options are given
  - `P+A`: only passage and options are given
  - `Q+A`: only question and options are given
- `validation_index_for_filtering`: which validation votes are used for validation
  - A question is validated if at least one of the two labels is equal to the gold label
- `validation_index_for_performance`: which validation votes are used for computing human accuracy
- `valid`: True if a question is validated
- `unanimous`: True if both filtering labels are equal to the gold label
- `human_accuracy`: human accuracy
- `model_accuracy`: average accuracy of eight models (excluding UnifiedQA)
- `human_model_gap`: `human_accuracy` - `model_accuracy`
- `question_type`: interrogative word-based question type
- `difficulty`: {easy, mid, hard} (refer to the paper for the definition)
- `reasoning_types`: annotation results, available for some questions
- `readability`: values of readability measures
For the questions collected with a model in the loop, the following additional fields are present:
- `adversarial_model_prediction_probability`
- `adversarial_model_prediction_label`
- `num_of_adv_submission`: the number of submissions a worker made while trying to fool the adversarial model (UnifiedQA large)
- `adversarial_success`: True if the worker fooled the model
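As an illustration of how these fields fit together, here is a hedged sketch that summarizes a few of the statistics above; it assumes `questions` is the list of entries loaded as in the earlier snippet and that fields absent from an entry simply mean the corresponding annotation does not apply:

```python
# Summarize the fields described above: average human-model gap over
# validated questions and adversarial success rate for model-in-the-loop
# questions. Assumes `questions` was loaded as in the earlier snippet.
valid_questions = [q for q in questions if q.get("valid")]

gaps = [q["human_model_gap"] for q in valid_questions if "human_model_gap" in q]
if gaps:
    print(f"Mean human-model gap over {len(gaps)} valid questions: {sum(gaps) / len(gaps):.3f}")

# Assumption: adversarially collected questions can be identified by "adv"
# appearing in their question_id ({source}_{plain,adv}).
adv = [q for q in questions if "adv" in q.get("question_id", "")]
if adv:
    success = sum(1 for q in adv if q.get("adversarial_success"))
    print(f"Adversarial success rate: {success / len(adv):.1%} ({success}/{len(adv)})")
```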
## License

The collected questions and options (excluding passages) are released under the Creative Commons Attribution 4.0 International License.
## Citation

@inproceedings{sugawara-etal-2022-makes,
    title = {What Makes Reading Comprehension Questions Difficult?},
    author = {Sugawara, Saku and Nangia, Nikita and Warstadt, Alex and Bowman, Samuel R.},
    booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics},
    month = may,
    year = {2022},
    address = {Online and Dublin, Ireland},
    publisher = {Association for Computational Linguistics},
}