This repository contains the data and code accompanying the paper "Asking Crowdworkers to Write Entailment Examples: The Best of Bad Options".
The five datasets described in the paper are available under the data/ directory: base_news, base_wiki, sim_news, sim_wiki, and translate_wiki. Each dataset comes with a training set and a test set, both in .jsonl format. Please refer to the paper for the statistics of each dataset.
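Since the training and test sets are in .jsonl format (one JSON object per line), they can be read with the Python standard library alone. The following is a minimal sketch, not code from this repository: the helper name `read_jsonl` is hypothetical, and it makes no assumption about which fields each example contains.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file and return a list with one parsed object per line.

    Hypothetical helper for illustration; blank lines are skipped.
    """
    examples = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                examples.append(json.loads(line))
    return examples
```

To see which fields a dataset provides, load one file from the data/ directory and inspect the first element of the returned list; the exact file names and field names should be checked against the files themselves.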
We use premises taken from the English Gigaword Fifth Edition, English Wikipedia and Simple Wikipedia (downloaded May 2020), and WikiMatrix. The English Gigaword is distributed under the LDC User Agreement license. Wikipedia is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA) and the GNU Free Documentation License (GFDL).
The code used for the experiments in the paper can be found under scripts. Please follow the README in each sub-directory for more details. For experiments using jiant (we use v1.2), please follow the jiant documentation for installation and usage instructions.
@inproceedings{vania2020asking,
title = "{Asking Crowdworkers to Write Entailment Examples: The Best of Bad Options}",
author = "Vania, Clara and
Chen, Ruijie and
Bowman, Samuel R.",
booktitle = "Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing",
month = dec,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics"
}