Semi-Automatic NLI Data Collection

This repository contains the data and code accompanying the paper "Asking Crowdworkers to Write Entailment Examples: The Best of Bad Options".

Datasets

The five datasets described in the paper are available under the data/ directory: base_news, base_wiki, sim_news, sim_wiki, and translate_wiki. Each dataset comes with a training set and a test set, both in .jsonl format. Please refer to the paper for the statistics of each dataset.
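Since the training and test sets are in .jsonl format (one JSON object per line), they can be read with the Python standard library alone. The following is a minimal sketch, not part of the repository's code: the helper names `read_jsonl` and `field_names` are hypothetical, and the path in the trailing comment is an assumption about the file layout, so check the data/ directory for the actual file names and the files themselves for the actual field names.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def field_names(records):
    """Return the sorted set of keys seen across all records."""
    return sorted({key for record in records for key in record})


# Example usage (hypothetical path; adjust to the real file names under data/):
# train = read_jsonl("data/base_news/train.jsonl")
# print(len(train), field_names(train))
```

Printing the field names first is a quick way to see which keys hold the premise, hypothesis, and label before writing any further processing code.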

License

We use premises taken from the English Gigaword Fifth Edition, English Wikipedia and Simple Wikipedia (downloaded May 2020), and WikiMatrix. The English Gigaword is distributed under the LDC User Agreement license. Wikipedia is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA) and the GNU Free Documentation License (GFDL).

Experiments

The code used for the experiments in the paper can be found under scripts. Please follow the README in each sub-directory for more details. For experiments that use jiant (we use v1.2), please follow the jiant documentation for installation and usage instructions.

Citation

@inproceedings{vania2020asking,
    title = "{Asking Crowdworkers to Write Entailment Examples: The Best of Bad Options}",
    author = "Vania, Clara  and
      Chen, Ruijie  and
      Bowman, Samuel R.",
    booktitle = "Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing",
    month = dec,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics"
}