Semi-Automatic NLI Data Collection

This repository contains the data and code accompanying the paper "Asking Crowdworkers to Write Entailment Examples: The Best of Bad Options".

Datasets

The five datasets described in the paper are available under the data/ directory: base_news, base_wiki, sim_news, sim_wiki, and translate_wiki. Each dataset comes with a training set and a test set, both in .jsonl format. Please refer to the paper for the statistics of each dataset.
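Since each split is a .jsonl file (one JSON object per line), it can be read with the Python standard library alone. The sketch below is a minimal, hedged example: the helper names `read_jsonl` and `load_split` are hypothetical, and the `data/<dataset>/<split>.jsonl` layout is an assumption, so adjust the path to match the actual file names under data/. It makes no assumption about the field names inside each record.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, List


def read_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a .jsonl file."""
    with path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Report the offending line so a broken file is easy to locate.
                raise ValueError(f"{path}:{line_number}: invalid JSON ({err})") from err


def load_split(data_dir: Path, dataset: str, split: str) -> List[Dict[str, Any]]:
    """Load one split of one dataset into a list of records.

    ASSUMPTION: files are laid out as <data_dir>/<dataset>/<split>.jsonl.
    This layout is hypothetical; check the data/ directory for the real names.
    """
    return list(read_jsonl(data_dir / dataset / f"{split}.jsonl"))
```

Under the assumed layout, `load_split(Path("data"), "base_news", "train")` would return the training examples as a list of dictionaries; inspecting the keys of the first record is a quick way to see which fields a dataset provides.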

License

We use premises taken from the English Gigaword Fifth Edition, English Wikipedia and Simple Wikipedia (downloaded May 2020), and WikiMatrix. The English Gigaword is distributed under the LDC User Agreement license. Wikipedia is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA) and the GNU Free Documentation License (GFDL).

Experiments

The code used for the experiments in the paper can be found under scripts/. Please follow the README in each subdirectory for more details. For experiments that use jiant (we use v1.2), please follow the jiant documentation for installation and usage instructions.

Citation

@inproceedings{vania2020asking,
    title = "{Asking Crowdworkers to Write Entailment Examples: The Best of Bad Options}",
    author = "Vania, Clara  and
      Chen, Ruijie  and
      Bowman, Samuel R.",
    booktitle = "Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing",
    month = dec,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics"
}
