PAXQA Datasets and Code

Code and Data for the paper "PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale" at EMNLP 2023 (Findings).

PAXQA Data

We release the PAXQA datasets on the HuggingFace Hub. The fields are consistent with the MLQA (and therefore SQuAD) fields.
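To make the field layout concrete, here is a minimal sketch of a record sanity check. It assumes the flattened SQuAD-style layout (`id`, `context`, `question`, and `answers` holding parallel `text` and `answer_start` lists); that exact layout is an assumption about how the PAXQA files are stored, and the sample record and the helper `check_record` are made up for illustration.

```python
# Illustrative record in the assumed flattened SQuAD-style layout.
# The values are invented for this sketch and are not taken from PAXQA.
example = {
    "id": "example-0001",
    "context": "Paris is the capital of France.",
    "question": "What is the capital of France?",
    "answers": {"text": ["Paris"], "answer_start": [0]},
}

# Fields assumed to be present in every record (hypothetical list).
REQUIRED_FIELDS = ("id", "context", "question", "answers")


def check_record(record):
    """Return True if the record has the assumed fields and every
    answer text is found in the context at its stated character offset."""
    if any(field not in record for field in REQUIRED_FIELDS):
        return False
    answers = record["answers"]
    for text, start in zip(answers["text"], answers["answer_start"]):
        if record["context"][start:start + len(text)] != text:
            return False
    return True
```

If the downloaded files use the nested SQuAD JSON layout (`data` > `paragraphs` > `qas`) instead, the same offset check applies to each entry in `qas`.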

The PAXQA test and validation sets are available at this link, and consist of 1788 QA examples in total.

The PAXQA train sets are available at this link, and consist of 660K QA examples in total. PAXQA_HWA comprises the 2 *gale* datasets, while PAXQA_AWA comprises the other 5 datasets.

Dataset sizes

Table 1 of the paper gives the number of QA examples for each split and each language.

You can verify these numbers against the files you downloaded above; contact the authors if you find any inconsistencies.
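One way to run that check is sketched below. It assumes each downloaded file is a nested SQuAD v1.1-style JSON document (`data` > `paragraphs` > `qas`), which is an assumption about the on-disk format; the function names are hypothetical and not part of this repository.

```python
import json


def count_qa_examples(squad_dict):
    """Count QA pairs in a nested SQuAD-style dictionary."""
    return sum(
        len(paragraph["qas"])
        for article in squad_dict["data"]
        for paragraph in article["paragraphs"]
    )


def count_qa_examples_in_file(path):
    """Load a SQuAD-style JSON file and count its QA pairs."""
    with open(path, encoding="utf-8") as f:
        return count_qa_examples(json.load(f))
```

Calling `count_qa_examples_in_file` on each downloaded split and summing the results gives totals that can be compared with Table 1 of the paper.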

Code

This section is forthcoming.

Citation

@article{li2023paxqa,
      title={\textsc{PaxQA}: Generating Cross-lingual Question Answering Examples at Training Scale}, 
      author={Bryan Li and Chris Callison-Burch},
      year={2023},
      journal={Findings of the Association for Computational Linguistics: EMNLP}
}
