
Wietse de Vries • Martijn Bartelds • Malvina Nissim • Martijn Wieling

Adapting Monolingual Models: Data can be Scarce when Language Similarity is High

This repository contains everything needed to replicate the results in the paper:

📝 Adapting Monolingual Models: Data can be Scarce when Language Similarity is High

Models

The best fine-tuned models for Gronings and West Frisian are available on the HuggingFace model hub:

Lexical layers

These models are identical to BERTje, except for their lexical layers (bert.embeddings.word_embeddings), which were retrained for the target language.
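
As a minimal sketch of what this means in the transformers library (the Gronings model id below is an assumption; the exact names are listed on the model hub):

```python
# Minimal sketch: compare BERTje with an adapted model.
# The Gronings model id is an assumption; see the model hub for the published names.
from transformers import AutoModel

bertje = AutoModel.from_pretrained("GroNLP/bert-base-dutch-cased")
gronings = AutoModel.from_pretrained("GroNLP/bert-base-dutch-cased-gronings")  # assumed id

# Only the lexical layer (bert.embeddings.word_embeddings in the state dict)
# differs; all Transformer layers are shared with BERTje.
print(bertje.embeddings.word_embeddings)
print(gronings.embeddings.word_embeddings)
```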

POS tagging

These models combine the same fine-tuned Transformer layers and classification head with the retrained lexical layers from the models above.
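
A minimal usage sketch with the transformers pipeline API (the model id is again an assumption; the fine-tuned checkpoints are listed on the model hub):

```python
# Minimal sketch: POS tagging with one of the fine-tuned checkpoints.
# The model id is an assumption; see the model hub for the published names.
from transformers import pipeline

tagger = pipeline(
    "token-classification",
    model="GroNLP/bert-base-dutch-cased-upos-alpino-gronings",  # assumed id
)

# Replace with a Gronings sentence; each token is assigned a POS label.
for token in tagger("Dit is een voorbeeld."):
    print(token["word"], token["entity"])
```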

Development

Conda/mamba dependencies are listed in environment.yml. All scripts and configs used for the experiments in the paper are included in this repository. A more extensive usage guide will be provided later.
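
To set up the environment, a standard command like `conda env create -f environment.yml` (or the mamba equivalent, `mamba env create -f environment.yml`) should work.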

BibTeX entry

The paper is to appear in Findings of ACL 2021. The preprint can be cited as:

@misc{devries2021adapting,
      title={{Adapting Monolingual Models: Data can be Scarce when Language Similarity is High}}, 
      author={Wietse de Vries and Martijn Bartelds and Malvina Nissim and Martijn Wieling},
      year={2021},
      eprint={2105.02855},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
