Language Model for Filipino (Tagalog) Language

This project contains the files needed to build a language model for Filipino (Tagalog) from the Filipino (Tagalog) Wikipedia, Wiktionary, and Wikibooks corpora.

The language model weights and the itos (index -> string) mapping pickle file are available for download here
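As a rough illustration of how the itos file might be used after download, here is a minimal sketch. It assumes the pickle holds a plain Python list of tokens whose position is the token's index, and that it is saved locally as `wiki_tagalog_itos.pkl`. The helper names (`load_itos`, `build_stoi`, `numericalize`) are hypothetical, and mapping unknown tokens to index 0 is an assumption rather than something this project documents.

```python
import pickle
from collections import defaultdict


def load_itos(path):
    """Load the index -> string list from a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def build_stoi(itos, unk_index=0):
    """Invert itos into a string -> index mapping.

    Tokens missing from the vocabulary fall back to unk_index
    (assumed here to be 0; check the actual vocabulary).
    """
    stoi = defaultdict(lambda: unk_index)
    for index, token in enumerate(itos):
        stoi[token] = index
    return stoi


def numericalize(tokens, stoi):
    """Convert a list of string tokens into a list of vocabulary indices."""
    return [stoi[token] for token in tokens]
```

With these helpers, `stoi = build_stoi(load_itos("wiki_tagalog_itos.pkl"))` would give a lookup table for already tokenized text. Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.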

The RNN was trained in 4 iterations with the following learning rates:

8
4
1.5
0.2

The learning rates were chosen with the help of fastai's plot_lr() method.

Performance:

Perplexity: 26.1997

Accuracy: 0.4403
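For reference, perplexity is conventionally the exponential of the mean per-token cross-entropy loss (measured in nats). The sketch below shows that relationship. It assumes the figure above was computed this way, which this README does not state, and the function names are illustrative.

```python
import math


def perplexity_from_loss(mean_cross_entropy):
    """Perplexity is exp of the mean per-token cross-entropy (natural log)."""
    return math.exp(mean_cross_entropy)


def loss_from_perplexity(perplexity):
    """Inverse mapping: the mean cross-entropy implied by a perplexity."""
    return math.log(perplexity)


# Cross-entropy implied by the perplexity reported above.
implied_loss = loss_from_perplexity(26.1997)
print(f"{implied_loss:.4f}")
```

Under that assumption, a perplexity of 26.1997 corresponds to a validation loss of roughly 3.27 nats per token.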

TODO:

Host the language model weights and itos (index -> string vocab pickle file) for the community to reuse the model directly
Add gradient clipping
Include a text generation code snippet to visually inspect the language model
Use the sentencepiece tokenizer
Finetune
Use continuous cache pointer (from here: https://github.com/salesforce/awd-lstm-lm)
Try QRNN (from here: https://github.com/salesforce/pytorch-qrnn/)
Identify new datasets for sentiment analysis
Figure out if there are any summarization datasets out there
