calamanCy models for Tagalog NLP
Model collection for https://github.com/ljvmiranda921/calamanCy. You can find more information in each model (or dataset) card.
Token Classification • Updated • 11
Note: Transformer-based pipeline using mDeBERTa-v3 (base)
ljvmiranda921/tl_calamancy_lg
Token Classification • Updated • 13
Note: Latest large-sized pipeline based on Tagalog fastText vectors (714k unique vectors, 300 dimensions, size: 1.4 GB)
ljvmiranda921/tl_calamancy_md
Token Classification • Updated • 189
Note: Latest medium-sized pipeline based on floret (200k unique vectors, 200 dimensions, size: 400 MB)
ljvmiranda921/tl_calamancy_trf-0.1.0
Token Classification • Updated • 19 • 5
Note: LEGACY. Transformer-based pipeline using RoBERTa-Tagalog
ljvmiranda921/tl_calamancy_lg-0.1.0
Token Classification • Updated • 10 • 1
Note: LEGACY. Large-sized pipeline based on fastText (714k unique vectors, 300 dimensions, size: 455 MB)
ljvmiranda921/tl_calamancy_md-0.1.0
Token Classification • Updated • 172
Note: LEGACY. Medium-sized pipeline based on floret (50k unique vectors, 200 dimensions, size: 77 MB)
ljvmiranda921/tlunified-ner
Viewer • Updated • 7.82k • 194 • 3
Note: Gold-standard Tagalog NER dataset. Cohen's kappa = 0.81
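For readers unfamiliar with the agreement figure quoted in the note above, here is a minimal sketch of how Cohen's kappa is computed for two annotators. The toy label sequences and the function name `cohens_kappa` are hypothetical and are not drawn from the dataset; the dataset's own agreement procedure (for example, which units were compared) is described in its paper and may differ from this token-level illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list.")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label
    # independently, using each annotator's own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical token-level labels from two annotators (illustration only).
annotator_a = ["PER", "O", "ORG", "O", "LOC", "O", "PER", "O", "O", "ORG"]
annotator_b = ["PER", "O", "ORG", "O", "O", "O", "PER", "O", "LOC", "ORG"]

kappa = cohens_kappa(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so it is lower than the plain percentage of matching labels whenever one label (such as `O`) dominates.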
Developing a Named Entity Recognition Dataset for Tagalog
Paper • 2311.07161 • Published • 2
calamanCy: A Tagalog Natural Language Processing Toolkit
Paper • 2311.07171 • Published
ljvmiranda921/tl_gliner_small
Token Classification • Updated • 2
ljvmiranda921/tl_gliner_medium
Token Classification • Updated • 2
ljvmiranda921/tl_gliner_large
Token Classification • Updated