Malaya is a Natural-Language-Toolkit library for Bahasa Malaysia, powered by TensorFlow deep-learning models.
Proper documentation is available at https://malaya.readthedocs.io/
CPU version
$ pip install malaya
GPU version
$ pip install malaya-gpu
Only Python 3.6 and above, and TensorFlow 1.10 and above (but not 2.0), are supported.
Emotion Analysis
Transfer learning on BERT-Bahasa, XLNET-Bahasa and ALBERT-Bahasa to build deep emotion analysis models.
Entities Recognition
Transfer learning on BERT-Bahasa, XLNET-Bahasa and ALBERT-Bahasa to perform Named Entity Recognition.
Language Detection
Fast-text and sparse deep-learning models to classify Malay (formal and social media), Indonesian (formal and social media), Rojak language and Manglish.
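The fastText-style detector above essentially compares bags of character n-grams against per-language profiles. Below is a minimal, self-contained sketch of that idea — not Malaya's actual model, and the tiny profiles are made up for illustration:

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Bag of character trigrams, the core feature of fastText-style detectors."""
    padded = f"<{text.lower()}>"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def detect(text, profiles):
    """Pick the language whose n-gram profile overlaps the text the most."""
    grams = char_ngrams(text)
    return max(profiles, key=lambda lang: sum((grams & profiles[lang]).values()))

# Hypothetical tiny profiles built from a few words per language; a real
# detector trains weights over millions of sentences instead.
profiles = {
    "malay": char_ngrams("saya suka makan nasi lemak di kedai"),
    "english": char_ngrams("i like to eat fried rice at the shop"),
}
print(detect("saya nak makan", profiles))
```

A real detector learns weighted n-gram embeddings rather than counting overlaps, but the featurization is the same.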
Normalizer
Local Malaysian NLP research combined with Transformer models to normalize any Bahasa text.
Num2Word
Convert from numbers to cardinal or ordinal representation.
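To make the cardinal/ordinal distinction concrete, here is a self-contained sketch of the conversion for Malay numbers up to 999. This is an illustration of the rule structure, not Malaya's Num2Word module, which covers a much larger range:

```python
UNITS = ["kosong", "satu", "dua", "tiga", "empat",
         "lima", "enam", "tujuh", "lapan", "sembilan"]

def to_cardinal(n):
    """Spell out 0-999 in Malay (illustrative subset)."""
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return UNITS[n - 10] + " belas"
    if n < 100:
        tens, rest = divmod(n, 10)
        word = UNITS[tens] + " puluh"
        return word if rest == 0 else word + " " + to_cardinal(rest)
    hundreds, rest = divmod(n, 100)
    word = "seratus" if hundreds == 1 else UNITS[hundreds] + " ratus"
    return word if rest == 0 else word + " " + to_cardinal(rest)

def to_ordinal(n):
    """Malay ordinals: 'pertama' for 1, otherwise the 'ke-' form."""
    return "pertama" if n == 1 else "ke" + to_cardinal(n)

print(to_cardinal(21))   # dua puluh satu
print(to_ordinal(3))     # ketiga
```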
Part-of-Speech Recognition
Transfer learning on BERT-Bahasa, XLNET-Bahasa and ALBERT-Bahasa to perform Part-of-Speech tagging.
Dependency Parsing
Transfer learning on BERT-Bahasa, XLNET-Bahasa and ALBERT-Bahasa to perform Dependency Parsing.
Relevancy Analysis
Transfer learning on BERT-Bahasa, XLNET-Bahasa and ALBERT-Bahasa to build deep relevancy analysis models.
Sentiment Analysis
Transfer learning on BERT-Bahasa, XLNET-Bahasa and ALBERT-Bahasa to build deep sentiment analysis models.
Spell Correction
Local Malaysian NLP research combined with Transformer models to auto-correct any Bahasa word.
Stemmer
Character-level LSTM Seq2Seq with attention to perform state-of-the-art Bahasa stemming.
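Malaya learns the stemming mapping with a character-level seq2seq model; the sketch below is only a naive rule-based illustration of what that model has to capture — stripping common Malay affixes. The affix lists and the minimum-stem-length heuristic are assumptions chosen for the demo, not the library's logic:

```python
# Common Malay affixes, sorted longest-first so "meng-" wins over "me-".
PREFIXES = sorted(["ber", "ter", "men", "mem", "meng", "me", "di", "ke", "se", "pe"],
                  key=len, reverse=True)
SUFFIXES = sorted(["kan", "lah", "nya", "an", "i"], key=len, reverse=True)

def naive_stem(word):
    """Strip at most one prefix and one suffix, keeping a stem of >= 4 chars."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 4:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 4:
            word = word[:-len(s)]
            break
    return word

print(naive_stem("berjalan"))   # jalan
print(naive_stem("makanan"))    # makan
```

Rules like these break on sound changes (e.g. "menulis" from "tulis"), which is exactly why a learned character-level model is used instead.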
Subjectivity Analysis
Transfer learning on BERT-Bahasa, XLNET-Bahasa and ALBERT-Bahasa to build deep subjectivity analysis models.
Similarity
Deep encoders, Doc2Vec, BERT, XLNET and ALBERT to build deep semantic similarity models.
Summarization
BERT, XLNET, ALBERT, skip-thought, LDA, LSA and Doc2Vec to give precise unsupervised summarization, with TextRank as the scoring algorithm.
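The TextRank scoring mentioned above runs power iteration over a sentence-similarity graph. Here is a minimal self-contained sketch using bag-of-words cosine similarity as the edge weight — Malaya instead builds the vectors from the deep encoders listed, but the scoring loop is the same idea:

```python
import math
import re
from collections import Counter

def sentence_vector(sentence):
    """Bag-of-words vector; stands in for the deep sentence encodings."""
    return Counter(re.findall(r"\w+", sentence.lower()))

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def textrank(sentences, d=0.85, iters=30):
    """Score sentences by power iteration over the similarity graph."""
    vecs = [sentence_vector(s) for s in sentences]
    n = len(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    scores = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                out = sum(sim[j])          # total outgoing weight of node j
                if sim[j][i] and out:
                    rank += sim[j][i] / out * scores[j]
            new.append((1 - d) + d * rank)
        scores = new
    return scores

sents = [
    "Malaya provides unsupervised summarization.",
    "Summarization in Malaya is unsupervised and precise.",
    "The weather is sunny today.",
]
scores = textrank(sents)
print(scores)  # the off-topic last sentence scores lowest
```

Top-scoring sentences form the extractive summary.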
Topic Modelling
Provide Attention, LDA2Vec, LDA, NMF and LSA interfaces for easy topic modelling with topic visualization.
Toxicity Analysis
Transfer learning on BERT-Bahasa, XLNET-Bahasa and ALBERT-Bahasa to build deep toxicity analysis models.
Word2Vec
Provide pretrained Word2Vec trained on Bahasa Wikipedia and Bahasa news, with an easy interface and visualization.
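The typical Word2Vec query is a cosine-similarity nearest-neighbour lookup. The toy 3-dimensional vectors below are invented for illustration — the pretrained Wikipedia/news embeddings have hundreds of dimensions — but the lookup logic is the same:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u))
                  * math.sqrt(sum(b * b for b in v)))

# Toy embeddings standing in for the pretrained vectors.
embeddings = {
    "raja":       [0.9, 0.1, 0.0],   # king
    "permaisuri": [0.8, 0.2, 0.1],   # queen
    "kereta":     [0.0, 0.1, 0.9],   # car
}

def nearest(word, k=1):
    """k most cosine-similar words, the usual Word2Vec neighbourhood query."""
    others = [w for w in embeddings if w != word]
    return sorted(others, key=lambda w: -cosine(embeddings[word], embeddings[w]))[:k]

print(nearest("raja"))   # ['permaisuri']
```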
Transformer
Provide an easy interface to load BERT-Bahasa, XLNET-Bahasa, ALBERT-Bahasa and ALXLNET-Bahasa.
If you use our software for research, please cite:
@misc{Malaya,
  author = {Husein, Zolkepli},
  title = {Malaya},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huseinzol05/malaya}}
}
Thanks to Im Big, LigBlou, Mesolitica and KeyReply for sponsoring AWS, Google Cloud and private cloud to train Malaya models.
Thank you for contributing to this library; it really helps a lot. Feel free to contact me with suggestions, or to contribute in other forms: we accept everything, not just code!