Do you have plan to support Thai language ? #836

Open
guftgift opened this issue Jul 25, 2019 · 4 comments

Comments

@guftgift

Dear Sir,
Is Thai language support on your roadmap?

@adrienball
Contributor

Hi @guftgift,
I'm afraid it is not planned for now.

@guftgift
Author

guftgift commented Jul 25, 2019 via email

@adrienball
Contributor

@guftgift
It seems that Thai does not put a separator between words (correct me if I'm wrong), and snips-nlu is not meant to be used directly on input that is not whitespace-separated. To work with Thai, you will therefore first need to tokenize your data before passing it to snips-nlu, both for training and for inference.
A romanization of Thai apparently exists, which might be worth investigating (Google Translate gives a transliterated form with whitespace).
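
To make the pre-tokenization step concrete, here is a minimal sketch in plain Python. It uses a greedy longest-match segmenter over a tiny, hypothetical vocabulary. The function names and the word list are assumptions made for illustration and are not part of snips-nlu. A real setup would most likely swap in a dedicated Thai word segmenter.

```python
def segment(text, vocabulary):
    """Greedy longest-match segmentation.

    Characters that start no known word are emitted as single-character tokens.
    """
    max_len = max(len(word) for word in vocabulary)
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first, then shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def to_whitespace_separated(text, vocabulary):
    """Return the text with one space between the segmented tokens."""
    return " ".join(segment(text, vocabulary))


# Hypothetical toy vocabulary, for illustration only.
VOCABULARY = {"เปิด", "ไฟ", "ห้องนอน"}
print(to_whitespace_separated("เปิดไฟห้องนอน", VOCABULARY))
```

The same function would have to be applied to every utterance in the training dataset and to every query at inference time, so that both sides see identical tokens.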

Once you've handled this, you can try setting the language of your dataset to 'en' (English) and training your NLU engine with the default English config, after making the following changes to the "feature_factory_configs" attribute of the config:

  • remove the word_cluster feature factory
  • set use_stemming to False everywhere
  • replace top_10000_words_stemmed with None everywhere
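
The three changes above could be applied programmatically along these lines. This is only a sketch: it assumes the config is a nested dict in which each entry of "feature_factory_configs" carries a "factory_name" key and keeps its options (such as use_stemming) in nested dicts. The helper names are hypothetical, and the layout should be checked against the actual default config.

```python
import copy


def adapt_config(config):
    """Return a modified copy of a nested config dict (layout assumed, see above)."""
    config = copy.deepcopy(config)
    _walk(config)
    return config


def _walk(node):
    # Look for every "feature_factory_configs" list, at any nesting depth.
    if isinstance(node, dict):
        for key, value in list(node.items()):
            if key == "feature_factory_configs" and isinstance(value, list):
                # 1. Remove the word_cluster feature factory.
                kept = [
                    factory for factory in value
                    if not (isinstance(factory, dict)
                            and factory.get("factory_name") == "word_cluster")
                ]
                for factory in kept:
                    _patch(factory)
                node[key] = kept
            else:
                _walk(value)
    elif isinstance(node, list):
        for item in node:
            _walk(item)


def _patch(node):
    if isinstance(node, dict):
        for key, value in list(node.items()):
            if key == "use_stemming":
                # 2. Set use_stemming to False everywhere.
                node[key] = False
            elif value == "top_10000_words_stemmed":
                # 3. Replace top_10000_words_stemmed with None.
                node[key] = None
            else:
                _patch(value)
    elif isinstance(node, list):
        for item in node:
            _patch(item)
```

If the default English config is importable as a plain dict (for instance `CONFIG_EN` from `snips_nlu.default_configs`, to be verified for your installed version), the result of `adapt_config(CONFIG_EN)` could then be passed to the engine as its config.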

I can't guarantee that this will work well, but it is probably worth a try.
Note that builtin entities will only work for values that are valid in English.

I hope this helps.

@guftgift
Author

guftgift commented Jul 26, 2019 via email

@adrienball adrienball reopened this Jul 26, 2019