Do you have a plan to support the Thai language? #836
Comments
Hi @guftgift,
Any suggestions for using Snips with the Thai language?
On Thu, 25 Jul 2019 at 15:23, Adrien Ball wrote: Closed #836.
— Lalida Boonmana, Guru Services Co., Ltd.
Dear Adrien,
The Thai language doesn't have word separators, but it can be tokenized. There are some open-source projects for Thai tokenization, such as https://github.com/PyThaiNLP/pythainlp.
Thank you for the information; I will try to follow your instructions. If you have plans to support Thai, I would appreciate it very much, and I will try to contribute as much as I can.
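As an illustration of the pre-tokenization step discussed here, a toy longest-match segmenter in pure Python. This is only a sketch: real tokenizers such as PyThaiNLP use far richer lexicons and algorithms, and the lexicon below is purely illustrative.

```python
# Toy dictionary-based Thai word segmentation (greedy longest match).
# The lexicon entries here are illustrative, not a real dictionary.
LEXICON = {"สวัสดี", "ครับ", "ผม", "ชอบ", "กาแฟ"}

def longest_match_tokenize(text, lexicon=LEXICON):
    """Greedy left-to-right segmentation: at each position, take the
    longest lexicon entry that matches; fall back to one character."""
    tokens, i = [], 0
    max_len = max((len(w) for w in lexicon), default=1)
    while i < len(text):
        match = None
        # Try the longest candidate substring first.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        if match is None:
            match = text[i]  # unknown character: emit it alone
        tokens.append(match)
        i += len(match)
    return tokens

def to_whitespace_separated(text):
    """Re-join tokens with spaces so the text looks whitespace-separated
    to a pipeline that expects it (e.g. snips-nlu)."""
    return " ".join(longest_match_tokenize(text))
```

The same function would need to be applied both to the training utterances and to the inference inputs, so that the engine always sees whitespace-separated text.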
On 25 Jul 2019, at 20:30, Adrien Ball wrote:
@guftgift
It seems that the Thai language has no separator between words (correct me if I'm wrong), and snips-nlu is not meant to be used directly on inputs that are not whitespace-separated. This means that in order to work with Thai, you will first have to find a way to tokenize your data before using it with snips-nlu. This is true both for training and inference.
Apparently a romanization of the Thai language exists; that would be worth investigating (Google Translate gives a transliterated form with whitespace).
Once you've managed to handle this, you can try to set the language of your dataset to `en` (English) and use the English default config to train your NLU engine, after making the following changes to the `feature_factory_configs` attribute of the config:
- remove the `word_cluster` feature factory
- set `use_stemming` to `False` everywhere
- replace `top_10000_words_stemmed` by `None` everywhere
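A minimal sketch of applying those three changes programmatically. The nested-dict shape assumed here is a trimmed stand-in for snips-nlu's English default config, not the real object, so treat the key names and structure as assumptions.

```python
# Sketch: walk a nested config dict and apply the three changes above.
# The config structure is an assumption modeled on snips-nlu's
# default English config, not the library's actual object.
def adapt_config_for_pretokenized_text(config):
    """Drop word_cluster factories, disable stemming, and clear the
    stemmed-words gazetteer, recursively, in place."""
    if isinstance(config, dict):
        if "feature_factory_configs" in config:
            config["feature_factory_configs"] = [
                f for f in config["feature_factory_configs"]
                if f.get("factory_name") != "word_cluster"
            ]
        args = config.get("args", {})
        if "use_stemming" in args:
            args["use_stemming"] = False
        if args.get("common_words_gazetteer_name") == "top_10000_words_stemmed":
            args["common_words_gazetteer_name"] = None
        for value in config.values():
            adapt_config_for_pretokenized_text(value)
    elif isinstance(config, list):
        for item in config:
            adapt_config_for_pretokenized_text(item)
    return config
```

Walking the whole dict recursively covers the "everywhere" in the instructions, since the feature factories appear once per slot filler config.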
I can't guarantee that this will work well, but that is probably worth a try.
Note that builtin entities will only work for values valid in English.
I hope this helps.
Dear Sir,
Do you have Thai language support on your roadmap?