Do you have a plan to support the Thai language? #836
Comments
Hi @guftgift,
Any suggestions for using Snips with the Thai language?
On Thu, 25 Jul 2019 at 15:23, Adrien Ball wrote: Closed #836.
— Lalida Boonmana, Guru Services Co., Ltd.
Dear Adrien,
The Thai language doesn't have word separators, but it can be tokenized. There are some open-source projects for Thai tokenization, such as https://github.com/PyThaiNLP/pythainlp.
Thank you for the information; I will try to follow your instructions. If you have plans to support Thai, I would appreciate it very much, and I will try to contribute as much as I can.
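As an illustration of the pre-tokenization step discussed here, a toy longest-match segmenter in pure Python. This is only a sketch: real tokenizers such as PyThaiNLP use far richer lexicons and algorithms, and the lexicon below is purely illustrative.

```python
# Toy dictionary-based Thai word segmentation (greedy longest match).
# The lexicon entries here are illustrative, not a real dictionary.
LEXICON = {"สวัสดี", "ครับ", "ผม", "ชอบ", "กาแฟ"}

def longest_match_tokenize(text, lexicon=LEXICON):
    """Greedy left-to-right segmentation: at each position, take the
    longest lexicon entry that matches; fall back to one character."""
    tokens, i = [], 0
    max_len = max((len(w) for w in lexicon), default=1)
    while i < len(text):
        match = None
        # Try the longest candidate substring first.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        if match is None:
            match = text[i]  # unknown character: emit it alone
        tokens.append(match)
        i += len(match)
    return tokens

def to_whitespace_separated(text):
    """Re-join tokens with spaces so the text looks whitespace-separated
    to a pipeline that expects it (e.g. snips-nlu)."""
    return " ".join(longest_match_tokenize(text))
```

The same function would need to be applied both to the training utterances and to the inference inputs, so that the engine always sees whitespace-separated text.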
On 25 Jul 2019, at 20:30, Adrien Ball wrote:
@guftgift
It seems that the Thai language has no separator between words (correct me if I'm wrong), and snips-nlu is not meant to be used directly on inputs that are not whitespace-separated. This means that in order to work with Thai, you will first have to find a way to tokenize your data before using it with snips-nlu. This is true both for training and inference.
Apparently a romanization of the Thai language exists; that would be worth investigating (Google Translate gives a transliterated form with whitespace).
Once you've managed to handle this, you can try to set the language of your dataset to `en` (English) and use the English default config to train your NLU engine, after making the following changes to the `feature_factory_configs` attribute of the config:
- remove the `word_cluster` feature factory
- set `use_stemming` to `False` everywhere
- replace `top_10000_words_stemmed` by `None` everywhere
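A minimal sketch of applying those three changes programmatically. The nested-dict shape assumed here is a trimmed stand-in for snips-nlu's English default config, not the real object, so treat the key names and structure as assumptions.

```python
# Sketch: walk a nested config dict and apply the three changes above.
# The config structure is an assumption modeled on snips-nlu's
# default English config, not the library's actual object.
def adapt_config_for_pretokenized_text(config):
    """Drop word_cluster factories, disable stemming, and clear the
    stemmed-words gazetteer, recursively, in place."""
    if isinstance(config, dict):
        if "feature_factory_configs" in config:
            config["feature_factory_configs"] = [
                f for f in config["feature_factory_configs"]
                if f.get("factory_name") != "word_cluster"
            ]
        args = config.get("args", {})
        if "use_stemming" in args:
            args["use_stemming"] = False
        if args.get("common_words_gazetteer_name") == "top_10000_words_stemmed":
            args["common_words_gazetteer_name"] = None
        for value in config.values():
            adapt_config_for_pretokenized_text(value)
    elif isinstance(config, list):
        for item in config:
            adapt_config_for_pretokenized_text(item)
    return config
```

Walking the whole dict recursively covers the "everywhere" in the instructions, since the feature factories appear once per slot filler config.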
I can't guarantee that this will work well, but that is probably worth a try.
Note that builtin entities will only work for values valid in English.
I hope this helps.
Dear Sir,
Do you have Thai language support on your roadmap?