How does the model handle the OOV problem? #11

VieZhong · 2018-11-13T06:21:40Z

OOV means an out-of-vocabulary word.

I can't find any code that handles this problem. Maybe I missed some important steps?

Looking forward to your advice or answers.

akanimax · 2019-03-08T11:12:24Z

I am also looking for the same. Were you able to find a solution for it?
I'll explain my problem a bit formally:

Let's say I have a vocabulary V = ["hello", "I", "am", "akanimax"], my source sentence is <"akanimax", "is", "a", "good", "boy">, and my target sentence is <"akanimax", "not", "a", "good", "boy">.
Then, when decoding the "not" in the target, I have the following two questions:

1.) When the input to the Encoder is "a", "is", "good" or "boy", what is actually sent to the Encoder RNN? Is it one shared embedding representing a <copy> token, or does each word get a different randomly initialized embedding?

2.) When "not" needs to be output, we have no option other than emitting UNK, because it is neither in chi (the words of the source sentence) nor in V. Is this correct?

I would be highly grateful if you could help.

Best regards,
@akanimax

VieZhong · 2019-03-08T11:36:01Z

Hi, @akanimax
I can't solve the OOV problem either.
My answers to your two questions would be:

1.) The words that the model doesn't recognize are all mapped to the same embedding token.
2.) Yes, that is correct.

I hope this helps. My English is not very good, please forgive me, haha.
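A minimal sketch of what the first answer would mean in practice, using the example vocabulary from this thread. The `<unk>` entry and the `encode` helper are assumptions made for illustration; they are not taken from this repository's code.

```python
# Example vocabulary from the question, plus an assumed "<unk>" entry at index 0.
vocab = ["<unk>", "hello", "I", "am", "akanimax"]
word_to_id = {word: idx for idx, word in enumerate(vocab)}
UNK_ID = word_to_id["<unk>"]


def encode(tokens):
    """Map tokens to ids; every unknown word falls back to the single UNK id."""
    return [word_to_id.get(tok, UNK_ID) for tok in tokens]


source = ["akanimax", "is", "a", "good", "boy"]
source_ids = encode(source)
print(source_ids)  # [4, 0, 0, 0, 0]
```

Under this assumption, "is", "a", "good" and "boy" all look up the same embedding row, so the encoder cannot tell them apart by embedding alone.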

nlp4whp · 2019-06-06T07:15:12Z

Hi, @akanimax, @VieZhong

I think the OOV problem can be solved by CopyNet here.
The vocabulary used for generation (gen_vocab_size) can be small,
while a second, larger vocabulary that includes the "OOV" words can be used for copying.

In a real situation, though, we are probably unable to collect all tokens
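One common way to realise the two-vocabulary idea is to extend the generation vocabulary per example with the OOV words found in the source sentence. The sketch below assumes that scheme; the helper names `build_copy_ids` and `target_ids` are hypothetical, and this is not necessarily how this repository does it. Only `gen_vocab_size` comes from the discussion above.

```python
# Small generation vocabulary (assumed "<unk>" at index 0).
gen_vocab = ["<unk>", "hello", "I", "am", "akanimax"]
gen_vocab_size = len(gen_vocab)
word_to_id = {word: idx for idx, word in enumerate(gen_vocab)}
UNK_ID = word_to_id["<unk>"]


def build_copy_ids(source_tokens):
    """Give each source OOV word a temporary id >= gen_vocab_size."""
    oov_words, ids = [], []
    for tok in source_tokens:
        if tok in word_to_id:
            ids.append(word_to_id[tok])
        else:
            if tok not in oov_words:
                oov_words.append(tok)
            ids.append(gen_vocab_size + oov_words.index(tok))
    return ids, oov_words


def target_ids(target_tokens, oov_words):
    """Target words are generatable, copyable from the source, or UNK."""
    ids = []
    for tok in target_tokens:
        if tok in word_to_id:
            ids.append(word_to_id[tok])
        elif tok in oov_words:
            ids.append(gen_vocab_size + oov_words.index(tok))
        else:
            ids.append(UNK_ID)
    return ids


source = ["akanimax", "is", "a", "good", "boy"]
target = ["akanimax", "not", "a", "good", "boy"]
src_ids, oov_words = build_copy_ids(source)
tgt_ids = target_ids(target, oov_words)
print(src_ids)  # [4, 5, 6, 7, 8]
print(tgt_ids)  # [4, 0, 6, 7, 8]
```

In this sketch "a", "good" and "boy" can be copied even though they are outside the generation vocabulary, while "not" still becomes UNK because it appears in neither the source sentence nor the vocabulary.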
