How does the model handle the OOV problem? #11
Comments
I am also looking for the same. Were you able to find a solution for it?
Let's say I have a vocabulary of ["hello", "I", "am", "akanimax"], my source statement is <"akanimax", "is", "a", "good", "boy">, and my target statement is <"akanimax", "not", "a", "good", "boy">.
1.) When the input to the Encoder is "a", "is", "good", or "boy", what is actually sent to the Encoder RNN? Is it the same embedding representing the <copy> token, or are they different randomly initialized embeddings?
2.) When "not" needs to be output, we have no other option than calling it UNK because it is not in ...
I would be highly grateful if you could help.
Best regards,
Hi, @akanimax
I hope I can help you. My English is not very good, so please bear with me.
I think the OOV problem can be solved by CopyNet here, although in a real situation we are probably unable to collect all tokens.
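To make the example above concrete, here is a minimal sketch of the "extended vocabulary" bookkeeping that copy-mechanism implementations commonly use. It is an assumption about the general technique, not a description of this repository's code: the function name `encode_with_copy`, the `<unk>` token, and the id layout are all hypothetical. Source words outside the vocabulary share the UNK embedding on the encoder side, but each gets a temporary per-example id so the decoder can still be trained to copy it.

```python
UNK = "<unk>"  # hypothetical name for the unknown-word token


def encode_with_copy(source, target, vocab):
    """Map tokens to ids using a per-example extended vocabulary (illustrative only)."""
    word2id = {w: i for i, w in enumerate(vocab)}
    unk_id = word2id[UNK]
    oov_words = []  # source OOV words, in order of first appearance

    encoder_ids = []   # ids for the embedding lookup (every OOV word -> UNK)
    extended_ids = []  # ids for the copy distribution (each OOV word -> temporary id)
    for tok in source:
        if tok in word2id:
            encoder_ids.append(word2id[tok])
            extended_ids.append(word2id[tok])
        else:
            if tok not in oov_words:
                oov_words.append(tok)
            encoder_ids.append(unk_id)
            extended_ids.append(len(vocab) + oov_words.index(tok))

    target_ids = []
    for tok in target:
        if tok in word2id:
            target_ids.append(word2id[tok])
        elif tok in oov_words:
            # Not in the vocabulary, but present in the source: it can be copied.
            target_ids.append(len(vocab) + oov_words.index(tok))
        else:
            # Neither in the vocabulary nor in the source: only UNK is left.
            target_ids.append(unk_id)
    return encoder_ids, extended_ids, target_ids, oov_words


vocab = [UNK, "hello", "I", "am", "akanimax"]
source = ["akanimax", "is", "a", "good", "boy"]
target = ["akanimax", "not", "a", "good", "boy"]
encoder_ids, extended_ids, target_ids, oov_words = encode_with_copy(source, target, vocab)
print(encoder_ids)   # [4, 0, 0, 0, 0]
print(extended_ids)  # [4, 5, 6, 7, 8]
print(target_ids)    # [4, 0, 6, 7, 8]
```

Under these assumptions, "is", "a", "good", and "boy" all reach the encoder as the same UNK embedding (question 1), and it is the encoder hidden state at each position that distinguishes them. "not" appears in neither the vocabulary nor the source, so it falls back to UNK (question 2).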
OOV means an out-of-vocabulary word.
I can't find any code that handles this problem. Maybe I missed some important steps?
Looking forward to your advice or answers.