This repository has been archived by the owner on Dec 16, 2022. It is now read-only.
The default padding and OOV tokens are hardcoded in the `Vocabulary` class (`allennlp/allennlp/data/vocabulary.py`, lines 226–227 at `9a6962f`), so using any pre-trained token embedder whose special tokens are defined differently (e.g., the OOV token is `[UNK]` instead of `@@UNKNOWN@@`) breaks.

I would add `padding_token` and `oov_token` as parameters of the `Vocabulary` class initialiser, but I am not sure whether this will introduce inconsistencies.
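A minimal sketch of what the proposed change might look like. Note that `Vocabulary` here is a stripped-down stand-in for illustration, not AllenNLP's actual class; the method names and the pad-at-0 / OOV-at-1 index layout are assumptions, not the library's API.

```python
from typing import Dict, List

# Hardcoded defaults, as in the current vocabulary.py.
DEFAULT_PADDING_TOKEN = "@@PADDING@@"
DEFAULT_OOV_TOKEN = "@@UNKNOWN@@"


class Vocabulary:
    """Sketch: special tokens become constructor parameters
    instead of module-level hardcoded defaults."""

    def __init__(
        self,
        padding_token: str = DEFAULT_PADDING_TOKEN,
        oov_token: str = DEFAULT_OOV_TOKEN,
    ) -> None:
        self._padding_token = padding_token
        self._oov_token = oov_token
        # Index 0 is padding, index 1 is OOV (an assumed convention here).
        self._token_to_index: Dict[str, int] = {padding_token: 0, oov_token: 1}
        self._index_to_token: List[str] = [padding_token, oov_token]

    def add_token(self, token: str) -> int:
        if token not in self._token_to_index:
            self._token_to_index[token] = len(self._index_to_token)
            self._index_to_token.append(token)
        return self._token_to_index[token]

    def get_token_index(self, token: str) -> int:
        # Unseen tokens fall back to the (now configurable) OOV token.
        return self._token_to_index.get(token, self._token_to_index[self._oov_token])


# A vocabulary matching BERT-style special tokens:
vocab = Vocabulary(padding_token="[PAD]", oov_token="[UNK]")
vocab.add_token("hello")
```

With the tokens exposed as parameters, a pre-trained embedder that uses `[UNK]` can share a vocabulary with the model instead of colliding with the hardcoded `@@UNKNOWN@@` default.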
I think this would be ok. It might also help with #3097. PR welcome! Just be sure that it doesn't break any tests, and that you add sufficient tests for the new functionality.