We only add tokens from the pretrained vocab of the transformers lib if the tokenizer has a vocab key inside (allennlp/allennlp/data/token_indexers/pretrained_transformer_indexer.py, line 62 in 88fe007). This is not the case for all models in the lib, and I did not find a unified get_vocab method in the huggingface code.

Anyway, the tokens will still be indexed correctly; we just will not have them in our Vocabulary object, with all the consequences that follow.

In #3453 I will add support for RoBERTa and XLM, which have an encoder key instead of a vocab key, but I do not know about other models.
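To make the attribute mismatch concrete, here is a minimal sketch of the kind of fallback described above. It assumes the slow BertTokenizer and RobertaTokenizer classes from the transformers lib; tokens_from_pretrained is a hypothetical helper for illustration, not the actual AllenNLP indexer code:

```python
# Sketch only: BERT-style tokenizers expose `vocab`, RoBERTa/XLM-style
# tokenizers expose `encoder` (per this issue); other models may differ.
from transformers import BertTokenizer, RobertaTokenizer

def tokens_from_pretrained(tokenizer):
    """Return the pretrained token -> id mapping, trying the attribute
    names mentioned in this issue: `vocab` (e.g. BERT) and `encoder`
    (e.g. RoBERTa / XLM)."""
    if hasattr(tokenizer, "vocab"):
        return tokenizer.vocab
    if hasattr(tokenizer, "encoder"):
        return tokenizer.encoder
    # No known vocabulary attribute: indexing still works, but our
    # Vocabulary object would stay empty for this tokenizer.
    return {}

bert = BertTokenizer.from_pretrained("bert-base-uncased")
roberta = RobertaTokenizer.from_pretrained("roberta-base")
print(len(tokens_from_pretrained(bert)))     # populated from tokenizer.vocab
print(len(tokens_from_pretrained(roberta)))  # populated from tokenizer.encoder
```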
Yes, I think they are separate. This one is about correctly getting the BERT vocab into our Vocabulary object, while #3097 is about saving the vocab to disk.
MaksymDel changed the title from "Pretrained vocabulary from transformers is sometimes not saved" to "Pretrained vocabulary from transformers is sometimes not saved in our Vocabulary object" on Nov 17, 2019.
The newest version of transformers now includes a get_vocab method on tokenizers, which can be used to retrieve the full vocabulary from a tokenizer. Using it should fix this issue.
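For example, a minimal sketch assuming a transformers version that ships get_vocab on its tokenizers:

```python
# Sketch of the unified get_vocab accessor mentioned above: it returns the
# full token -> id mapping regardless of whether the tokenizer stores its
# vocabulary in a `vocab` or an `encoder` attribute internally.
from transformers import AutoTokenizer

for name in ("bert-base-uncased", "roberta-base"):
    tokenizer = AutoTokenizer.from_pretrained(name)
    vocab = tokenizer.get_vocab()  # works for every model family
    print(name, len(vocab))        # these tokens could then seed our Vocabulary
```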