This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

how to set vocabulary (from another model) from a config file allennlp #4374

Closed
semerekiros opened this issue Jun 18, 2020 · 5 comments

Comments

@semerekiros

semerekiros commented Jun 18, 2020

Hi, I am new to allennlp. Instead of building my Vocabulary from instances, which is the default behavior, I would like to build it from an external file that contains a list of words. What do I need to change in the config file?

@semerekiros semerekiros changed the title how to set vocabulary (from another model) from a config file how to set vocabulary (from another model) from a config file allennlp Jun 18, 2020
@elkotito

I don't know whether it's possible to build the vocabulary directly from an external file, but if the words are unique, you can read the external vocabulary using Vocabulary.from_files. If you use config files, you need to add something like:

  vocabulary: {
    type: "from_files",
    directory: "/some/dir/",
    oov_token: "[UNK]",  # whatever your UNK token is
    # Be careful here: if your vocabulary already contains a PAD token, you don't need
    # padding_token (setting it will add an additional token to your vocabulary).
    padding_token: "[PAD]"
  },

@semerekiros
Author

semerekiros commented Jun 18, 2020

Thank you for your reply @mateuszpieniak. I actually have a list of word-frequency pairs in a text file. Isn't the from_files method used to load a Vocabulary that was serialized using save_to_files?


@elkotito

elkotito commented Jun 18, 2020

@semerekiros Yes, it is, but I guess you could find a workaround:

  1. Make sure each line of your vocab.txt contains exactly one token.
  2. You will also need a non_padded_namespaces.txt file in the same directory as vocab.txt. If your vocabulary already contains a padding token, you have to put the value vocab in this file.

Let me know whether it works.

It's a bit dirty. As far as I remember, this discussion relates to #3097, where you can see that some work is ongoing.
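
The workaround above could be sketched roughly as follows. This is only a minimal sketch using the Python standard library, not something taken from allennlp itself. It assumes the input file has one whitespace-separated "word frequency" pair per line, and the function name write_vocab_dir, the default namespace "tokens", and the default OOV token "@@UNKNOWN@@" are all assumptions for illustration; adjust them to your setup (for example, use "vocab" as the namespace if your file is vocab.txt).

```python
from pathlib import Path


def write_vocab_dir(freq_file, out_dir, namespace="tokens",
                    oov_token="@@UNKNOWN@@",
                    non_padded_namespaces=("*tags", "*labels")):
    """Turn a "word freq" text file into a vocabulary directory.

    Hypothetical helper: writes <namespace>.txt with one token per line
    and a non_padded_namespaces.txt next to it.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    tokens, seen = [], set()
    with open(freq_file, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            token = parts[0]  # keep the word, drop the frequency
            if token not in seen:  # tokens must be unique
                seen.add(token)
                tokens.append(token)

    # Assumption: a padded namespace needs the OOV token listed in the file.
    if oov_token not in seen:
        tokens.insert(0, oov_token)

    (out / f"{namespace}.txt").write_text(
        "\n".join(tokens) + "\n", encoding="utf-8")
    (out / "non_padded_namespaces.txt").write_text(
        "\n".join(non_padded_namespaces) + "\n", encoding="utf-8")
    return tokens
```

You would then point the vocabulary section of the config at out_dir. If your word list already contains its own padding token, pass the namespace name in non_padded_namespaces instead, as described in step 2.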

@semerekiros
Author

semerekiros commented Jun 18, 2020

@mateuszpieniak The following adjustment in the config file did the job for me:

  "vocabulary": {
    "directory_path": "path/to/directory",
  }

Setting "type" to "from_files" or "from_instances" raised an error, but the snippet above seems to work. Thank you for your help.

@johntiger1

Is it possible to do this from a single file?
