GitHub - jeromemassot/nlp: Simple experiments with word embeddings

What is this repo?

This code lets you experiment with pre-trained word embeddings using plain Python 3 with no additional dependencies.

See the blog post "Playing with word vectors" for a detailed explanation.

Usage

Once you clone this repo, you can simply run:

$ python3 main.py

Try to edit the code to explore the word embeddings.
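As an illustration of the kind of exploration this means, here is a minimal sketch in plain Python with no dependencies. It is not the code from this repo: the function names (`parse_vectors`, `cosine_similarity`, `nearest`) and the toy three-dimensional vectors are assumptions made for the example. It assumes each line holds a word followed by space-separated floats, with no header line.

```python
import math
from typing import Dict, Iterable, List


def parse_vectors(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse lines of the form 'word v1 v2 ... vn' into a dict.

    Assumes the header line (if any) has already been removed.
    """
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(word: str, vectors: Dict[str, List[float]], k: int = 3) -> List[str]:
    """Return the k words most similar to `word`, most similar first."""
    target = vectors[word]
    scored = [
        (cosine_similarity(target, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


# Toy data invented for this example; real vectors have hundreds of dimensions.
sample_lines = [
    "king 0.9 0.8 0.1",
    "queen 0.85 0.9 0.15",
    "apple 0.1 0.2 0.95",
]
toy_vectors = parse_vectors(sample_lines)
print(nearest("king", toy_vectors, k=2))
```

To try it on a real file, pass an open file object to `parse_vectors` instead of the toy list.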

This repo only includes a small data file with 1,000 words. To get interesting results, you'll need to download the pre-trained word vectors from the fastText website.

But don't use the whole 2 GB file! The program would use too much memory. Instead, once you have downloaded the file, take only the top n words, drop the first line (a header giving the word count and vector dimension), and save the rest to a separate file. For example, to keep the top 50,000 words:

$ head -n 50001 data/wiki-news-300d-1M.vec | tail -n 50000 > data/vectors.vec
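If `head` and `tail` are not available (for example on Windows), the same trimming can be sketched in plain Python. The helper name `keep_top_words` is hypothetical and not part of this repo. It assumes the input is a UTF-8 text file whose first line is a header to discard.

```python
from itertools import islice


def keep_top_words(src_path: str, dst_path: str, n: int) -> int:
    """Copy the first n word lines of a .vec file, skipping its header line.

    Returns the number of lines written, which is less than n if the
    source file is shorter.
    """
    written = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        next(src, None)  # discard the header line
        for line in islice(src, n):  # stream lines; never load the whole file
            dst.write(line)
            written += 1
    return written


# Example call mirroring the shell command above:
# keep_top_words("data/wiki-news-300d-1M.vec", "data/vectors.vec", 50000)
```

Because the file is read line by line, memory use stays small even for the full download.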
