
Dataset for training the system #1

Open
quandb opened this issue Nov 13, 2017 · 0 comments
quandb commented Nov 13, 2017

Hi,
I'm trying to update the word2vec and doc2vec models with a newer Wikipedia dump.
I see there are several kinds of wiki dumps, such as pages-meta-history*, pages-meta-current*, and pages-articles*.
Which kind did you use to build the models?
When I run word2vec_phrases.py, my output file is only ~12 MB (compared with your 62 MB). (A minimal sketch of such a pipeline is included below for reference.)

Thanks,
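
For reference, here is a minimal sketch of how a phrase-aware word2vec pipeline over a Wikipedia dump is commonly built with gensim. This is an assumption about what word2vec_phrases.py does, not the repo's actual script; the dump file name and all parameters are illustrative.

```python
# Minimal sketch (gensim >= 4.0): train phrase-aware word2vec vectors
# from a Wikipedia "pages-articles" dump. Not the repo's actual script;
# file names and hyperparameters are assumptions for illustration.
from gensim.corpora.wikicorpus import WikiCorpus
from gensim.models.phrases import Phrases, Phraser
from gensim.models import Word2Vec

dump = "enwiki-latest-pages-articles.xml.bz2"  # assumed dump file name

# Stream plain-text, tokenized articles out of the compressed dump.
# Passing dictionary={} skips the (slow) dictionary-building pass.
wiki = WikiCorpus(dump, dictionary={})
sentences = list(wiki.get_texts())  # for a full dump, stream to disk instead

# Detect frequent bigrams (e.g. "new_york") before training, which is
# presumably what the "phrases" in word2vec_phrases.py refers to.
bigram = Phraser(Phrases(sentences, min_count=5, threshold=10.0))
sentences = [bigram[s] for s in sentences]

# Train and save the model.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)
model.save("word2vec_wiki.model")
```

Which dump variant feeds this pipeline matters for output size: pages-articles* contains only the current text of articles, while pages-meta-current* adds non-article namespaces and pages-meta-history* includes every revision, so the resulting vocabulary and model size differ substantially.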
