
Dataset for training the system #1

Open
quandb opened this issue Nov 13, 2017 · 0 comments
quandb commented Nov 13, 2017

Hi,
I'm trying to update the word2vec and doc2vec models with a newer Wikipedia dump.
I see there are several kinds of wiki dumps, such as pages-meta-history*, pages-meta-current*, and pages-articles*.
Which kind did you use to build the models?
When I run word2vec_phrases.py, my output file is only ~12 MB (compared with your 62 MB). (A minimal sketch of such a pipeline is included below for reference.)

Thanks,
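
For reference, here is a minimal sketch of how a phrase-aware word2vec pipeline over a Wikipedia dump is commonly built with gensim. This is an assumption about what word2vec_phrases.py does, not the repo's actual script; the dump file name and all parameters are illustrative.

```python
# Minimal sketch (gensim >= 4.0): train phrase-aware word2vec vectors
# from a Wikipedia "pages-articles" dump. Not the repo's actual script;
# file names and hyperparameters are assumptions for illustration.
from gensim.corpora.wikicorpus import WikiCorpus
from gensim.models.phrases import Phrases, Phraser
from gensim.models import Word2Vec

dump = "enwiki-latest-pages-articles.xml.bz2"  # assumed dump file name

# Stream plain-text, tokenized articles out of the compressed dump.
# Passing dictionary={} skips the (slow) dictionary-building pass.
wiki = WikiCorpus(dump, dictionary={})
sentences = list(wiki.get_texts())  # for a full dump, stream to disk instead

# Detect frequent bigrams (e.g. "new_york") before training, which is
# presumably what the "phrases" in word2vec_phrases.py refers to.
bigram = Phraser(Phrases(sentences, min_count=5, threshold=10.0))
sentences = [bigram[s] for s in sentences]

# Train and save the model.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)
model.save("word2vec_wiki.model")
```

Which dump variant feeds this pipeline matters for output size: pages-articles* contains only the current text of articles, while pages-meta-current* adds non-article namespaces and pages-meta-history* includes every revision, so the resulting vocabulary and model size differ substantially.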
