cd into the WikiClassify2.0 base directory and run python main.py to start the automated workflow, which performs the following steps in order:
- Download latest Wikipedia data dump bz2 archive
- Extract archive into .xml format
- Compile C++ parser files
- Parse .xml data, sending bursts to the remote server in 1000-article increments
- Train word2vec and LDA models
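The burst upload in the parsing step can be sketched as a simple batching generator. This is an illustrative helper, not the project's actual code; the function name and article representation are assumptions:

```python
def batch_articles(articles, burst_size=1000):
    """Group parsed articles into fixed-size bursts for upload.

    Yields lists of at most `burst_size` articles, so each list can be
    sent to the remote server as one burst.
    """
    burst = []
    for article in articles:
        burst.append(article)
        if len(burst) == burst_size:
            yield burst
            burst = []
    if burst:  # send any remaining partial burst
        yield burst
```

For example, 2500 parsed articles would be sent as two full bursts of 1000 followed by one partial burst of 500.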
Once a model is present in the working directory, a subsequent call to python main.py opens the interface created to interact with the models (including the A* path search and A* convene functions). Calling python main.py with the -c launch parameter cleans the working directory of models and downloaded data.
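The "train first, then open the interface" behavior amounts to checking the working directory for a trained model at startup. A minimal sketch, assuming models are stored as .model files (the extension and function name are illustrative, not taken from the project):

```python
import os

def choose_mode(working_dir):
    """Decide startup behavior: train if no model exists yet,
    otherwise open the model-interaction interface."""
    model_present = any(name.endswith(".model")
                        for name in os.listdir(working_dir))
    return "interface" if model_present else "train"
```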
Run python main.py with the -g launch parameter to open the user interface main menu, where parser launch parameters can be configured.
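Taken together, the -c and -g launch parameters could be handled with standard argument parsing. A hedged sketch using argparse; the flag behavior follows the description above, but the function and attribute names are illustrative:

```python
import argparse

def build_arg_parser():
    """Illustrative parser for the -c / -g launch parameters."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-c", action="store_true",
                        help="clean the working directory of models "
                             "and downloaded data")
    parser.add_argument("-g", action="store_true",
                        help="open the user interface main menu")
    return parser
```

With no flags, the script would fall through to the default workflow (train or open the model interface, depending on whether a model is present).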
Dependencies:
- Python 2.7
- numpy (pip install numpy)
- g++
- gensim (pip install gensim)
- sklearn (pip install sklearn)
- PyQt4 (apt-get install python-qt4)
- psycopg2 (pip install psycopg2)
Both of the following packages can be installed via the command line using a package manager such as apt-get on Ubuntu:
- libpq-dev (apt-get install libpq-dev)
- libpqxx-dev (apt-get install libpqxx-dev)
