Course project for Information Retrieval.
- Python: 3.7
- Flask: 1.1.1
- elasticsearch (Python client): 7.1.0
- elasticsearch-full (Elasticsearch server): 7.5
- Start the Elasticsearch server:
      /usr/local/bin/elasticsearch
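Before building the index, it can help to confirm that the server is actually reachable. The sketch below polls the root endpoint using only the standard library. It assumes the default HTTP port 9200, and the function name `wait_for_elasticsearch` is hypothetical, not part of this project.

```python
import json
import time
import urllib.request
from typing import Optional


def wait_for_elasticsearch(url: str = "http://localhost:9200",
                           timeout: float = 30.0,
                           interval: float = 1.0) -> Optional[dict]:
    """Poll the Elasticsearch root endpoint until it answers or `timeout` expires.

    Returns the parsed JSON of the root response (cluster name, version, ...)
    on success, or None if the server never became reachable.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except (OSError, ValueError):
            # OSError covers URLError (e.g. connection refused) and socket timeouts;
            # ValueError covers a reply that is not valid JSON.
            if time.monotonic() >= deadline:
                return None
            time.sleep(interval)
```

For example, `info = wait_for_elasticsearch()` returns the server's root JSON (which includes `info["version"]["number"]`), or `None` if nothing answered within the timeout.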
- Edit the configuration in `config.py`:
  - `maps`: index settings
  - `files_to_handle`: source files of the corpus, which must first be preprocessed (word-segmented) with THULAC
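The project's actual `config.py` is not shown here. This minimal sketch shows what the two settings might look like. It assumes an Elasticsearch 7 index body for `maps` and a list of paths for `files_to_handle`. The field name `content`, the `INDEX_NAME` constant and the file paths are all hypothetical.

```python
# config.py (hypothetical sketch): the field name "content", the analyzer
# choice, INDEX_NAME and the file paths are assumptions, not the project's values.

# Index name that build_index.py and app.py could both import (hypothetical).
INDEX_NAME = "ir_homework"

# Index settings and mappings passed to Elasticsearch when creating the index.
# The corpus is assumed to be segmented by THULAC into space-separated words,
# so the built-in "whitespace" analyzer keeps those word boundaries intact.
maps = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0,
    },
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "whitespace",
            },
        },
    },
}

# Corpus files to index: one THULAC-segmented document per line (assumed format).
files_to_handle = [
    "data/corpus_seg_1.txt",
    "data/corpus_seg_2.txt",
]
```

The `whitespace` analyzer is chosen on the assumption that THULAC has already inserted spaces between words. The default `standard` analyzer would instead split Chinese text into single characters.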
- Build the index:
      python3 build_index.py
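How `build_index.py` loads the corpus is not documented. One plausible sketch uses the official client's `helpers.bulk`. It assumes one pre-segmented document per line and a hypothetical `content` field. The function names are illustrative only.

```python
from typing import Any, Dict, Iterable, Iterator


def read_documents(paths: Iterable[str]) -> Iterator[str]:
    """Yield each non-empty line of the corpus files as one document."""
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                text = line.strip()
                if text:  # skip blank lines
                    yield text


def iter_actions(docs: Iterable[str], index_name: str) -> Iterator[Dict[str, Any]]:
    """Turn documents into bulk-index actions with sequential integer ids."""
    for doc_id, text in enumerate(docs):
        yield {"_index": index_name, "_id": doc_id, "_source": {"content": text}}


def build_index(index_name: str, maps: Dict[str, Any], paths: Iterable[str]) -> int:
    """(Re)create `index_name` with `maps` and bulk-load the corpus.

    Requires the `elasticsearch` Python client (7.x) and a running server.
    Returns the number of documents successfully indexed.
    """
    from elasticsearch import Elasticsearch, helpers  # imported lazily

    es = Elasticsearch()  # defaults to localhost:9200
    if es.indices.exists(index=index_name):
        es.indices.delete(index=index_name)
    es.indices.create(index=index_name, body=maps)
    success, _errors = helpers.bulk(es, iter_actions(read_documents(paths), index_name))
    return success
```

With the settings from `config.py`, a call might look like `build_index(INDEX_NAME, maps, files_to_handle)`. Deleting and recreating the index makes the script safe to rerun, at the cost of discarding any previous index with that name.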
- Run the application
  After building the index, start the application. WARNING: `app.py` and `build_index.py` must use the same index name.
      python3 app.py
  Then open http://localhost:5000 in your browser. You should see the following page.
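On the application side, a search request boils down to an Elasticsearch query against the same index that `build_index.py` filled, which is why the index names must match. The following sketch covers only the query step and omits the Flask route that would call it. It assumes a hypothetical `content` field and a client object with the 7.x `search(index=..., body=...)` method.

```python
from typing import Any, Dict, List


def build_search_body(query: str, size: int = 10) -> Dict[str, Any]:
    """Full-text `match` query on the (assumed) `content` field, with highlighting."""
    return {
        "query": {"match": {"content": query}},
        "highlight": {"fields": {"content": {}}},
        "size": size,
    }


def search(es: Any, index_name: str, query: str, size: int = 10) -> List[Dict[str, Any]]:
    """Run the query and flatten each hit into a small dict for the template."""
    resp = es.search(index=index_name, body=build_search_body(query, size))
    results = []
    for hit in resp["hits"]["hits"]:
        results.append({
            "id": hit["_id"],
            "score": hit["_score"],
            "content": hit["_source"]["content"],
            # Highlight fragments are only present when the field matched.
            "highlights": hit.get("highlight", {}).get("content", []),
        })
    return results
```

Because the field uses a whitespace analyzer in this sketch, the user's query would need to be segmented the same way as the corpus (for example with THULAC) before calling `search`. Otherwise an unsegmented query such as `信息检索` would be treated as a single token.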
- (Optional) Build word vectors
  Build word vectors to improve the search results.
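The README does not say how the word vectors improve results. One common technique is query expansion: add the nearest neighbours of each query word by cosine similarity. It is shown here only as an illustration and is not necessarily what this project does. The `vectors` dictionary stands in for whatever `word2vec.py` saves, and its format is an assumption.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_query(tokens: List[str], vectors: Dict[str, Sequence[float]],
                 top_k: int = 2, min_sim: float = 0.5) -> List[str]:
    """Append up to `top_k` nearest neighbours of each query token.

    Neighbours below `min_sim` are dropped; out-of-vocabulary tokens are kept
    but contribute no expansions.
    """
    expanded = list(tokens)
    for tok in tokens:
        if tok not in vectors:
            continue
        scored = sorted(
            ((cosine(vectors[tok], vec), word)
             for word, vec in vectors.items()
             if word != tok and word not in expanded),
            reverse=True,
        )
        expanded.extend(word for sim, word in scored[:top_k] if sim >= min_sim)
    return expanded
```

The expanded terms could then be joined with spaces and sent through the same `match` query, optionally with a lower boost than the original words.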
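  - Use `word2vec.py` to train the word vectors:
        python3 word2vec.py --iter ITER --max_doc MAX_DOC --mode MODE --model MODEL --doc_print DOC_PRINT --min_cnt MIN_CNT --workers WORKERS --step_store STEP_STORE
    Options:
    - `--iter` (int): number of training iterations
    - `--max_doc` (int or None): number of sentences to use
    - `--mode` (str): one of `train`, `count`, `test`
    - `--model` (str): path where the word vectors are stored
    - `--doc_print` (int): print output every N sentences
    - `--min_cnt` (int): minimum word frequency when training the word vectors
    - `--workers` (int): number of threads to use
    - `--step_store` (int): save the word vectors every N steps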
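The options above map naturally onto `argparse`. This sketch assumes `word2vec.py` uses `argparse` at all. Every default value is a placeholder, since the README gives none. The literal string `None` is accepted for `--max_doc`, as documented.

```python
import argparse
from typing import List, Optional


def int_or_none(value: str) -> Optional[int]:
    """Parse an int, or the literal string "None" (use all sentences)."""
    return None if value == "None" else int(value)


def build_parser() -> argparse.ArgumentParser:
    # All defaults below are hypothetical placeholders.
    p = argparse.ArgumentParser(description="Train, count or test word vectors.")
    p.add_argument("--iter", type=int, default=5, help="number of training iterations")
    p.add_argument("--max_doc", type=int_or_none, default=None,
                   help="number of sentences to use, or None for all")
    p.add_argument("--mode", choices=["train", "count", "test"], default="train",
                   help="one of train/count/test")
    p.add_argument("--model", type=str, default="model/word2vec",
                   help="path where the word vectors are stored")
    p.add_argument("--doc_print", type=int, default=10000,
                   help="print output every N sentences")
    p.add_argument("--min_cnt", type=int, default=5,
                   help="minimum word frequency when training")
    p.add_argument("--workers", type=int, default=4, help="number of threads")
    p.add_argument("--step_store", type=int, default=100000,
                   help="save the word vectors every N steps")
    return p


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `argv` (or sys.argv[1:] when None)."""
    return build_parser().parse_args(argv)
```

For instance, `parse_args(["--mode", "test", "--max_doc", "None"])` yields `mode="test"` and `max_doc=None`, with every other option at its placeholder default.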
- Use