This project contains 3 parts:
- data pre-processing and dimensionality reduction
- loading the data into a SQLite database and retrieving it with a Flask server
- visualization
First, navigate to this folder:
cd .\dimension_reduction\
To use the provided Python scripts, make sure you have Python >=3.7 installed.
In addition to the environment setup you provided, install the following packages:
pip install matplotlib
pip install nltk
pip install scikit-learn
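To confirm that the three packages above are visible to your interpreter, a small check like the following can help. It is only a sketch: the helper name `missing_packages` is made up for illustration, and the one detail worth noting is that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util

# pip name -> import name; scikit-learn is imported as "sklearn"
REQUIRED = {"matplotlib": "matplotlib", "nltk": "nltk", "scikit-learn": "sklearn"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages that cannot be found in this environment."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If anything is listed as missing, rerun the corresponding pip install command.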
To load the data into the database, run the notebook data_to_sqlite.ipydb. Remember to use Run All so that every cell is executed.
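For orientation, loading tabular data into SQLite can look roughly like the sketch below. This is not the notebook's actual code: it assumes the input is a CSV file, the table name `papers` and the helper `load_csv_to_sqlite` are hypothetical, and the two sample rows are made up. Only the column names come from this README.

```python
import csv
import io
import sqlite3


def load_csv_to_sqlite(csv_file, conn, table="papers"):
    """Create `table` from the CSV header and insert every row as text."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Quote column names so that names with hyphens are accepted
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# In-memory demo with made-up rows; a real run would open the project's data file
sample = io.StringIO(
    "abstract,AuthorName-Deduped\n"
    "A study of graphs,Jane Doe\n"
    "On trees,John Roe\n"
)
conn = sqlite3.connect(":memory:")
load_csv_to_sqlite(sample, conn)
rows = conn.execute('SELECT "AuthorName-Deduped" FROM papers').fetchall()
print(rows)
```

Storing every column as TEXT keeps the sketch short; a real schema would likely use numeric types for the reduced coordinates.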
To start the Flask server, run python .\server.py
The data preprocessing script, which uses two dimension reduction methods (PCA and t-SNE), together with the raw data.
It contains all the preprocessing steps for the two columns 'abstract' and 'AuthorName-Deduped', using the two dimension reduction methods PCA and t-SNE. It also contains an implementation of PCA by hand.
It contains the script with an implementation of t-SNE by hand.
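As a rough idea of what PCA by hand involves, the sketch below centers the data, eigendecomposes the covariance matrix, and projects onto the leading eigenvectors. It assumes NumPy is available and uses random data; the function name `pca` and its parameters are illustrative and not taken from the notebook, whose implementation may differ.

```python
import numpy as np


def pca(X, n_components=2):
    """Project the rows of X onto the top `n_components` principal components."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    return X_centered @ components, eigvals[order]


# Random stand-in for the vectorized text features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
projected, variances = pca(X)
print(projected.shape)
```

The returned eigenvalues are the variances of the projected columns, which gives a quick sanity check on the result.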
- Why is it separated from PCA?
- The running time is too long to put both in the same Jupyter notebook. Details are in the last part of data-processing_and_dimension_reduction.ipynb.
The script that creates the database, and the database itself.
The startup script for the Flask server. It reads the data from the database that was created.
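A Flask server that reads from a SQLite database can be sketched as follows. This is an assumption-laden illustration rather than the contents of server.py: the `/papers` route, the `papers` table, and the `create_app` factory are all hypothetical names.

```python
import sqlite3

from flask import Flask, jsonify


def create_app(db_path):
    """Build a Flask app that serves rows from the SQLite file at `db_path`."""
    app = Flask(__name__)

    @app.route("/papers")  # hypothetical endpoint
    def papers():
        # Open a short-lived connection per request
        conn = sqlite3.connect(db_path)
        conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
        try:
            rows = conn.execute("SELECT * FROM papers").fetchall()
        finally:
            conn.close()
        return jsonify([dict(row) for row in rows])

    return app
```

Calling `create_app(path).run()` with the path to the database file would then serve the rows as JSON for the visualization to fetch.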
It contains the .csv files generated for visualization.
- Use
cd .\visualization\
first to navigate to this folder.
- Run
npm install
and
npm run dev
to start it.
The visualization lets the user choose three parameters that determine the output graph:
Which category is used to define a "good paper".
Which dimension reduction method to use.
Which column to analyze.