Dimension Reduction

This project contains 3 parts:

Data preprocessing and dimensionality reduction
Loading the data into an SQLite database and retrieving it with a Flask server
Visualization

Data preprocessing and dimension reduction

First, run cd .\dimension_reduction\ to navigate to this folder.
To use the provided Python scripts, make sure Python >=3.7 is installed.
In addition to the environment setup you provided, install the following packages:

pip install matplotlib
pip install nltk
pip install scikit-learn

To load the data into the database, run the notebook data_to_sqlite.ipydb. Remember to use Run All.
To start the Flask server, run python .\server.py

data-processing_and_dimension_reduction.ipynb and paper.xlsx

The data preprocessing notebook, which applies two dimension reduction methods (PCA and t-SNE), and the raw data.
The notebook contains all the preprocessing steps for the two columns 'abstract' and 'AuthorName-Deduped', followed by the two dimension reduction methods, PCA and t-SNE. It also contains an implementation of PCA by hand.
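For readers who want a picture of what a by-hand PCA involves, here is a minimal sketch. It is not the notebook's code: it assumes NumPy is available (it is not in the install list above) and that the input is a dense numeric feature matrix with at least two columns. The function name pca and its parameters are illustrative.

```python
import numpy as np

def pca(X, n_components=2):
    """Project the rows of X onto the top principal components.

    Illustrative sketch; not the notebook's implementation.
    """
    X = np.asarray(X, dtype=float)
    # Center each feature (column) at zero.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features.
    cov = np.cov(X_centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the directions with the largest variance.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    return X_centered @ components, eigvals[order]
```

The first return value holds the low-dimensional coordinates, the second the variance captured along each kept component.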

TSNE_from_scratch.py

A script containing an implementation of t-SNE by hand.

Why is it separate from PCA?
Its running time is too long to include it in the Jupyter notebook. Details are given in the last part of data-processing_and_dimension_reduction.ipynb.
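To give a sense of why a hand-written t-SNE is slow, the sketch below shows the main steps of exact t-SNE with NumPy: every iteration works on full n-by-n matrices, so the cost grows quadratically with the number of rows. This is not the code in TSNE_from_scratch.py. It assumes NumPy is available, omits common refinements such as early exaggeration and adaptive gains, and all function names, parameters and default values are illustrative.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    s = np.sum(X ** 2, axis=1)
    D = s[:, None] + s[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D, 0.0)
    return np.maximum(D, 0.0)

def conditional_probs(D, perplexity=30.0, tol=1e-5, max_iter=50):
    """Binary-search a precision per point so each row hits the target perplexity."""
    n = D.shape[0]
    P = np.zeros((n, n))
    target = np.log(perplexity)  # entropy (in nats) that matches the perplexity
    for i in range(n):
        lo, hi, beta = 0.0, np.inf, 1.0
        d = np.delete(D[i], i)
        for _ in range(max_iter):
            p = np.exp(-(d - d.min()) * beta)  # shifted for numerical stability
            p /= p.sum()
            entropy = -np.sum(p * np.log(p + 1e-12))
            if abs(entropy - target) < tol:
                break
            if entropy > target:  # distribution too flat: sharpen it
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
            else:                 # distribution too peaked: flatten it
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    return P

def tsne(X, n_components=2, perplexity=30.0, n_iter=300,
         learning_rate=10.0, seed=0):
    """Plain gradient descent with momentum on the t-SNE objective."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    P = conditional_probs(pairwise_sq_dists(X), perplexity)
    P = np.maximum((P + P.T) / (2.0 * n), 1e-12)  # symmetrized joint probabilities
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(n, n_components))
    velocity = np.zeros_like(Y)
    for it in range(n_iter):
        # Student-t similarities in the low-dimensional space.
        num = 1.0 / (1.0 + pairwise_sq_dists(Y))
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        # Gradient: 4 * sum_j (p_ij - q_ij) * num_ij * (y_i - y_j).
        W = (P - Q) * num
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        momentum = 0.5 if it < 100 else 0.8
        velocity = momentum * velocity - learning_rate * grad
        Y = Y + velocity
        Y -= Y.mean(axis=0)  # keep the embedding centered
    return Y
```

The default learning_rate here is a small value chosen for inputs of a few dozen rows; suitable values depend on the data size.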

data_to_sqlite.ipydb and data.db

The script that creates the database, and the resulting database file.
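As a rough illustration of this step, the sketch below loads one CSV file into an SQLite table using only the Python standard library. It is not the notebook's code: the function name load_csv_to_sqlite, the table name points and the assumption that the first CSV row is a header are all hypothetical.

```python
import csv
import sqlite3

def load_csv_to_sqlite(csv_path, db_path, table="points"):
    """Create (or replace) a table from a CSV file and return the row count.

    Illustrative sketch; the table and column names come from the caller
    and the CSV header, not from the project's actual schema.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row is assumed to hold column names
        rows = list(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(f'DROP TABLE IF EXISTS "{table}"')
            # No column types are declared, so values are stored as text.
            conn.execute(f'CREATE TABLE "{table}" ({columns})')
            conn.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', rows
            )
    finally:
        conn.close()
    return len(rows)
```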

server.py

The startup script for the Flask server. It reads the data from the database created above.
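A server of this kind can be sketched as follows. This is not the contents of server.py: the route /data, the table name points and the factory function create_app are assumptions made for illustration, and Flask must be installed.

```python
import sqlite3

from flask import Flask, jsonify

def create_app(db_path="data.db", table="points"):
    """Build a Flask app that serves every row of one table as JSON.

    Illustrative sketch; the route and table name are hypothetical.
    """
    app = Flask(__name__)

    @app.route("/data")
    def get_data():
        conn = sqlite3.connect(db_path)
        conn.row_factory = sqlite3.Row  # rows behave like dictionaries
        try:
            rows = conn.execute(f'SELECT * FROM "{table}"').fetchall()
        finally:
            conn.close()
        return jsonify([dict(row) for row in rows])

    return app
```

Calling create_app().run() would start the development server; the visualization could then request /data to fetch the points.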

folder data_for_vis

It contains the .csv files generated for visualization.

Visualization

First, run cd .\visualization\ to navigate to this folder.
Then run npm install and npm run dev to start the visualization.

The visualization lets the user choose three parameters (category, dr method and column) that determine which graph is displayed.

dr method

The dimension reduction method to use (PCA or t-SNE).

column

The column the user wants to analyze.

tongwenbo/dimension_reduction_text_data
