A. Data Extraction
B. Data Preprocessing
C. Data Exploration and Visualization
D. Data Preparation
E. Data Modeling
F. Dashboard
G. Testing
H. Travis CI integration
- extract_data_frame.py: extracts the data from data/covid19.json, constructs a dataframe, and saves it as processed_tweet_data.csv in the root dir
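The extraction step above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code; the field names pulled from each tweet object are assumptions.

```python
import json
import pandas as pd

def extract_tweets(json_path: str) -> pd.DataFrame:
    # Read one JSON object per line (a common Twitter dump layout) and
    # keep a few illustrative fields; the real script's schema may differ.
    records = []
    with open(json_path, "r") as f:
        for line in f:
            tweet = json.loads(line)
            records.append({
                "created_at": tweet.get("created_at"),
                "text": tweet.get("text"),
                "lang": tweet.get("lang"),
            })
    return pd.DataFrame(records)

# df = extract_tweets("data/covid19.json")
# df.to_csv("processed_tweet_data.csv", index=False)
```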
-
notebooks/dataPreProcessing.ipynb:
A. Cleaning
- cleans processed_tweet_data.csv and saves the cleaned dataframe in a file called cleaned_tweet_data.csv
- imports clean_tweet_dataframe.py and uses its methods to clean the dataframe
B. Exploration
- Data exploration is also done inside this notebook
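A cleaning routine of this kind can be sketched as below. The specific steps (dropping duplicates and empty rows, stripping URLs and mentions, lowercasing) are assumptions for illustration; clean_tweet_dataframe.py's actual methods may do more or less.

```python
import pandas as pd

def clean_tweets(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical cleaning steps, not the repository's exact logic.
    df = df.drop_duplicates().dropna(subset=["text"]).copy()
    df["text"] = (df["text"]
                  .str.replace(r"http\S+", "", regex=True)  # strip URLs
                  .str.replace(r"@\w+", "", regex=True)     # strip mentions
                  .str.lower()
                  .str.strip())
    return df.reset_index(drop=True)
```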
notebooks/modelGeneration.ipynb:
A. Sentiment Analysis: using cleaned_tweet_data.csv, the data is prepared for sentiment analysis and a sentiment analysis model is implemented with an SGD classifier.
B. Topic Modeling: using cleaned_tweet_data.csv, the data is prepared for topic modeling and a topic model is implemented with Latent Dirichlet Allocation.
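The two models above can be sketched with scikit-learn. The feature choices (TF-IDF for the classifier, raw counts for LDA) and all parameters are assumptions; the notebook's actual preparation steps may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.decomposition import LatentDirichletAllocation

# A. Sentiment analysis: TF-IDF features fed into an SGD classifier.
sentiment_model = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", SGDClassifier(random_state=42)),
])

# B. Topic modeling: bag-of-words counts fed into LDA.
def fit_lda(texts, n_topics=5):
    vec = CountVectorizer(stop_words="english")
    counts = vec.fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=42)
    lda.fit(counts)
    return vec, lda
```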
-
add_data.py:
Connects to a database, creates the tweets db, creates the TweetInformation table, and inserts the dataframe from cleaned_tweet_data.csv.
***Note: Replace the db username and db password in the DBConnect method inside this file ***
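The insert step can be sketched as below. add_data.py uses a server database through DBConnect (hence the credentials); sqlite3 is substituted here purely so the sketch runs without a database server, and the function name is hypothetical.

```python
import sqlite3
import pandas as pd

def insert_tweets(df: pd.DataFrame, db_path: str = "tweets.db") -> None:
    # Write the cleaned dataframe into a TweetInformation table.
    # sqlite3 stands in for the server DB that add_data.py actually targets.
    conn = sqlite3.connect(db_path)
    df.to_sql("TweetInformation", conn, if_exists="replace", index=False)
    conn.close()

# df = pd.read_csv("cleaned_tweet_data.csv")
# insert_tweets(df)
```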
-
schema.sql: a schema describing the TweetInformation table
-
dashboard.py: A dashboard is implemented using Streamlit. The dashboard has two pages for data visualization.
***Note: dashboard.py imports tweeter_data_explorator.py, which has several helper methods to explore the data ***
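Helpers in the spirit of tweeter_data_explorator.py might look like the sketch below. Both function names and the columns they assume (`text`, `lang`) are hypothetical; the real module's methods may differ.

```python
import pandas as pd

def top_hashtags(df: pd.DataFrame, n: int = 10) -> pd.Series:
    # Pull hashtags out of the tweet text and count the most frequent ones.
    tags = df["text"].str.findall(r"#\w+").explode().str.lower()
    return tags.value_counts().head(n)

def tweets_per_language(df: pd.DataFrame) -> pd.Series:
    # Count tweets per language code, e.g. for a bar chart on the dashboard.
    return df["lang"].value_counts()
```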
-
test_clean_tweets_dataframe.py: unit test for clean_tweets_dataframe.py
-
test_extract_dataframe.py: unit test for extract_dataframe.py
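An illustrative shape for such a unit test is shown below. The test class and the duplicate-dropping check are placeholders; the repository's test files exercise its actual cleaning and extraction methods.

```python
import unittest
import pandas as pd

class TestCleaning(unittest.TestCase):
    def test_drops_duplicate_rows(self):
        # Stand-in for calling the real cleaner under test.
        df = pd.DataFrame({"text": ["same tweet", "same tweet"]})
        cleaned = df.drop_duplicates()
        self.assertEqual(len(cleaned), 1)
```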
- .travis.yml: config file for Travis CI automation
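A minimal Travis CI config for a project like this typically has the shape below. The Python version, requirements file, and test command are assumptions; the repository's actual .travis.yml may differ.

```yaml
language: python
python:
  - "3.8"
install:
  - pip install -r requirements.txt
script:
  - python -m unittest discover
```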