daniEL2371/Twitter-Data-Analysis

 
 

Twitter-Data-Analysis

What is done

A. Data Extraction

B. Data Preprocessing

C. Data Exploration and Visualization

D. Data Preparation

E. Data Modeling

F. Dashboard

G. Testing

H. Travis CI integration

Data Extraction

  1. extract_data_frame.py: extracts the data from data/covid19.json, constructs a dataframe, and saves it as processed_tweet_data.csv in the root dir
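
A minimal sketch of this extraction step, assuming data/covid19.json holds one tweet object per line. It uses the standard library's json and csv modules in place of a dataframe library so that it has no dependencies. The names `tweet_to_row` and `extract_tweets` and the chosen columns are illustrative assumptions, not the script's actual interface.

```python
import csv
import json

# Hypothetical column selection; the real script may extract different fields.
COLUMNS = ["created_at", "lang", "full_text", "screen_name"]


def tweet_to_row(tweet):
    """Flatten one tweet dict into a row with the columns above."""
    user = tweet.get("user") or {}
    return {
        "created_at": tweet.get("created_at", ""),
        "lang": tweet.get("lang", ""),
        # Fall back to "text" when "full_text" is absent.
        "full_text": tweet.get("full_text") or tweet.get("text", ""),
        "screen_name": user.get("screen_name", ""),
    }


def extract_tweets(json_path, csv_path):
    """Read line-delimited JSON tweets and write them to a CSV file."""
    rows = []
    with open(json_path, encoding="utf-8") as src:
        for line in src:
            line = line.strip()
            if line:  # skip blank lines
                rows.append(tweet_to_row(json.loads(line)))
    with open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Under these assumptions, `extract_tweets("data/covid19.json", "processed_tweet_data.csv")` would write the CSV into the root dir.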

Data Preprocessing

  1. notebooks/dataPreProcessing.ipynb:

    A. Cleaning

     - cleans processed_tweet_data.csv and saves the cleaned dataframe in a file called cleaned_tweet_data.csv
     
     - imports clean_tweet_dataframe.py and uses its method to clean the dataframe
    

    B. Exploration

     - Data exploration is also done inside this notebook
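
The cleaning step above could look roughly like the following sketch. It assumes pandas and a dataframe with `created_at`, `lang` and `full_text` columns. The function name `clean_tweets` and the specific rules (dropping duplicates, unparseable dates and non-English tweets) are assumptions for illustration; the actual methods live in clean_tweet_dataframe.py.

```python
import pandas as pd


def clean_tweets(df):
    """Return a cleaned copy of a tweet dataframe (illustrative rules only)."""
    df = df.drop_duplicates().copy()
    # Unparseable dates become NaT so they can be dropped below.
    df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")
    df = df.dropna(subset=["created_at", "full_text"])
    # Keep English tweets only (an assumed filter).
    df = df[df["lang"] == "en"]
    return df.reset_index(drop=True)
```

In the notebook this might be used as `clean_tweets(pd.read_csv("processed_tweet_data.csv")).to_csv("cleaned_tweet_data.csv", index=False)`.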
    

Data Preparation and Data Modeling

  1. notebooks/modelGeneration.ipynb:

    A. Sentiment Analysis: using cleaned_tweet_data.csv, the data is prepared for sentiment analysis and a sentiment analysis model is implemented using an SGD classifier.

    B. Topic Modeling: using cleaned_tweet_data.csv, the data is prepared for topic modeling and a topic model is implemented using Latent Dirichlet Allocation.
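
Both modeling steps can be sketched with scikit-learn, assuming that is the library behind the notebook. The sample texts, the labels, the TF-IDF and bag-of-words features, and the choice of two topics are all made up for illustration; the notebook's real features, labels and hyperparameters may differ.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Tiny hand-made sample; illustrative only, not taken from the dataset.
texts = [
    "vaccine rollout is great news",
    "happy to see cases going down",
    "lockdown is terrible for business",
    "sad and worried about new cases",
]
labels = ["positive", "positive", "negative", "negative"]

# A. Sentiment analysis: TF-IDF features fed to a linear model trained with SGD.
sentiment_model = make_pipeline(TfidfVectorizer(), SGDClassifier(random_state=0))
sentiment_model.fit(texts, labels)
predictions = sentiment_model.predict(["great news about the vaccine"])

# B. Topic modeling: word counts fed to Latent Dirichlet Allocation.
counts = CountVectorizer(stop_words="english").fit_transform(texts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
# One row per tweet, one column per topic; each row sums to 1.
doc_topics = lda.fit_transform(counts)
```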

Dashboard

  1. add_data.py:

     Connects to a database,
     creates the tweets db,
     creates the TweetInformation table and inserts the dataframe loaded from cleaned_tweet_data.csv.
    

***Note: Replace the db username and db password inside this file, in the DBConnect method, with your own ***

  2. schema.sql: a schema describing the TweetInformation table

  3. dashboard.py: a dashboard implemented using Streamlit. The dashboard has two pages for data visualization.

    ***Note: dashboard.py imports tweeter_data_explorator.py, which has several helper methods to explore the data ***
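
The create-table-and-insert flow of add_data.py can be illustrated with the standard library's sqlite3 module. This is a stand-in chosen so the sketch runs without a database server or credentials; the project itself connects to a database that needs a username and password. The three-column schema and the function name `insert_tweets` are assumptions; the real table is defined in schema.sql.

```python
import sqlite3

# Assumed, simplified schema; the real one is defined in schema.sql.
SCHEMA = """
CREATE TABLE IF NOT EXISTS TweetInformation (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at TEXT,
    lang TEXT,
    full_text TEXT
)
"""


def insert_tweets(conn, rows):
    """Create the table if needed, insert the rows, and return the row count."""
    conn.execute(SCHEMA)
    # Parameterized query: values are bound by the driver, not string-formatted.
    conn.executemany(
        "INSERT INTO TweetInformation (created_at, lang, full_text) VALUES (?, ?, ?)",
        [(r["created_at"], r["lang"], r["full_text"]) for r in rows],
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM TweetInformation").fetchone()[0]
```

With a server-backed database the same pattern applies, but the connection call and the placeholder style depend on the driver in use.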

tests

  1. test_clean_tweets_dataframe.py: unit test for clean_tweets_dataframe.py

  2. test_extract_dataframe.py: unit test for extract_dataframe.py
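
For readers unfamiliar with the layout of such unit tests, here is a minimal sketch using the standard library's unittest. The function `drop_non_english` is a hypothetical stand-in for a cleaning method under test; the project's real tests import the actual modules instead.

```python
import unittest


def drop_non_english(rows):
    """Hypothetical stand-in for a cleaning method under test."""
    return [r for r in rows if r.get("lang") == "en"]


class TestDropNonEnglish(unittest.TestCase):
    def test_keeps_only_english(self):
        rows = [{"lang": "en", "text": "a"}, {"lang": "fr", "text": "b"}]
        self.assertEqual(drop_non_english(rows), [{"lang": "en", "text": "a"}])

    def test_empty_input(self):
        self.assertEqual(drop_non_english([]), [])
```

Saved in a file under tests, a test case like this can be run with `python -m unittest`.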

CI automation

  1. .travis.yml: config file for Travis CI automation

About

The current Coronavirus (COVID-19) pandemic has impacted and changed lives on a global scale since its emergence in late 2019. This work can be used to gain insight into how COVID-19 has affected African people’s livelihoods. Having that information can help governments devise an effective prevention strategy to control COVID-19 in Africa. Th…

Languages

  • Jupyter Notebook 97.0%
  • Python 3.0%