Analysis of Measures taken by different countries to control the spread of Covid-19 Virus

This is a research project on a dataset that contains the measures taken by different countries to control the Covid-19 pandemic.

Team Name: Team RPM

Team Members:

Nisha Ramrakhyani
Zalak Panchal
Punit Mashruwala

Project Introduction: The project focuses on the impact of the measures taken by different countries to control the spread of Covid-19. Our dataset contains around 22k records of measures taken by different countries, along with the dates on which each measure was implemented. We'll augment this dataset with the daily number of Covid-19 cases in each country. Our aim is to analyze the rise or fall in the number of Covid-19 cases before and after the measures were taken.

Research Question:

How did the different measures taken by different countries to control the pandemic impact the total number of Covid-19 cases?

Relevant Domain Information:

Data Sources:

Approach:

Data understanding and EDA:

Introduction to the Dataset: We are working on a dataset that contains records of the measures taken to control the spread of the novel coronavirus (SARS-CoV-2), the virus that causes the disease COVID-19.

We have chosen 16 countries for our research: Australia, France, Germany, India, Italy, United States, New Zealand, Canada, Norway, Sweden, United Kingdom, Mexico, Singapore, Spain, Sri Lanka and Belgium.
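Restricting the measures data to these countries is a single boolean filter. The sketch below is a minimal illustration with pandas, assuming the measures are already loaded into a DataFrame with a COUNTRY column; the helper name `filter_countries` is hypothetical, and the spellings are assumed to match those used in the dataset.

```python
import pandas as pd

# The 16 countries listed above. The spellings are assumed to match the
# COUNTRY column of the measures dataset; adjust them if the file differs.
SELECTED_COUNTRIES = [
    "Australia", "France", "Germany", "India", "Italy", "United States",
    "New Zealand", "Canada", "Norway", "Sweden", "United Kingdom",
    "Mexico", "Singapore", "Spain", "Sri Lanka", "Belgium",
]


def filter_countries(measures: pd.DataFrame) -> pd.DataFrame:
    """Keep only the rows whose COUNTRY is one of the selected countries."""
    mask = measures["COUNTRY"].isin(SELECTED_COUNTRIES)
    return measures.loc[mask].reset_index(drop=True)


# Toy rows for illustration only, not taken from the real dataset.
demo = pd.DataFrame({"COUNTRY": ["India", "Brazil", "Sweden"]})
print(filter_countries(demo)["COUNTRY"].tolist())  # ['India', 'Sweden']
```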

Our dataset contains the following columns:

ID - Unique record number

ISO - ISO 3166-1 alpha-3 (iso3) country code

COUNTRY - Name of the country

REGION - Continent

LOG_TYPE - Whether the entry is an introduction / extension of measures or a phase-out of measures

CATEGORY - Measure category

MEASURE - The specific method or rule applied to control the spread of the virus

TARGETED_POP_GROUP - Whether the measure targets a specific population group (Yes/No)

COMMENTS - Comments or any other explanation of the measure

NON_COMPLIANCE - Penalties or additional rules for non-compliance, such as fines

DATE_IMPLEMENTED - Date the measure started in the country or part of the country

SOURCE - Data source for the measure

SOURCE_TYPE - Type of the data source for the measure

LINK - Link to the source information or data

ENTRY_DATE - Date the measure was entered in the dataset

We used a second dataset for modeling: to build a predictive model that forecasts the number of Covid-19 cases across the world, and to perform exploratory analysis of the worldwide numbers of Covid-19 cases and deaths.

Details of this dataset are available at: https://ourworldindata.org/coronavirus-testing

For more details on the government measures dataset, please check the ACAPS readme file:
https://www.acaps.org/sites/acaps/files/key-documents/files/acaps_-_covid-19_government_measures_dataset_readme.pdf

EDA:

After the initial data preparation, we performed EDA on our Covid-19 dataset to analyze how different countries prepared to fight the virus.

More information is available in the notebook in the eda_code folder, or in the PDF file EDA - Jupyter Notebook.pdf.
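One simple way to compare how countries prepared is to count their measures per category. The following is a sketch with pandas, not the notebook's actual code, assuming a DataFrame with the COUNTRY and CATEGORY columns described above; the helper name and the demo rows are made up for illustration.

```python
import pandas as pd


def measures_per_category(measures: pd.DataFrame) -> pd.DataFrame:
    """Count the measures each country recorded in each CATEGORY.

    Combinations that never occur are filled with 0 by pd.crosstab.
    """
    return pd.crosstab(measures["COUNTRY"], measures["CATEGORY"])


# Toy rows for illustration only.
demo = pd.DataFrame({
    "COUNTRY": ["Italy", "Italy", "Sweden"],
    "CATEGORY": ["Lockdown", "Social distancing", "Social distancing"],
})
print(measures_per_category(demo))
```

The resulting table can be plotted directly, for example with `.plot(kind="bar", stacked=True)`, to compare countries side by side.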

A Few Examples of EDA

Data Preparation:

We added four new columns, 'covid_cases_per_date', 'population', 'Total_Covid_cases' and 'Total_no_of_tests', to the given dataset so that we could see the impact of the different measures taken by different countries on the spread of the pandemic. We got this data from Wikipedia.
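Attaching daily case counts to the measures amounts to a join on country and date. The sketch below assumes the case data sits in a separate DataFrame with hypothetical COUNTRY, DATE and covid_cases_per_date columns; it illustrates the idea and is not the project's actual merging code.

```python
import pandas as pd


def add_case_counts(measures: pd.DataFrame, cases: pd.DataFrame) -> pd.DataFrame:
    """Attach each country's case count on the day a measure was implemented.

    `cases` is assumed to have COUNTRY, DATE and covid_cases_per_date columns.
    """
    daily = cases.rename(columns={"DATE": "DATE_IMPLEMENTED"})
    # A left join keeps every measure, even when no case count exists for that day.
    return measures.merge(daily, on=["COUNTRY", "DATE_IMPLEMENTED"], how="left")


# Toy values for illustration only, not real case counts.
measures = pd.DataFrame({
    "COUNTRY": ["Australia", "Australia", "Norway"],
    "DATE_IMPLEMENTED": pd.to_datetime(["2020-03-20", "2020-03-25", "2020-03-16"]),
})
cases = pd.DataFrame({
    "COUNTRY": ["Australia", "Norway"],
    "DATE": pd.to_datetime(["2020-03-20", "2020-03-16"]),
    "covid_cases_per_date": [10, 20],
})
merged = add_case_counts(measures, cases)
print(merged)
```

Per-country values such as population or total tests would be joined the same way, but on COUNTRY alone.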

Of the 18 columns in the dataset, we selected the following as the most relevant for our analysis:

COUNTRY

CATEGORY

MEASURE

COMMENTS

DATE_IMPLEMENTED

covid_cases_per_date

population

Total_Covid_cases

Total_no_of_tests
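Keeping only these columns is a plain column selection. The sketch below additionally fails loudly if a column is missing, which is a design choice of this example rather than something the project describes; the helper name is hypothetical, and the column names follow the spelling used in the Data Preparation text.

```python
import pandas as pd

# Columns kept for the analysis, as listed above.
KEEP_COLUMNS = [
    "COUNTRY", "CATEGORY", "MEASURE", "COMMENTS", "DATE_IMPLEMENTED",
    "covid_cases_per_date", "population", "Total_Covid_cases", "Total_no_of_tests",
]


def select_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy containing only KEEP_COLUMNS, raising if any is absent."""
    missing = [col for col in KEEP_COLUMNS if col not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return df[KEEP_COLUMNS].copy()


# Toy frame with two extra columns (ID, LINK) that should be dropped.
demo = pd.DataFrame({col: [0] for col in KEEP_COLUMNS + ["ID", "LINK"]})
print(list(select_columns(demo).columns))
```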

We applied a log transformation to the covid_cases_per_date column to normalize it, because the column had a very high variance.
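A log transform compresses a column whose values span several orders of magnitude. This minimal sketch uses `np.log1p` (log(1 + x)); that choice is an assumption, made here because a plain log is undefined for days with zero cases.

```python
import numpy as np
import pandas as pd


def log_normalize(counts: pd.Series) -> pd.Series:
    """Compress a heavily skewed count column with log(1 + x).

    np.log1p is an assumption: unlike a plain log, it maps days with zero
    cases to 0 instead of -inf.
    """
    return np.log1p(counts)


daily = pd.Series([0, 9, 99, 999], name="covid_cases_per_date")  # toy values
print(log_normalize(daily).round(3).tolist())  # [0.0, 2.303, 4.605, 6.908]
```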

We also calculated the number of COVID cases per million population.
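Cases per million is the case count divided by the population, scaled by one million. A small sketch, with a hypothetical helper and output column name (`cases_per_million`) and toy values:

```python
import pandas as pd


def cases_per_million(cases: pd.Series, population: pd.Series) -> pd.Series:
    """Scale case counts to cases per one million inhabitants."""
    # Multiplying before dividing keeps round numbers exact in this example.
    return cases * 1_000_000 / population


df = pd.DataFrame({
    "Total_Covid_cases": [500, 2_000],       # toy values
    "population": [5_000_000, 1_000_000],
})
df["cases_per_million"] = cases_per_million(df["Total_Covid_cases"], df["population"])
print(df["cases_per_million"].tolist())  # [100.0, 2000.0]
```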

We converted the datatype of the DATE_IMPLEMENTED column from object to datetime.
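In pandas this conversion is done with `pd.to_datetime`. The sketch below assumes the raw strings are in a format pandas can infer (ISO dates here); if the file stores day-first dates, an explicit `format=` or `dayfirst=True` would be needed. The helper name is hypothetical.

```python
import pandas as pd


def parse_implementation_dates(measures: pd.DataFrame) -> pd.DataFrame:
    """Convert DATE_IMPLEMENTED from object (strings) to datetime64.

    errors="coerce" turns empty or unparseable entries into NaT, so they can
    be handled together with the other missing values.
    """
    out = measures.copy()
    out["DATE_IMPLEMENTED"] = pd.to_datetime(out["DATE_IMPLEMENTED"], errors="coerce")
    return out


demo = pd.DataFrame({"DATE_IMPLEMENTED": ["2020-03-20", None, "2020-04-01"]})  # toy rows
parsed = parse_implementation_dates(demo)
print(parsed.dtypes["DATE_IMPLEMENTED"])
```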

We handled the missing data: around 60 records had missing values. Since the DATE_IMPLEMENTED column was among the columns with missing values, and it would have been incorrect to impute implementation dates, we dropped the records with missing values.
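Dropping rather than imputing maps to `DataFrame.dropna`. In this sketch the check is restricted to DATE_IMPLEMENTED, which is an assumption; calling `dropna()` without `subset` would instead drop rows with a missing value in any column.

```python
import pandas as pd


def drop_missing_dates(measures: pd.DataFrame) -> pd.DataFrame:
    """Drop records without an implementation date instead of imputing one."""
    return measures.dropna(subset=["DATE_IMPLEMENTED"]).reset_index(drop=True)


demo = pd.DataFrame({
    "MEASURE": ["Curfews", "Border closure", "Curfews"],
    "DATE_IMPLEMENTED": pd.to_datetime(["2020-03-20", None, "2020-04-01"]),
})  # toy rows
clean = drop_missing_dates(demo)
print(len(demo) - len(clean), "record(s) dropped")  # 1 record(s) dropped
```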

We converted the DATE column into a day count so that we could plot the number of Covid cases against the number of days.
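A day count can be derived by subtracting a reference date from every date. This sketch assumes the reference is the earliest date in the column; the project may have used a different start date, and the helper name is hypothetical.

```python
import pandas as pd


def add_day_number(df: pd.DataFrame, date_col: str = "DATE") -> pd.DataFrame:
    """Add a `days` column counting days since the earliest date in `date_col`."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    # Subtracting two datetime columns gives timedeltas; .dt.days extracts whole days.
    out["days"] = (out[date_col] - out[date_col].min()).dt.days
    return out


demo = pd.DataFrame({"DATE": ["2020-01-22", "2020-01-23", "2020-02-01"]})  # toy rows
print(add_day_number(demo)["days"].tolist())  # [0, 1, 10]
```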

This way, the dataset was prepared for EDA and modeling.

Machine Learning:

Our dataset contains time-series data.

We have built two models for our dataset:

- A predictive model built with Prophet, Facebook's time-series analysis and forecasting tool for Python, to perform exploratory analysis of the percentage of positive COVID-19 tests in Australia.
- A model to forecast the number of COVID cases across the world.

In the first predictive model, we forecasted the positivity rate while taking interventions in Australia into account. For example, Australia introduced the measure "Border closure" on 03/20/2020, so we included such interventions when forecasting the positivity rate.
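One common way to feed an intervention to Prophet is as an extra regressor that switches from 0 to 1 on the intervention date. The sketch below builds such a step column with pandas; whether the project encoded interventions this way is an assumption, and the column name `border_closure` and the helper are hypothetical. The `y` values are placeholders, not real positivity rates.

```python
import pandas as pd

# Date of Australia's "Border closure" measure, as quoted in the text above.
BORDER_CLOSURE = pd.Timestamp("2020-03-20")


def add_intervention_flag(df: pd.DataFrame, start: pd.Timestamp, name: str) -> pd.DataFrame:
    """Add a 0/1 step regressor that is 1 from the intervention date onward."""
    out = df.copy()
    out[name] = (out["ds"] >= start).astype(int)
    return out


# Prophet expects a `ds` (date) and a `y` (value) column; y here is a placeholder.
history = pd.DataFrame({
    "ds": pd.date_range("2020-03-18", periods=5, freq="D"),
    "y": [0.0] * 5,
})
history = add_intervention_flag(history, BORDER_CLOSURE, "border_closure")
print(history["border_closure"].tolist())  # [0, 0, 1, 1, 1]
```

With the `prophet` package installed, the flag would be registered with `m.add_regressor("border_closure")` before `m.fit(history)`, and the frame returned by `m.make_future_dataframe(periods=...)` would need the same column before `m.predict(...)`. Prophet's `holidays` argument is an alternative for short, one-off events.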

We used the Prophet class and performed cross-validation on the fitted model. After calculating metrics such as MAE, MSE and RMSE with built-in functions, we plotted the estimated errors across the forecast horizon.
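For reference, the three metrics reduce to a few lines of NumPy. This standalone sketch shows what they compute from observed values and forecasts; it is not Prophet's implementation, and the helper name is hypothetical.

```python
import numpy as np


def error_metrics(y_true, y_pred) -> dict:
    """Return MAE, MSE and RMSE between observed and forecast values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    return {
        "mae": float(np.mean(np.abs(err))),  # mean absolute error
        "mse": mse,                          # mean squared error
        "rmse": float(np.sqrt(mse)),         # root mean squared error
    }


metrics = error_metrics([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(metrics)
```

In Prophet itself, `prophet.diagnostics.cross_validation` produces the simulated historical forecasts, `prophet.diagnostics.performance_metrics` computes these metrics per horizon, and `prophet.plot.plot_cross_validation_metric` plots one of them against the horizon.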

More information is available in the notebook in the eda_code folder, or in the PDF file EDA - Jupyter Notebook.pdf.

The spread of the coronavirus has severely disrupted social life, so it is crucial to study how transmission will grow and to predict future cases. In the second prediction model, we therefore forecasted the number of COVID cases across the world using a Support Vector Regression (SVR) model and a Polynomial Regression model.
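Polynomial regression fits case counts as a polynomial in the day number and extrapolates it to future days. The sketch below uses NumPy's `polyfit` on a synthetic curve; the library, the helper name and the degree are assumptions for illustration, not necessarily what the project used.

```python
import numpy as np


def fit_polynomial_trend(days, cases, degree: int = 2) -> np.poly1d:
    """Least-squares polynomial fit of case counts against day number."""
    coeffs = np.polyfit(days, cases, deg=degree)
    return np.poly1d(coeffs)


days = np.arange(8)
cases = 2 * days**2 + 3 * days + 5  # synthetic curve, not real data
trend = fit_polynomial_trend(days, cases, degree=2)
print(trend(10))  # forecast for day 10, close to 235
```

The degree controls how flexible the curve is; higher degrees can fit the history closely but tend to extrapolate poorly. An SVR counterpart would be fit on the same (day, cases) pairs, for example with scikit-learn's `sklearn.svm.SVR`, which is not shown here.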

The metrics used to evaluate the SVR and Polynomial Regression models are mean absolute error (MAE) and mean squared error (MSE):

SVR: MAE = 20024573.832694203, MSE = 586909261681649.8
Polynomial Regression: MAE = 760076.3281707081, MSE = 1467099793394.4988
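The evaluation compares the models by RMSE, while only MAE and MSE are listed here. Since RMSE is the square root of MSE, it follows directly from the reported values for the SVR and the polynomial regression (PR) model (rounded to three significant figures):

```latex
\begin{aligned}
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \\
\mathrm{RMSE}_{\mathrm{SVR}} &= \sqrt{586909261681649.8} \approx 2.42 \times 10^{7} \\
\mathrm{RMSE}_{\mathrm{PR}} &= \sqrt{1467099793394.4988} \approx 1.21 \times 10^{6}
\end{aligned}
```

By this measure, the polynomial regression model's RMSE is roughly 20 times smaller than that of the SVR.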

Evaluation:

In conclusion, Polynomial Regression (PR) produced lower errors than SVR in forecasting COVID-19 transmission: both its MAE and its MSE, and therefore its Root Mean Square Error (RMSE), are lower.

Known Issues (Problems with Data Preparation and Prediction)

A single dataset did not contain enough data for our research, so we combined two datasets.

The data for different countries was inconsistent.

We had to deal with a lot of missing data.

Conclusion

This project gives insight into how the different measures taken by different countries impacted the growth rate of COVID cases.

We learned how to deal with time-series data, and we were able to build a predictive model to forecast the number of COVID cases across the world.

Future Work

We can perform association rule analysis on this dataset.
