This project covers data cleaning, wrangling, analysis, clustering, machine learning (ML) modeling, and deep learning (DL) modeling for metabolic syndrome prediction.
It aims to help medical professionals understand which properties play crucial roles in predicting and diagnosing a case of metabolic syndrome.
2401 entries (total 14 columns), dtypes: float64(5), int64(6), object(4), memory usage: 281.5+ KB
- The higher the triglyceride level, the more often metabolic syndrome is present.
- Likewise, the higher the waist circumference (WaistCirc), the more often metabolic syndrome is present.
- This is a general distribution based on sex (male or female).
- This is a general distribution based on marital status.
- This is a general distribution based on race.
- This is a general distribution based on age and sex.
Logistic Regression Model (Testing Set):
              precision    recall  f1-score   support
           0       0.84      0.92      0.88       395
           1       0.81      0.67      0.74       206
    accuracy                           0.83       601
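To make the columns of these reports concrete, here is a minimal pure-Python sketch of how per-class precision, recall, F1-score, and support can be derived from true and predicted labels. The function name `class_metrics` and the toy labels are hypothetical and are not taken from the project's data; the reports themselves follow the layout of scikit-learn's `classification_report`.

```python
def class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1, support) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    support = tp + fn  # number of true samples of this class
    return precision, recall, f1, support


# Hypothetical toy labels (1 = metabolic syndrome, 0 = none).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

for label in (0, 1):
    precision, recall, f1, support = class_metrics(y_true, y_pred, label)
    print(f"{label}  {precision:.2f}  {recall:.2f}  {f1:.2f}  {support}")
```

Recall for class 1 is the share of true metabolic syndrome cases that the model catches, which is the figure that varies most between the models listed here.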
Logistic Regression (SMOTE) Model (Testing Set):
              precision    recall  f1-score   support
           0       0.88      0.84      0.86       395
           1       0.72      0.78      0.75       206
    accuracy                           0.82       601
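SMOTE balances the classes by creating synthetic minority samples rather than duplicating existing ones. The sketch below illustrates only the core idea (interpolating between a minority sample and one of its nearest minority neighbours) in plain Python. It is a simplified illustration, not the implementation used in this project or in any library; the function `smote_sketch`, its parameters, and the sample points are all hypothetical.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority samples and majority class size.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
majority_count = 10

new_points = smote_sketch(minority, majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Oversampling of this kind is normally applied to the training split only, so that the test set keeps the original class balance (395 vs. 206 in the reports here).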
Logistic Regression (Gridsearch + SMOTE) Model (Testing Set):
              precision    recall  f1-score   support
           0       0.88      0.85      0.86       395
           1       0.73      0.78      0.75       206
    accuracy                           0.82       601
Logistic Regression (PCA) Model (Testing Set):
              precision    recall  f1-score   support
           0       0.84      0.91      0.87       395
           1       0.79      0.67      0.72       206
    accuracy                           0.83       601
Logistic Regression (K-Means) Model (Testing Set):
              precision    recall  f1-score   support
           0       0.88      0.87      0.87       395
           1       0.75      0.76      0.76       206
    accuracy                           0.83       601
Deep Learning Model (Testing Set):
              precision    recall  f1-score   support
           0       0.85      0.91      0.88       159
           1       0.79      0.70      0.74        82
    accuracy                           0.83       241
Deep Learning Model (Tuned) (Testing Set):
              precision    recall  f1-score   support
           0       0.95      0.40      0.56       159
           1       0.45      0.96      0.61        82
    accuracy                           0.59       241
Logistic Regression (Decision Threshold + Gridsearch + SMOTE) Model (Testing Set):
              precision    recall  f1-score   support
           0       0.90      0.81      0.85       395
           1       0.69      0.83      0.75       206
    accuracy                           0.82       601
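Adjusting the decision threshold trades precision for recall on the positive class, which matches the pattern in the report above (class 1 recall rises to 0.83 while precision drops to 0.69). The following is a minimal sketch of that mechanism in plain Python. The probabilities, labels, threshold values, and function names are hypothetical; the threshold actually used in this project is not stated here.

```python
def predict_with_threshold(probabilities, threshold=0.5):
    """Label a sample 1 when its predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def positive_precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical true labels and predicted probabilities of class 1.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.7, 0.45, 0.35, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]

for threshold in (0.5, 0.3):
    y_pred = predict_with_threshold(probs, threshold)
    precision, recall = positive_precision_recall(y_true, y_pred)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

In a screening setting, missing a true case is usually costlier than a false alarm, which is one reason to accept lower precision in exchange for higher recall.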
The final model chosen was the Logistic Regression (Decision Threshold + Gridsearch + SMOTE) model. Its predictions of metabolic syndrome should be reasonably reliable: it reaches 82% accuracy on the test data, within one point of the best accuracy among the models above (83%), and its recall for the positive class (0.83) is the highest of the logistic regression variants.
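As a quick consistency check on the reported figures, overall accuracy equals the support-weighted average of the per-class recalls. The sketch below recomputes it from the recall and support values printed in the reports above. Because those recalls are rounded to two decimals, the result is approximate; the helper name `accuracy_from_recalls` is hypothetical.

```python
def accuracy_from_recalls(rows):
    """rows: list of (recall, support) pairs, one per class."""
    correct = sum(recall * support for recall, support in rows)
    total = sum(support for _, support in rows)
    return correct / total


# (recall, support) per class, copied from the reports above.
final_model = [(0.81, 395), (0.83, 206)]  # Decision Threshold + Gridsearch + SMOTE
plain_logreg = [(0.92, 395), (0.67, 206)]  # baseline Logistic Regression
tuned_dl = [(0.40, 159), (0.96, 82)]       # tuned Deep Learning model

for name, rows in [("final", final_model),
                   ("plain", plain_logreg),
                   ("tuned DL", tuned_dl)]:
    print(name, round(accuracy_from_recalls(rows), 2))
```

The tuned deep learning model shows why accuracy alone is not enough: its class 1 recall is very high, but the low class 0 recall pulls overall accuracy down sharply.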