
This project uses GaussianNB, Random Forest, and AdaBoost Classification Models to predict the income category of individuals with US Census Data


kevinwood15/Python_ML_Classification_Modeling


Python_Classification_Modeling

This project uses GaussianNB, Random Forest, and AdaBoost classification models to predict the income category of individuals from US Census data.

I load the raw data and prepare it by converting all categorical variables into numerical features and normalizing all numerical features.
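A minimal sketch of this preparation step, using pandas. The column names and toy rows are hypothetical, and min-max scaling is an assumption, since the normalization method is not stated here:

```python
import pandas as pd


def prepare_features(df, numerical_cols):
    """One-hot encode categorical columns and min-max scale numerical ones."""
    out = df.copy()
    for col in numerical_cols:
        lo, hi = out[col].min(), out[col].max()
        # Guard against constant columns to avoid dividing by zero.
        out[col] = (out[col] - lo) / (hi - lo) if hi > lo else 0.0
    # get_dummies expands each remaining text/categorical column
    # into one indicator column per category.
    return pd.get_dummies(out, dtype=float)


# Hypothetical toy rows standing in for the raw census data.
raw = pd.DataFrame({
    "age": [25, 40, 55],
    "hours_per_week": [20, 40, 60],
    "workclass": ["Private", "State-gov", "Private"],
})
features = prepare_features(raw, ["age", "hours_per_week"])
```

After this step every column is numeric, which is what the scikit-learn style estimators named above expect.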

I split the data into training and test sets, estimate a naive predictor as a baseline, then fit Gaussian Naive Bayes (GNB), Random Forest (RF), and AdaBoost classification models.
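The split, baseline, and three fits could look roughly like the sketch below, assuming scikit-learn. The synthetic data, the 80/20 split, and the "most frequent class" baseline are all assumptions made for illustration; the project's own naive predictor and settings may differ:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the prepared census features (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out 20% of the rows for testing (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    # Baseline that always predicts the most frequent training class.
    "naive": DummyClassifier(strategy="most_frequent"),
    "GNB": GaussianNB(),
    "RF": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# Fit each model on the training set and record its test accuracy.
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)
```

Any model worth keeping should beat the `naive` entry on the held-out set.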

For model selection, I then consider performance metrics, training and prediction time, and each algorithm's suitability for the data.
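As an illustration of how these quantities can be collected for one model, here is a small sketch in plain Python. The helper names `fbeta` and `evaluate`, the `ConstantModel` stand-in, and the choice of `beta` are hypothetical; the text does not say which F-score variant is used:

```python
import time


def fbeta(y_true, y_pred, beta=1.0):
    """F-beta score for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def evaluate(model, X_train, y_train, X_test, y_test, beta=1.0):
    """Time fit and predict, then score the predictions on the test set."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    y_pred = list(model.predict(X_test))
    pred_time = time.perf_counter() - start

    accuracy = sum(t == p for t, p in zip(y_test, y_pred)) / len(y_test)
    return {
        "train_time": train_time,
        "pred_time": pred_time,
        "accuracy": accuracy,
        "fbeta": fbeta(y_test, y_pred, beta),
    }


class ConstantModel:
    """Hypothetical stand-in estimator that always predicts class 1."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [1] * len(X)


result = evaluate(ConstantModel(), [[0]], [1], [[0]] * 5, [1, 0, 0, 1, 0])
```

Because `evaluate` only relies on `fit` and `predict`, the same helper works for any scikit-learn style classifier, which keeps the comparison across models consistent.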

Having chosen AdaBoost as the best model (best accuracy and F-score, with faster computation than RF), I tune its parameters for a better fit by varying the number of estimators and the learning rate.
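One common way to vary these two parameters is an exhaustive grid search with cross-validation. The sketch below assumes scikit-learn's `GridSearchCV`; the grid values, the `f1` scoring choice, the 3-fold setting, and the synthetic data are illustrative assumptions, not the project's actual configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the prepared training data (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Candidate values for the two tuned parameters (illustrative only).
param_grid = {
    "n_estimators": [10, 30],
    "learning_rate": [0.5, 1.0],
}

# Try every combination with 3-fold cross-validation, ranked by F1.
search = GridSearchCV(
    AdaBoostClassifier(random_state=0),
    param_grid,
    scoring="f1",
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_model = search.best_estimator_
```

The tuned model should still be scored on the held-out test set, since the cross-validated score is computed on training data only.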

Lastly, I examine the features closely and estimate a reduced model using only the five most important features. This simpler, reduced model has a final accuracy of 0.86 and an F-score of 0.74.
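A rough sketch of building such a reduced model, assuming the importances are read from a fitted AdaBoost model's `feature_importances_` attribute (the text does not say how importance is measured). The data is synthetic, so the scores it produces are unrelated to the figures reported above:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared census features (assumption).
X, y = make_classification(
    n_samples=500, n_features=12, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit on all features, then rank features by importance, highest first.
full_model = AdaBoostClassifier(random_state=0).fit(X_train, y_train)
top5 = np.argsort(full_model.feature_importances_)[::-1][:5]

# Refit an unfitted copy of the same model on the five selected columns.
reduced_model = clone(full_model).fit(X_train[:, top5], y_train)
reduced_accuracy = reduced_model.score(X_test[:, top5], y_test)
```

Comparing `reduced_accuracy` with the full model's test score shows how much predictive power is given up in exchange for a smaller, faster model.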
