Python_Classification_Modeling

This project uses GaussianNB, Random Forest, and AdaBoost Classification Models to predict the income category of individuals with US Census Data

I pull in the raw data and prepare it by converting all categorical variables into numerical features and normalize all numerical features.

I split the data into training and test sets, estimate the naive predictor as a baseline, then estimate a GNB, RF, and AdaBoost classication models.

I then consider performance metrics, prediction/training time, and the algorithms suitability for the data for model selection.

Having chosen the AdaBoost as the best model (best accuracy and F-score with faster computation relative to RF), I tune the parameters for a better fit by varying the number of estimators and learning rates.

Lastly, I examine the features closely and estimate a reduced model using the 5 most important features. This simpler-reduced model has a final Accuracy of 0.86 and an F-score of 0.74.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md
Udacity_ML_Classification_Model_Notebook_KWood.ipynb		Udacity_ML_Classification_Model_Notebook_KWood.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Python_Classification_Modeling

About

Uh oh!

Releases

Packages

Uh oh!

Languages

kevinwood15/Python_ML_Classification_Modeling

Folders and files

Latest commit

History

Repository files navigation

Python_Classification_Modeling

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages