ariansbahram/eCornell-census-workclass-model

eCornell-census-workclass-model

A machine learning pipeline to predict a person’s employment category (workclass) using U.S. Census data.

This project focuses on predicting an individual's workclass (employment category) using data from the 1994 U.S. Census Income dataset. It applies a complete machine learning workflow including:

- Data cleaning and preprocessing (handling missing values, outliers, and encoding categorical variables)
- Exploratory data analysis (EDA) to understand feature relationships and distributions
- Feature engineering and winsorization
- Model selection and training using Decision Trees, Random Forest, and Gradient Boosted Decision Trees
- Hyperparameter tuning via GridSearchCV
- Evaluation with accuracy, confusion matrices, and feature importance

The final model achieves approximately 80% accuracy and identifies the most predictive features influencing employment type. This could help government agencies or labor economists better understand workforce patterns.
