A machine learning pipeline to predict a person’s employment category (workclass) using U.S. Census data.
This project focuses on predicting an individual's workclass (employment category) using data from the 1994 U.S. Census Income dataset. It applies a complete machine learning workflow including:
- Data cleaning and preprocessing (handling missing values, outliers, and encoding categorical variables)
- Exploratory data analysis (EDA) to understand feature relationships and distributions
- Feature engineering and winsorization
- Model selection and training using Decision Trees, Random Forest, and Gradient Boosted Decision Trees
- Hyperparameter tuning via GridSearchCV
- Evaluation with accuracy, confusion matrices, and feature importance

The final model achieves approximately 80% accuracy and identifies the most predictive features influencing employment type. These results could help government agencies or labor economists better understand workforce patterns.
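
The workflow steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses synthetic stand-in data instead of the Census dataset, and the 1st/99th percentile winsorization bounds, the parameter grid, and all variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (NOT the Census dataset): 3 numeric features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
X[:2, 0] = 50.0  # inject two extreme outliers into the first feature
y = np.digitize(X[:, 1], bins=[-0.5, 0.5])  # labels 0, 1, 2 derived from feature 1

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Winsorization: clip each feature to its 1st and 99th percentile.
# The bounds are assumed values and are computed on the training split only.
low, high = np.percentile(X_train, [1, 99], axis=0)
X_train_w = np.clip(X_train, low, high)
X_test_w = np.clip(X_test, low, high)

# Hyperparameter tuning with GridSearchCV over a small, illustrative grid.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train_w, y_train)

# Evaluation: accuracy, confusion matrix, and feature importances.
best_model = search.best_estimator_
y_pred = best_model.predict(X_test_w)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
importances = best_model.feature_importances_

print("Best parameters:", search.best_params_)
print("Test accuracy:", round(accuracy, 3))
print("Confusion matrix:\n", cm)
print("Feature importances:", importances)
```

In this sketch the winsorization bounds come from the training split only, so the test split does not influence preprocessing. The same pattern would apply to a Decision Tree or Gradient Boosted model by swapping the estimator and its parameter grid.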