
Using data from the U.S. Census Bureau database, this model predicts whether an individual earns over $50k a year with about 83% accuracy.


Someone45/Income-Classification-ML-Model


Income Classification ML Model

This model predicts whether an individual earns over or under $50k/year.

Installation

Use the package manager pip to install the dependencies listed in requirements.txt:

pip install -r requirements.txt

Algorithm

After cleaning the data of missing values, outliers, and attributes that cluttered the data set or were unrelated to the target (income), I settled on logistic regression as the model for predicting an individual's income bracket.
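
The README does not include the training code, so the following is only a minimal, dependency-free sketch of the steps described above, not the repository's implementation. It assumes rows are dicts of raw strings with "?" marking a missing value (as in the raw Adult data files), uses the 1.5 * IQR rule as one possible outlier filter, and fits logistic regression with plain batch gradient descent. The toy rows, `FEATURES`, and all function names are hypothetical.

```python
import math
import statistics

# Hypothetical toy rows in the raw Adult format; "?" marks a missing value.
RAW_ROWS = [
    {"age": "22", "hours-per-week": "20", "income": "<=50K"},
    {"age": "25", "hours-per-week": "30", "income": "<=50K"},
    {"age": "28", "hours-per-week": "35", "income": "<=50K"},
    {"age": "30", "hours-per-week": "38", "income": "<=50K"},
    {"age": "?", "hours-per-week": "40", "income": "<=50K"},  # missing value
    {"age": "40", "hours-per-week": "99", "income": ">50K"},  # hours outlier
    {"age": "45", "hours-per-week": "50", "income": ">50K"},
    {"age": "48", "hours-per-week": "55", "income": ">50K"},
    {"age": "52", "hours-per-week": "60", "income": ">50K"},
    {"age": "55", "hours-per-week": "45", "income": ">50K"},
]
FEATURES = ["age", "hours-per-week"]  # assumed subset of the attributes


def clean(rows, features):
    """Drop rows with a missing ('?') feature, then 1.5*IQR outliers per feature."""
    rows = [r for r in rows if all(r[f].strip() != "?" for f in features)]
    for f in features:
        q1, _, q3 = statistics.quantiles([float(r[f]) for r in rows], n=4)
        lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
        rows = [r for r in rows if lo <= float(r[f]) <= hi]
    X = [[float(r[f]) for f in features] for r in rows]
    y = [1 if r["income"].strip().startswith(">50K") else 0 for r in rows]
    return X, y


def standardize(X):
    """Scale each column to zero mean and unit variance (helps gradient descent)."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
    return scaled, means, stds


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the average log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (>50K) if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


X, y = clean(RAW_ROWS, FEATURES)
Xs, means, stds = standardize(X)
w, b = train_logistic_regression(Xs, y)
accuracy = sum(predict(w, b, x) == t for x, t in zip(Xs, y)) / len(y)
print(f"kept {len(X)} rows, training accuracy {accuracy:.2f}")
```

For brevity this scores the training rows; on the real data set, accuracy should be measured on a held-out test split.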

With about 16k training examples, the model reaches about 83% accuracy with a margin of ±1.5 percentage points, i.e. an error rate of roughly 15.5% to 18.5%. This is comparable to the error rates other algorithms achieve on this data set:


| # | Algorithm | Error (%) |
|---|---|---|
| 1 | C4.5 | 15.54 |
| 2 | C4.5-auto | 14.46 |
| 3 | C4.5 rules | 14.94 |
| 4 | Voted ID3 (0.6) | 15.64 |
| 5 | Voted ID3 (0.8) | 16.47 |
| 6 | T2 | 16.84 |
| 7 | 1R | 19.54 |
| 8 | NBTree | 14.10 |
| 9 | CN2 | 16.00 |
| 10 | HOODG | 14.82 |
| 11 | FSS Naive Bayes | 14.05 |
| 12 | IDTM (Decision table) | 14.46 |
| 13 | Naive-Bayes | 16.12 |
| 14 | Nearest-neighbor (1) | 21.42 |
| 15 | Nearest-neighbor (3) | 20.35 |
| 16 | OC1 | 15.04 |
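
The quoted error range follows from error = 100% - accuracy, widened by the margin on each side. The short script below reproduces that arithmetic and sorts the benchmark algorithms by whether their error falls below, inside, or above the range. `error_range` and the variable names are illustrative; the numbers are copied from the text and the benchmark table.

```python
# Figures reported in the text (the margin is taken as +/- percentage points).
ACCURACY = 83.0  # percent
MARGIN = 1.5     # percentage points

# Error rates (%) copied from the benchmark table.
BENCHMARKS = {
    "C4.5": 15.54,
    "C4.5-auto": 14.46,
    "C4.5 rules": 14.94,
    "Voted ID3 (0.6)": 15.64,
    "Voted ID3 (0.8)": 16.47,
    "T2": 16.84,
    "1R": 19.54,
    "NBTree": 14.10,
    "CN2": 16.00,
    "HOODG": 14.82,
    "FSS Naive Bayes": 14.05,
    "IDTM (Decision table)": 14.46,
    "Naive-Bayes": 16.12,
    "Nearest-neighbor (1)": 21.42,
    "Nearest-neighbor (3)": 20.35,
    "OC1": 15.04,
}


def error_range(accuracy, margin):
    """Turn 'accuracy +/- margin' (in %) into an error-rate interval (in %)."""
    error = 100.0 - accuracy
    return error - margin, error + margin


low, high = error_range(ACCURACY, MARGIN)
better = sorted(n for n, e in BENCHMARKS.items() if e < low)
overlapping = sorted(n for n, e in BENCHMARKS.items() if low <= e <= high)
worse = sorted(n for n, e in BENCHMARKS.items() if e > high)
print(f"error range: {low}% to {high}%")  # error range: 15.5% to 18.5%
print("within range:", overlapping)
```

By this comparison, 6 of the 16 benchmarks fall inside the range, 7 report lower error, and 3 report higher error.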

The attributes in the data set and their possible values are listed below, where "continuous" marks a numeric attribute rather than one drawn from a fixed set of categories:

- age: continuous.
- workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
- fnlwgt: continuous.
- education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
- education-num: continuous.
- marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
- occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
- relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
- race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
- sex: Female, Male.
- capital-gain: continuous.
- capital-loss: continuous.
- hours-per-week: continuous.
- native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
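
Logistic regression works on numeric vectors, so the categorical attributes have to be encoded before training. The README does not say which encoding was used; the sketch below assumes one-hot encoding and, to stay short, covers only a hypothetical subset of the schema. `SCHEMA`, `encode`, and `feature_names` are illustrative names, not part of the repository.

```python
# Hypothetical subset of the attribute schema; None marks a continuous attribute.
SCHEMA = {
    "age": None,
    "sex": ["Female", "Male"],
    "race": ["White", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "Other", "Black"],
    "hours-per-week": None,
}


def feature_names(schema):
    """Column names of the encoded vector, in the same order encode() uses."""
    names = []
    for attr, values in schema.items():
        if values is None:
            names.append(attr)
        else:
            names.extend(f"{attr}={v}" for v in values)
    return names


def encode(record, schema):
    """Turn one record (a dict of raw strings) into a flat list of floats."""
    vector = []
    for attr, values in schema.items():
        raw = record[attr].strip()
        if values is None:
            vector.append(float(raw))  # continuous: pass the number through
        else:
            if raw not in values:  # e.g. "?" should be removed during cleaning
                raise ValueError(f"unknown value {raw!r} for {attr}")
            vector.extend(1.0 if raw == v else 0.0 for v in values)
    return vector


row = {"age": "39", "sex": "Male", "race": "White", "hours-per-week": "40"}
vec = encode(row, SCHEMA)
print(dict(zip(feature_names(SCHEMA), vec)))
```

Each categorical attribute expands into one column per listed value, so the full schema produces a much wider vector than the 14 raw attributes.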

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. If you have a specific question, feel free to send me a message.

Please make sure to update tests as appropriate.

License

MIT
