Predicting diabetes using classification techniques in machine learning.
The objective of this project is to classify the diabetes dataset with the highest accuracy possible, using machine learning techniques.
The dataset is obtained from Kaggle and originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases.
In particular, all patients here are females at least 21 years old of Pima Indian heritage.
Pregnancies: Number of times pregnant
Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skin fold thickness (mm)
Insulin: 2-Hour serum insulin (mu U/ml)
BMI: Body mass index (weight in kg/(height in m)^2)
DiabetesPedigreeFunction: Diabetes pedigree function
Age: Age (years)
Outcome: Class variable (0 or 1)
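
The column layout above can be checked when the data is loaded. The following is a minimal sketch, not the project's actual loading code: the helper name `load_diabetes_frame` is hypothetical, and the two CSV rows are made-up placeholders used only to show the expected layout. In practice the source would be the CSV file downloaded from Kaggle (its file name is not given here).

```python
import io

import pandas as pd

# Column names as listed in the dataset description above.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def load_diabetes_frame(source):
    """Read the CSV and split it into features X and the Outcome label y.

    Hypothetical helper: `source` may be a file path or a file-like object.
    """
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    X = df[EXPECTED_COLUMNS[:-1]]  # the eight predictor columns
    y = df["Outcome"]              # class variable (0 or 1)
    return X, y


# Two made-up placeholder rows, NOT real patient records.
sample_csv = io.StringIO(
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "1,100,70,20,80,25.0,0.5,30,0\n"
    "3,150,80,30,120,32.0,0.7,45,1\n"
)
X, y = load_diabetes_frame(sample_csv)
print(X.shape, y.tolist())
```

Failing early on a missing column keeps later modelling steps from silently training on the wrong features.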
Libraries used: pandas, NumPy, Matplotlib, seaborn and scikit-learn (sklearn)
Classification algorithms: Logistic Regression, Random Forest, KNN (k-nearest neighbors), Support Vector Machine (SVC)
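
One way the four classifiers could be compared on accuracy is sketched below. This is an illustration under stated assumptions, not the project's actual pipeline: the helper names `build_models` and `compare_classifiers`, the hyperparameters, the use of 5-fold cross-validation, and the feature scaling step are all choices made for the example. The data is a synthetic stand-in generated by scikit-learn with eight features, so the printed scores say nothing about the real diabetes dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(random_state=0):
    """Return the four classifiers; scale-sensitive ones get a StandardScaler."""
    return {
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Random Forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "SVC": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    }


def compare_classifiers(X, y, cv=5):
    """Return the mean cross-validated accuracy for each model."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in build_models().items()
    }


# Synthetic stand-in data with 8 features (assumption: replace with the
# real feature matrix and Outcome column when running on the dataset).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=5, random_state=0
)
scores = compare_classifiers(X_demo, y_demo)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Putting the scaler inside each pipeline means it is refit on the training folds only, which avoids leaking test-fold statistics into the accuracy estimate.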