PGithiri/Diabetes-prediction

Diabetes Prediction Web Application

A full-stack web application for diabetes prediction using a machine learning ensemble model. The system combines seven different algorithms in a soft voting ensemble to provide accurate predictions with confidence scores.

Overview

This application predicts diabetes from eight diagnostic measurements using a soft voting ensemble of seven classifiers. The model achieves 83% accuracy on the held-out test set.

Features

  • Machine Learning Model: Soft voting ensemble combining 7 algorithms (KNN, SVM, Decision Tree, Random Forest, Logistic Regression, XGBoost, LightGBM)
  • REST API: Flask-based backend with JSON endpoints
  • Interactive Web Interface: Responsive design for easy data entry and visualization
  • Real-time Predictions: Instant results with probability distributions
  • Confidence Scoring: Detailed breakdown of prediction confidence

Model Performance

  • Test Accuracy: 83.00%
  • Sensitivity (Recall): 92.00%
  • Specificity: 74.00%
  • ROC-AUC Score: 0.8812
  • PR-AUC Score: 0.8772
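As a worked check, the accuracy, sensitivity and specificity figures above are mutually consistent if the test set holds 100 diabetic and 100 non-diabetic samples. That split is an assumption (it would follow from 20% of a 500:500 balanced dataset), and the confusion-matrix counts below are back-calculated from the reported percentages for illustration, not taken from the training output.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # recall on the diabetic class
        "specificity": tn / (tn + fp),  # recall on the non-diabetic class
    }


# Assumed counts: 100 diabetic and 100 non-diabetic test samples.
metrics = classification_metrics(tp=92, fn=8, tn=74, fp=26)
print(metrics)
```

With a balanced test set, accuracy is simply the mean of sensitivity and specificity, which is why 92% and 74% average to the reported 83%.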

Project Structure

diabetes_web_app/
├── app.py                      # Flask application server
├── train_model.py              # Model training script
├── requirements.txt            # Python dependencies
├── README.md                   # This file
├── models/                     # Trained model files (generated)
│   ├── diabetes_model.pkl
│   ├── scaler.pkl
│   └── feature_names.pkl
├── templates/
│   └── index.html              # Web interface
└── static/
    ├── css/
    │   └── style.css           # Stylesheet
    └── js/
        └── app.js              # Frontend JavaScript

Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Setup Steps

  1. Navigate to the project directory

    cd diabetes_web_app

  2. Install dependencies

    pip install -r requirements.txt

    Or install packages individually:

    pip install flask numpy pandas scikit-learn xgboost lightgbm imbalanced-learn

  3. Train the model

    Ensure diabetes.csv is in the parent directory, then run:

    python train_model.py

    This will create the model files in the models/ directory.

  4. Start the application

    python app.py

  5. Access the application

    Open your browser and navigate to:

    http://localhost:5000

Quick Start

Windows

Double-click run.bat to automatically train the model (if needed) and start the server.

Mac/Linux

chmod +x run.sh
./run.sh

Usage

Web Interface

  1. Open http://localhost:5000 in your browser
  2. Enter patient information in the form (8 required fields)
  3. Click "Predict" to get the prediction
  4. View results including:
    • Prediction label (Diabetic/Non-Diabetic)
    • Confidence percentage
    • Probability distribution

API Endpoints

Health Check

GET /health

Response:

{
  "status": "healthy",
  "model_loaded": true,
  "scaler_loaded": true
}
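A client script may want to wait for the server before sending predictions. The sketch below polls `/health` and treats the service as ready only when all three fields match the response shown above. The helper names (`fetch_health`, `is_ready`, `wait_until_ready`) and the retry settings are illustrative assumptions, not part of the application.

```python
import json
import time
from urllib.request import urlopen


def fetch_health(base_url: str = "http://localhost:5000") -> dict:
    """GET /health and decode the JSON body."""
    with urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def is_ready(health: dict) -> bool:
    """True only when the service is healthy and both model and scaler are loaded."""
    return (
        health.get("status") == "healthy"
        and health.get("model_loaded") is True
        and health.get("scaler_loaded") is True
    )


def wait_until_ready(fetch=fetch_health, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the health endpoint until it reports ready or the attempts run out."""
    for _ in range(attempts):
        try:
            if is_ready(fetch()):
                return True
        except OSError:
            pass  # server not reachable yet (connection refused, timeout, ...)
        time.sleep(delay)
    return False
```

Calling `wait_until_ready()` with the server running should return `True` once the model files have been loaded.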

Get Features

GET /features

Response:

{
  "features": ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"],
  "count": 8
}

Make Prediction

POST /predict
Content-Type: application/json

Request body:

{
  "pregnancies": 6,
  "glucose": 148,
  "bloodPressure": 72,
  "skinThickness": 35,
  "insulin": 0,
  "bmi": 33.6,
  "diabetesPedigreeFunction": 0.627,
  "age": 50
}

Response:

{
  "prediction": 1,
  "prediction_label": "Diabetic",
  "probability": {
    "non_diabetic": 0.23,
    "diabetic": 0.77
  },
  "confidence": 77.0,
  "input_features": {
    "Pregnancies": 6,
    "Glucose": 148,
    "BloodPressure": 72,
    "SkinThickness": 35,
    "Insulin": 0,
    "BMI": 33.6,
    "DiabetesPedigreeFunction": 0.627,
    "Age": 50
  }
}
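In the sample response, `confidence` equals the larger of the two class probabilities expressed as a percentage (0.77 becomes 77.0). The sketch below reproduces that relationship; it assumes the label is chosen by the higher probability and that ties go to the diabetic class, neither of which is confirmed by the README. `summarize_prediction` is a hypothetical helper name.

```python
def summarize_prediction(p_non_diabetic: float, p_diabetic: float) -> dict:
    """Build the label and confidence fields from the two class probabilities."""
    # Assumption: the class with the higher probability wins; ties go to class 1.
    prediction = 1 if p_diabetic >= p_non_diabetic else 0
    return {
        "prediction": prediction,
        "prediction_label": "Diabetic" if prediction == 1 else "Non-Diabetic",
        "probability": {"non_diabetic": p_non_diabetic, "diabetic": p_diabetic},
        # Confidence is the winning probability as a percentage.
        "confidence": round(max(p_non_diabetic, p_diabetic) * 100, 2),
    }


summary = summarize_prediction(0.23, 0.77)
print(summary["prediction_label"], summary["confidence"])
```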

Input Features

Feature                   Description                    Range        Unit
Pregnancies               Number of times pregnant       0-17         count
Glucose                   Plasma glucose concentration   0-199        mg/dL
BloodPressure             Diastolic blood pressure       0-122        mm Hg
SkinThickness             Triceps skin fold thickness    0-99         mm
Insulin                   2-Hour serum insulin           0-846        μU/mL
BMI                       Body Mass Index                0-67.1       kg/m²
DiabetesPedigreeFunction  Diabetes heredity function     0.078-2.42   score
Age                       Age in years                   21-81        years
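The ranges above can double as a client-side sanity check before calling `/predict`. The following sketch maps the camelCase request keys to the feature order returned by `/features` and rejects missing, non-numeric, or out-of-range values. Whether `app.py` validates input this way is not stated, so `FEATURE_SPECS` and `to_feature_vector` are illustrative assumptions.

```python
# (request key, feature name, minimum, maximum), taken from the table above
# and from the /predict request body.
FEATURE_SPECS = [
    ("pregnancies", "Pregnancies", 0, 17),
    ("glucose", "Glucose", 0, 199),
    ("bloodPressure", "BloodPressure", 0, 122),
    ("skinThickness", "SkinThickness", 0, 99),
    ("insulin", "Insulin", 0, 846),
    ("bmi", "BMI", 0, 67.1),
    ("diabetesPedigreeFunction", "DiabetesPedigreeFunction", 0.078, 2.42),
    ("age", "Age", 21, 81),
]


def to_feature_vector(payload: dict) -> list:
    """Return the eight values in model order, or raise ValueError listing every problem."""
    errors = []
    vector = []
    for key, name, low, high in FEATURE_SPECS:
        if key not in payload:
            errors.append(f"missing field: {key}")
            continue
        try:
            value = float(payload[key])
        except (TypeError, ValueError):
            errors.append(f"{key} is not numeric")
            continue
        if not low <= value <= high:
            errors.append(f"{key}={value} is outside {low}-{high} ({name})")
        vector.append(value)
    if errors:
        raise ValueError("; ".join(errors))
    return vector


sample = {
    "pregnancies": 6, "glucose": 148, "bloodPressure": 72, "skinThickness": 35,
    "insulin": 0, "bmi": 33.6, "diabetesPedigreeFunction": 0.627, "age": 50,
}
print(to_feature_vector(sample))
```

Collecting all errors before raising lets a form report every invalid field in one pass rather than one at a time.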

Model Architecture

Ensemble Approach

The model uses a soft voting ensemble that combines predictions from seven different algorithms:

  1. K-Nearest Neighbors (KNN) - k=3
  2. Support Vector Machine (SVM) - RBF kernel, gamma=2, C=1
  3. Decision Tree - max_depth=5
  4. Random Forest - 10 estimators
  5. Logistic Regression - max_iter=1000
  6. XGBoost - 100 estimators, learning_rate=0.1
  7. LightGBM - 100 estimators, learning_rate=0.1
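Soft voting averages the class probabilities from all models and picks the class with the highest mean, whereas hard voting counts one vote per model. The sketch below shows the arithmetic in plain Python with made-up probabilities for seven hypothetical models; it is not the training code. In scikit-learn the same averaging is provided by `VotingClassifier(voting="soft")`, which requires every estimator to expose `predict_proba` (for `SVC` that means `probability=True`); the README does not say how `train_model.py` wires this up.

```python
def soft_vote(probabilities):
    """Average per-model class probabilities and return (predicted class, averages).

    probabilities: one [p_non_diabetic, p_diabetic] pair per model.
    """
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averaged = [sum(p[c] for p in probabilities) / n_models for c in range(n_classes)]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return predicted, averaged


# Made-up outputs for one patient: four mildly negative models, three confident positives.
model_probs = [[0.55, 0.45]] * 4 + [[0.10, 0.90]] * 3

predicted, averaged = soft_vote(model_probs)

# Hard voting for comparison: each model casts a single vote for its top class.
votes = [1 if p[1] > p[0] else 0 for p in model_probs]
hard_majority = 1 if sum(votes) > len(votes) / 2 else 0

print("soft:", predicted, averaged)
print("hard:", hard_majority)
```

In this example the two schemes disagree: four of seven models lean non-diabetic, but the three confident models pull the averaged probability above 0.5.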

Data Preprocessing

  1. Class Balancing: RandomOverSampler duplicates randomly chosen minority-class samples until the classes are balanced (500:500)
  2. Feature Scaling: MinMaxScaler to normalize all features to [0, 1] range
  3. Train-Test Split: 80% training, 20% testing with stratification
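To make the first two steps concrete, here is a dependency-free sketch of random oversampling and min-max scaling on a toy dataset. It illustrates the techniques only: the project itself uses `RandomOverSampler` from imbalanced-learn and `MinMaxScaler` from scikit-learn, and the function names, seed, and sample rows below are assumptions made for the example.

```python
import random


def random_oversample(rows, labels, seed=42):
    """Duplicate randomly chosen rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def min_max_scale(rows):
    """Rescale each column to [0, 1] using x' = (x - min) / (max - min)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Toy rows of [Glucose, BMI]; three non-diabetic (0) and two diabetic (1) samples.
rows = [[85, 26.6], [89, 28.1], [116, 25.6], [148, 33.6], [183, 23.3]]
labels = [0, 0, 0, 1, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
scaled_rows = min_max_scale(balanced_rows)
print(balanced_labels.count(0), balanced_labels.count(1))
```

At prediction time the scaler fitted during training (saved as `scaler.pkl`) has to be reused, so that new inputs are scaled with the training minimum and maximum rather than their own.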

API Usage Examples

Python

import requests

url = "http://localhost:5000/predict"
data = {
    "pregnancies": 6,
    "glucose": 148,
    "bloodPressure": 72,
    "skinThickness": 35,
    "insulin": 0,
    "bmi": 33.6,
    "diabetesPedigreeFunction": 0.627,
    "age": 50
}

response = requests.post(url, json=data, timeout=10)
response.raise_for_status()  # fail on HTTP errors instead of parsing an error body
result = response.json()

print(f"Prediction: {result['prediction_label']}")
print(f"Confidence: {result['confidence']:.2f}%")

cURL

curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "pregnancies": 6,
    "glucose": 148,
    "bloodPressure": 72,
    "skinThickness": 35,
    "insulin": 0,
    "bmi": 33.6,
    "diabetesPedigreeFunction": 0.627,
    "age": 50
  }'

JavaScript

const data = {
  pregnancies: 6,
  glucose: 148,
  bloodPressure: 72,
  skinThickness: 35,
  insulin: 0,
  bmi: 33.6,
  diabetesPedigreeFunction: 0.627,
  age: 50
};

fetch('http://localhost:5000/predict', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify(data)
})
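.then(response => {
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();
})
.then(result => console.log(result))
.catch(error => console.error('Prediction request failed:', error));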

Test Cases

Non-Diabetic Patient

Pregnancies: 1
Glucose: 85
Blood Pressure: 66
Skin Thickness: 29
Insulin: 0
BMI: 26.6
Diabetes Pedigree Function: 0.351
Age: 31

Expected Result: Non-Diabetic

Diabetic Patient

Pregnancies: 6
Glucose: 148
Blood Pressure: 72
Skin Thickness: 35
Insulin: 0
BMI: 33.6
Diabetes Pedigree Function: 0.627
Age: 50

Expected Result: Diabetic
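The two cases above could be automated as a small smoke test. The sketch below sends each payload to `/predict` and compares `prediction_label` with the expected result. It uses only the standard library, and the names `CASES`, `predict_via_http` and `run_cases` are illustrative rather than part of the repository. The `predict` argument is injectable so the check can also be run against a stub without a live server.

```python
import json
from urllib.request import Request, urlopen

# (request payload, expected prediction_label), copied from the test cases above.
CASES = [
    ({"pregnancies": 1, "glucose": 85, "bloodPressure": 66, "skinThickness": 29,
      "insulin": 0, "bmi": 26.6, "diabetesPedigreeFunction": 0.351, "age": 31},
     "Non-Diabetic"),
    ({"pregnancies": 6, "glucose": 148, "bloodPressure": 72, "skinThickness": 35,
      "insulin": 0, "bmi": 33.6, "diabetesPedigreeFunction": 0.627, "age": 50},
     "Diabetic"),
]


def predict_via_http(payload, url="http://localhost:5000/predict"):
    """POST one payload to /predict and return the decoded JSON response."""
    request = Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=10) as resp:
        return json.load(resp)


def run_cases(predict=predict_via_http, cases=CASES):
    """Return a list of (expected, actual, passed) tuples, one per case."""
    results = []
    for payload, expected in cases:
        actual = predict(payload)["prediction_label"]
        results.append((expected, actual, actual == expected))
    return results
```

With the server running, `for expected, actual, ok in run_cases(): print(expected, actual, ok)` prints one line per case.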

Troubleshooting

Module Not Found Error

Ensure all dependencies are installed:

pip install -r requirements.txt

Dataset Not Found Error

Verify that diabetes.csv is in the parent directory:

Project/
├── diabetes.csv          (must be here)
└── diabetes_web_app/
    └── train_model.py

Port Already in Use

Change the port in app.py:

app.run(debug=True, host='0.0.0.0', port=5001)
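An alternative to editing the file each time is to read the port from an environment variable. This is a sketch only: the variable name `PORT` and the helper `resolve_port` are assumptions, and the current `app.py` may not support this.

```python
import os


def resolve_port(env=None, default=5000):
    """Return the port from the PORT variable, falling back to the default if unset or invalid."""
    env = os.environ if env is None else env
    try:
        port = int(env.get("PORT", ""))
    except ValueError:
        return default
    return port if 1 <= port <= 65535 else default


# In app.py, the hard-coded value could then become:
# app.run(debug=True, host='0.0.0.0', port=resolve_port())
print(resolve_port({"PORT": "5001"}))
```

The server could then be started on another port with `PORT=5001 python app.py` on Mac/Linux.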

Model Files Not Found

Run the training script first:

python train_model.py

Deployment

Local Development

python app.py

Runs on http://localhost:5000 with debug mode enabled.

Production with Gunicorn

pip install gunicorn
gunicorn -w 4 -b 0.0.0.0:5000 app:app

Docker

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["python", "app.py"]

Build and run:

docker build -t diabetes-app .
docker run -p 5000:5000 diabetes-app

Technical Stack

Backend

  • Flask 3.0+
  • Python 3.8+
  • scikit-learn
  • XGBoost
  • LightGBM
  • NumPy
  • Pandas
  • imbalanced-learn
  • joblib

Frontend

  • HTML5
  • CSS3
  • JavaScript (ES6+)

Important Disclaimer

This application is intended for educational and research purposes only. It should not be used as a substitute for professional medical advice, diagnosis, or treatment. Always consult with qualified healthcare professionals for medical decisions.

License

This project is available for educational purposes.

Acknowledgments

  • Dataset: Pima Indians Diabetes Database
  • Libraries: scikit-learn, XGBoost, LightGBM, Flask

About

AI for Engineers
