A full-stack web application for diabetes prediction using a machine learning ensemble model. The system combines seven algorithms in a soft voting ensemble and returns each prediction with a confidence score.
The application predicts diabetes from eight diagnostic measurements. The ensemble reaches 83% accuracy on the held-out test set.
- Machine Learning Model: Soft voting ensemble combining 7 algorithms (KNN, SVM, Decision Tree, Random Forest, Logistic Regression, XGBoost, LightGBM)
- REST API: Flask-based backend with JSON endpoints
- Interactive Web Interface: Responsive design for easy data entry and visualization
- Real-time Predictions: Instant results with probability distributions
- Confidence Scoring: Detailed breakdown of prediction confidence
- Test Accuracy: 83.00%
- Sensitivity (Recall): 92.00%
- Specificity: 74.00%
- ROC-AUC Score: 0.8812
- PR-AUC Score: 0.8772
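
As a quick sanity check on how these figures relate to each other, the sketch below derives accuracy, sensitivity, and specificity from confusion-matrix counts. The counts are hypothetical: they are chosen so that a balanced test set of 100 diabetic and 100 non-diabetic samples reproduces the percentages above, and they are not the project's actual confusion matrix.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # recall on the diabetic class
        "specificity": tn / (tn + fp),  # recall on the non-diabetic class
    }


# Hypothetical counts, assumed for illustration only.
metrics = classification_metrics(tp=92, fn=8, tn=74, fp=26)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With equal class sizes, accuracy is simply the mean of sensitivity and specificity, which is why 92% and 74% give 83%.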
diabetes_web_app/
├── app.py              # Flask application server
├── train_model.py      # Model training script
├── requirements.txt    # Python dependencies
├── README.md           # This file
├── models/             # Trained model files (generated)
│   ├── diabetes_model.pkl
│   ├── scaler.pkl
│   └── feature_names.pkl
├── templates/
│   └── index.html      # Web interface
└── static/
    ├── css/
    │   └── style.css   # Stylesheet
    └── js/
        └── app.js      # Frontend JavaScript
- Python 3.8 or higher
- pip package manager

- Navigate to the project directory:
  cd diabetes_web_app
- Install dependencies:
  pip install -r requirements.txt
  Or install packages individually:
  pip install flask numpy pandas scikit-learn xgboost lightgbm imbalanced-learn
- Train the model. Ensure diabetes.csv is in the parent directory, then run:
  python train_model.py
  This will create the model files in the models/ directory.
- Start the application:
  python app.py
- Access the application. Open your browser and navigate to:
  http://localhost:5000
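
If the server fails to start, it can help to confirm that training actually produced the three model files. The following is a minimal sketch of such a check; the file names come from the project layout, while the helper `missing_artifacts` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the project layout; the helper itself is hypothetical.
EXPECTED_ARTIFACTS = ("diabetes_model.pkl", "scaler.pkl", "feature_names.pkl")


def missing_artifacts(model_dir="models"):
    """Return the expected model files that are absent from model_dir."""
    base = Path(model_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (base / name).is_file()]


missing = missing_artifacts()
if missing:
    print("Missing model files:", ", ".join(missing))
    print("Run: python train_model.py")
else:
    print("All model files found; start the server with: python app.py")
```

Run it from inside diabetes_web_app so that the relative models/ path resolves correctly.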
On Windows, double-click run.bat to automatically train the model (if needed) and start the server.
On Linux or macOS, run:
chmod +x run.sh
./run.sh

- Open http://localhost:5000 in your browser
- Enter patient information in the form (8 required fields)
- Click "Predict" to get the prediction
- View results including:
  - Prediction label (Diabetic/Non-Diabetic)
  - Confidence percentage
  - Probability distribution
GET /health
Response:
{
  "status": "healthy",
  "model_loaded": true,
  "scaler_loaded": true
}

GET /features
Response:
{
  "features": ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"],
  "count": 8
}

POST /predict
Content-Type: application/json
Request body:
{
  "pregnancies": 6,
  "glucose": 148,
  "bloodPressure": 72,
  "skinThickness": 35,
  "insulin": 0,
  "bmi": 33.6,
  "diabetesPedigreeFunction": 0.627,
  "age": 50
}

Response:
{
  "prediction": 1,
  "prediction_label": "Diabetic",
  "probability": {
    "non_diabetic": 0.23,
    "diabetic": 0.77
  },
  "confidence": 77.0,
  "input_features": {
    "Pregnancies": 6,
    "Glucose": 148,
    "BloodPressure": 72,
    "SkinThickness": 35,
    "Insulin": 0,
    "BMI": 33.6,
    "DiabetesPedigreeFunction": 0.627,
    "Age": 50
  }
}

| Feature | Description | Range | Unit |
|---|---|---|---|
| Pregnancies | Number of times pregnant | 0-17 | count |
| Glucose | Plasma glucose concentration | 0-199 | mg/dL |
| BloodPressure | Diastolic blood pressure | 0-122 | mm Hg |
| SkinThickness | Triceps skin fold thickness | 0-99 | mm |
| Insulin | 2-Hour serum insulin | 0-846 | μU/mL |
| BMI | Body Mass Index | 0-67.1 | kg/m² |
| DiabetesPedigreeFunction | Diabetes heredity function | 0.078-2.42 | score |
| Age | Age in years | 21-81 | years |
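
The ranges in this table can double as a sanity check on request payloads before they are sent to the API. The sketch below is one possible client-side validator; `FEATURE_SPEC` and `validate_payload` are hypothetical names, and whether app.py itself enforces these ranges is not something this README states.

```python
# JSON request key -> (model feature name, minimum, maximum), from the table above.
FEATURE_SPEC = {
    "pregnancies": ("Pregnancies", 0, 17),
    "glucose": ("Glucose", 0, 199),
    "bloodPressure": ("BloodPressure", 0, 122),
    "skinThickness": ("SkinThickness", 0, 99),
    "insulin": ("Insulin", 0, 846),
    "bmi": ("BMI", 0, 67.1),
    "diabetesPedigreeFunction": ("DiabetesPedigreeFunction", 0.078, 2.42),
    "age": ("Age", 21, 81),
}


def validate_payload(payload):
    """Return (features, errors); features maps model feature names to floats."""
    features, errors = {}, []
    for key, (name, low, high) in FEATURE_SPEC.items():
        if key not in payload:
            errors.append(f"missing field: {key}")
            continue
        try:
            value = float(payload[key])
        except (TypeError, ValueError):
            errors.append(f"{key} must be numeric")
            continue
        if not low <= value <= high:
            errors.append(f"{key}={value} outside dataset range [{low}, {high}]")
            continue
        features[name] = value
    return features, errors


features, errors = validate_payload({
    "pregnancies": 6, "glucose": 148, "bloodPressure": 72, "skinThickness": 35,
    "insulin": 0, "bmi": 33.6, "diabetesPedigreeFunction": 0.627, "age": 50,
})
print(errors)  # empty list for the sample request
```

Values outside these ranges are not necessarily invalid measurements; they only fall outside what the training data covered, so a warning may be more appropriate than a hard rejection.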
The model uses a soft voting ensemble that combines predictions from seven different algorithms:
- K-Nearest Neighbors (KNN) - k=3
- Support Vector Machine (SVM) - RBF kernel, gamma=2, C=1
- Decision Tree - max_depth=5
- Random Forest - 10 estimators
- Logistic Regression - max_iter=1000
- XGBoost - 100 estimators, learning_rate=0.1
- LightGBM - 100 estimators, learning_rate=0.1
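
Soft voting averages the class probabilities predicted by each model and picks the class with the highest mean. The sketch below illustrates that arithmetic in plain Python; the per-model probabilities are made-up numbers chosen to average to the sample API response (0.23 / 0.77), and treating "confidence" as the winning probability times 100 is an assumption inferred from that sample response.

```python
def soft_vote(per_model_probs):
    """Average class probabilities across models and pick the most likely class.

    per_model_probs: list of [p_non_diabetic, p_diabetic] pairs, one per model.
    """
    n_models = len(per_model_probs)
    avg = [sum(p[c] for p in per_model_probs) / n_models for c in range(2)]
    prediction = max(range(2), key=lambda c: avg[c])
    return prediction, avg


# Hypothetical outputs for the seven models (illustrative numbers only).
probs = [
    [0.33, 0.67],  # KNN
    [0.20, 0.80],  # SVM
    [0.25, 0.75],  # Decision Tree
    [0.30, 0.70],  # Random Forest
    [0.18, 0.82],  # Logistic Regression
    [0.15, 0.85],  # XGBoost
    [0.20, 0.80],  # LightGBM
]
prediction, avg = soft_vote(probs)
confidence = round(max(avg) * 100, 1)  # assumed definition of "confidence"
print(prediction, [round(p, 2) for p in avg], confidence)
```

Unlike hard (majority) voting, this lets a few very confident models outweigh several marginal ones, and it yields a probability that can be reported directly.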
- Class Balancing: RandomOverSampler to handle imbalanced dataset (500:500 ratio)
- Feature Scaling: MinMaxScaler to normalize all features to [0, 1] range
- Train-Test Split: 80% training, 20% testing with stratification
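
To make the first two preprocessing steps concrete, here is a pure-Python sketch of min-max scaling and random oversampling. These functions are simplified stand-ins for scikit-learn's MinMaxScaler and imbalanced-learn's RandomOverSampler, not the code used in train_model.py; the sample values are illustrative.

```python
import random


def min_max_scale(column):
    """Rescale a list of numbers to [0, 1]: (x - min) / (max - min)."""
    low, high = min(column), max(column)
    if high == low:  # constant column: map everything to 0.0
        return [0.0 for _ in column]
    return [(x - low) / (high - low) for x in column]


def random_oversample(rows, labels, seed=0):
    """Duplicate random rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


glucose = [85, 148, 199, 0]
print(min_max_scale(glucose))

rows, labels = random_oversample([[85], [148], [183], [89]], [0, 1, 0, 0])
print(labels.count(0), labels.count(1))  # both classes now have 3 rows
```

Oversampling only duplicates existing minority rows, so it balances the class counts without inventing new measurements.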
import requests

url = "http://localhost:5000/predict"
data = {
    "pregnancies": 6,
    "glucose": 148,
    "bloodPressure": 72,
    "skinThickness": 35,
    "insulin": 0,
    "bmi": 33.6,
    "diabetesPedigreeFunction": 0.627,
    "age": 50
}

response = requests.post(url, json=data)
result = response.json()

print(f"Prediction: {result['prediction_label']}")
print(f"Confidence: {result['confidence']:.2f}%")

curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "pregnancies": 6,
    "glucose": 148,
    "bloodPressure": 72,
    "skinThickness": 35,
    "insulin": 0,
    "bmi": 33.6,
    "diabetesPedigreeFunction": 0.627,
    "age": 50
  }'

const data = {
  pregnancies: 6,
  glucose: 148,
  bloodPressure: 72,
  skinThickness: 35,
  insulin: 0,
  bmi: 33.6,
  diabetesPedigreeFunction: 0.627,
  age: 50
};

fetch('http://localhost:5000/predict', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify(data)
})
  .then(response => response.json())
  .then(result => console.log(result));

Pregnancies: 1
Glucose: 85
Blood Pressure: 66
Skin Thickness: 29
Insulin: 0
BMI: 26.6
Diabetes Pedigree Function: 0.351
Age: 31
Expected Result: Non-Diabetic
Pregnancies: 6
Glucose: 148
Blood Pressure: 72
Skin Thickness: 35
Insulin: 0
BMI: 33.6
Diabetes Pedigree Function: 0.627
Age: 50
Expected Result: Diabetic
Ensure all dependencies are installed:
pip install -r requirements.txt

Verify that diabetes.csv is in the parent directory:
Project/
├── diabetes.csv          (must be here)
└── diabetes_web_app/
    └── train_model.py

Change the port in app.py:
app.run(debug=True, host='0.0.0.0', port=5001)

Run the training script first:
python train_model.py

python app.py
Runs on http://localhost:5000 with debug mode enabled.
pip install gunicorn
gunicorn -w 4 -b 0.0.0.0:5000 app:app

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["python", "app.py"]

Build and run:
docker build -t diabetes-app .
docker run -p 5000:5000 diabetes-app

- Flask 3.0+
- Python 3.8+
- scikit-learn
- XGBoost
- LightGBM
- NumPy
- Pandas
- imbalanced-learn
- joblib
- HTML5
- CSS3
- JavaScript (ES6+)
This application is intended for educational and research purposes only. It should not be used as a substitute for professional medical advice, diagnosis, or treatment. Always consult with qualified healthcare professionals for medical decisions.
This project is available for educational purposes.
- Dataset: Pima Indians Diabetes Database
- Libraries: scikit-learn, XGBoost, LightGBM, Flask