Diapredictor: Diabetes Risk Assessment & Insights

DiaPredictor is a comprehensive web application—originally developed as a university project—designed to help individuals assess their diabetes risk and receive personalized recommendations. It leverages data analysis, machine learning, and a conversational chatbot interface to provide actionable health insights.

Project Description

Diapredictor provides an easy-to-use interface for individuals to assess their diabetes risk. The system is built on a robust dataset that has been cleaned and enriched with synthetic data to improve model accuracy.

The project features include:

Original vs. Modified Dataset Comparison: View and analyze the differences between raw and enriched data.
Machine Learning Model Evaluation: Compare performance metrics between models trained on different datasets.
Chatbot Assistant: Receive personalized health advice and diabetes risk assessments.
Predictor System: Input your health data and receive an immediate risk analysis.

The system is implemented using:

Streamlit for an interactive frontend.
Scikit-Learn for training and evaluating models.
Rasa for an AI-powered chatbot.

Screenshots

Key Features

Diabetes Risk Prediction
- Users can input personal health metrics (e.g., age, BMI, glucose levels, smoking status).
- The model predicts the likelihood of diabetes and provides actionable health recommendations.
Comparative Model Analysis
- Evaluate different models trained on both the original and enriched datasets.
- Metrics include accuracy, precision, mean squared error (MSE), and R² score.
Chatbot Assistance
- AI-powered chatbot offers real-time insights on diabetes risk, prevention, and lifestyle changes.
Data Preprocessing & Augmentation
- Imbalanced dataset? We applied SMOTENC to generate synthetic minority class samples.
- Features like gender, smoking history, and outliers were handled for optimal model training.

Installation

Prerequisites:

Python: version 3.10
Git: For cloning the repository and managing submodules.

Installation Steps

Clone the repository:

git clone https://github.com/FaresM7/DiaPredictor.git

Create a virtual environment with Python 3.10

py -3.10 -m venv venv
source venv/bin/activate   # On Windows use `venv\Scripts\activate`

Install the required dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
python start.py
```
In case you are not redirected to the Streamlit page, open your browser and navigate to:
```
http://localhost:8501
```

Repository Visualization

Data Overview

Data Overview

Original Dataset

We used the Diabetes prediction dataset from Kaggle for training the models and later enhance the dataset.

[!NOTE] This dataset is used strictly for educational and demonstration purposes.

Data Analysis Before Transformation

Initial data visualization revealed the following:

A slight drop in diabetes cases between ages 60–70, followed by a sharp increase at 80+.
Minimal differences in diabetes cases between males and females.
Nearly symmetrical distribution with minimal differences between mean and median values.
Quartiles exhibited expected variations across attributes.

Data Cleaning and Handling

Incomplete Data: Removed incomplete examples using Pandas. Labels were verified to ensure direct labeling.
Balancing: The dataset originally had a 10:1 imbalance favoring non-diabetic cases.
- Downsampling: Majority class reduced to 20% (1:2 ratio).
- Oversampling: SMOTENC (Synthetic Minority Over-sampling for Nominal and Continuous data) was applied to the minority class to balance the dataset while addressing potential overfitting concerns.

Transformations Applied to Original and Modified Datasets

Splitting: Dataset divided into Training (70%), Validation (15%), and Test (15%) sets. Initial splits showed a 1:3 imbalance for diabetic cases.
Categorical Data Encoding: Hot-end encoding converted categorical features into binary columns for model training.
Normalization: Linear scaling normalized feature values, addressing right-skewed distributions (e.g., age). Z-Score standardization was avoided due to non-normal distributions and minimal outliers.
Removal of Unnecessary Attributes: Attributes with minimal model impact (e.g., gender, certain smoking history categories) were excluded from the modified dataset.

Post-Balancing Observations

Diabetes correlation with age showed a zigzag pattern, with a sharp increase in cases at ages 75–80.
Slight gender differences in diabetes cases remained.
Blood glucose levels displayed increased variation after SMOTENC, with standard deviation rising from 40.90 to 52.55.
Quartiles maintained expected variation, with a slightly increased spread compared to the original dataset.

Chatbot Implementation with Rasa and Streamlit

Chatbot Implementation with Rasa and Streamlit

Chatbot Interface (Streamlit + Rasa)

Importing Libraries

streamlit: Builds the chatbot interface.
json: Handles structured message data.
requests: Sends requests to Rasa’s REST API.
time: Introduces wait times for unavailable servers.

Function Definitions

check_server_ready()

Checks if the Rasa server is running via a GET request to /status.
If unavailable, waits 5 seconds before retrying.

get_bot_response(user_input)

Sends user input to the Rasa bot (/webhooks/rest/webhook).
Processes responses:
- If multiple bot messages exist, extracts and displays all.
- If no response, prompts the user to rephrase.
- If an error occurs, returns a diagnostic message.

Streamlit Page Setup

Uses st.set_page_config() to define layout and title ("Chatbot Interface").

Server Readiness Check

Verifies if the Rasa server is ready before loading the chat UI.

Displaying Chat History

Retrieves conversation history from session state and displays previous messages.

Handling User Input

Uses st.chat_input() for message submission.
Saves user messages in session state for persistence.

Communicating with the Bot

Sends user input to get_bot_response() and displays the bot’s reply.

Displaying Responses

Uses st.chat_message() to display user and bot messages dynamically.

Error Handling

Detects server issues and informs users if the bot is unavailable.

Chatbot NLU (Natural Language Understanding)

NLU (Intent Recognition)

Extracts intent (user’s goal) and entities (important details).
Example:
- Intent: "report_illness" (User feels unwell).
- Entities: "symptoms": "fever, headache".

Rules (Predefined Responses)

Define automatic replies for specific user inputs.
Example:
- User: "Hello"
- Bot: "Hi there! How can I help?"

Stories (Conversation Flow)

Control multi-step interactions.
Example:
- User: "I feel unwell"
- Bot: "Can you describe your symptoms?"
- User: "I have a headache and fever."
- Bot: "I recommend rest and hydration. Would you like medical advice?"

Actions (Custom Logic)

ActionProvideTips

Purpose: Delivers personalized health advice.
How it works:
- Retrieves user conditions (e.g., smoking, hypertension).
- Provides general (e.g., exercise) and personalized (condition-specific) recommendations.

ActionPredictDiabetes

Purpose: Estimates the user’s diabetes risk.
How it works:
- Collects user data (e.g., age, BMI, glucose).
- Uses a machine learning model to predict risk (low, moderate, high).
- Returns actionable health tips.

ActionRememberName

Purpose: Enhances conversations by remembering user names.
How it works:
- Extracts the name from input and stores it in a slot.
- If missing, prompts the user to re-enter it.

Warning

Diapredictor is not a substitute for professional medical advice, diagnosis, or treatment.
The diabetes risk assessments and recommendations are for educational and research purposes only. Always seek the advice of qualified healthcare professionals for any medical concerns.
Use of this application is entirely at your own risk, and the developers assume no liability for any actions taken based on its output.

License

This project is licensed under the MIT License.
See the LICENSE file for details.

Contact

For inquiries, feedback, or suggestions:

LinkedIn: in/fares-elbermawy

We welcome contributions! Feel free to open a pull request or start a discussion in our GitHub repository.

Name		Name	Last commit message	Last commit date
Latest commit History 174 Commits
.devcontainer		.devcontainer
.github		.github
Additional_Scripts		Additional_Scripts
Datasets		Datasets
pages		pages
rasa_backend		rasa_backend
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
Data_Transformation_Documentation.ipynb		Data_Transformation_Documentation.ipynb
LICENSE		LICENSE
README.md		README.md
diagram.svg		diagram.svg
requirements.txt		requirements.txt
start.py		start.py
streamlit-__Intro-2025-01-20-11-01-43.webm		streamlit-__Intro-2025-01-20-11-01-43.webm
🩺_Intro.py		🩺_Intro.py

Folders and files

Latest commit

History

Repository files navigation

Diapredictor: Diabetes Risk Assessment & Insights

Table of Contents

Project Description

Screenshots

Key Features

Installation

Prerequisites:

Installation Steps

Repository Visualization

Data Overview

Original Dataset

Data Analysis Before Transformation

Data Cleaning and Handling

Transformations Applied to Original and Modified Datasets

Post-Balancing Observations

Chatbot Implementation with Rasa and Streamlit

Chatbot Interface (Streamlit + Rasa)

Importing Libraries

Function Definitions

Streamlit Page Setup

Server Readiness Check

Displaying Chat History

Handling User Input

Communicating with the Bot

Displaying Responses

Error Handling

Chatbot NLU (Natural Language Understanding)

NLU (Intent Recognition)

Rules (Predefined Responses)

Stories (Conversation Flow)

Actions (Custom Logic)

ActionProvideTips

ActionPredictDiabetes

ActionRememberName

License

Contact

About

Resources

License

Code of conduct

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages