CardioFusion: Cardiovascular Disease Risk Assessment

CardioFusion is a machine learning platform that predicts cardiovascular disease risk using hybrid ensemble models. It combines traditional ML algorithms (logistic regression, decision tree, random forest, XGBoost) with a Keras neural network, reaching up to 95.34% test accuracy (XGBoost), and explains each prediction with SHAP analysis.

Live Demo: https://cardio-7ju34z7mh8sbn8fddyj2p8.streamlit.app

Key Features

Hybrid Ensemble Models - Five individual models plus a weighted hybrid ensemble, with their predictions compared side by side
Interactive Web Interface - Professional Streamlit app for real-time predictions
SHAP Explainability - Visual explanations showing which factors drive predictions
Comprehensive Analytics - Model performance metrics and comparisons
Cloud Deployment - Production-ready on Streamlit Cloud with Git LFS
Clinical Focus - Designed for healthcare professionals with actionable insights

Quick Start

Option 1: Use the Live App

Visit https://cardio-7ju34z7mh8sbn8fddyj2p8.streamlit.app to use the deployed application immediately.

Option 2: Run Locally

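# Enable Git LFS (requires git-lfs to be installed); models/ and data/ are stored with it
git lfs install

# Clone the repository
git clone https://github.com/Apc0015/Cardio.git
cd Cardio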

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run src/app.py

Visit http://localhost:8501 in your browser.

Project Structure

Cardio/
├── src/
│   └── app.py                   # Main Streamlit application
├── models/                       # Trained ML models (Git LFS)
│   ├── baseline_models/         # Logistic Regression, Decision Tree, Random Forest
│   ├── advanced_models/         # XGBoost, Neural Network, Hybrid Ensemble
│   └── preprocessing/           # Scalers and encoders
├── data/                         # Dataset files (Git LFS)
│   ├── raw/                     # Original cardiovascular dataset
│   └── processed/               # Cleaned and preprocessed data
├── notebooks/                    # Jupyter notebooks
│   ├── data_preprocessing.ipynb # Data cleaning and feature engineering
│   ├── baseline_models.ipynb    # Traditional ML models
│   └── advanced_models.ipynb    # Advanced models and ensembles
├── requirements.txt             # Python dependencies
└── packages.txt                 # System packages for deployment

Model Performance

Results on the held-out test set:

Model                 Accuracy   Precision   Recall   F1-Score   ROC-AUC
XGBoost               95.34%     98.93%      91.67%   95.16%     98.61%
Hybrid Ensemble       91.45%     90.53%      92.58%   91.55%     97.66%
Logistic Regression   83.94%     84.81%      82.70%   83.74%     92.61%
Random Forest         83.40%     80.16%      88.76%   84.24%     91.75%
Neural Network        83.32%     85.11%      80.79%   82.89%     92.18%
Decision Tree         82.10%     82.91%      80.88%   81.88%     90.80%

Dataset: 567,606 samples (balanced 50/50 with SMOTE). Train/test split: 80/20 (454,084 / 113,522 samples).
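As a sanity check on the table: F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), and the split sizes follow from the 80/20 ratio. The Python sketch below recomputes both. It assumes the split was made with scikit-learn's train_test_split, which rounds the test set size up; recomputed F1 values may differ from the table by about 0.01 points because the table lists precision and recall already rounded.

```python
import math

# (precision, recall, reported F1) in percent, copied from the table above
RESULTS = {
    "XGBoost": (98.93, 91.67, 95.16),
    "Hybrid Ensemble": (90.53, 92.58, 91.55),
    "Logistic Regression": (84.81, 82.70, 83.74),
    "Random Forest": (80.16, 88.76, 84.24),
    "Neural Network": (85.11, 80.79, 82.89),
    "Decision Tree": (82.91, 80.88, 81.88),
}


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Train/test sizes, assuming the test set size is rounded up (scikit-learn's rule)."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


if __name__ == "__main__":
    for model, (p, r, reported) in RESULTS.items():
        print(f"{model:<20} computed F1 = {f1_score(p, r):.2f}%  (reported {reported:.2f}%)")
    print(split_sizes(567_606, 0.2))  # (454084, 113522)
```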

How It Works

1. Data Input

Patients or healthcare providers enter 17 health parameters including:

Demographics (age, sex)
Physical measurements (height, weight, BMI)
Lifestyle factors (exercise, smoking, alcohol, diet)
Medical history (diabetes, depression, arthritis, cancer)
General health status

2. Feature Engineering

The system automatically:

Encodes categorical variables
Engineers 27 features from 17 inputs
Scales numerical features
Validates input ranges
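A minimal sketch of this kind of preprocessing (range validation, z-score scaling, one-hot encoding), written in plain Python so it runs without extra dependencies. Every field name, allowed range, category list, and mean/std pair here is a hypothetical placeholder rather than what src/app.py uses, and it covers only five inputs instead of all 17.

```python
# All names, ranges, categories and (mean, std) pairs below are hypothetical.
NUMERIC_RANGES = {"height_cm": (100.0, 250.0), "weight_kg": (30.0, 300.0), "bmi": (10.0, 100.0)}
SCALING = {"height_cm": (170.0, 10.0), "weight_kg": (80.0, 20.0), "bmi": (27.0, 6.0)}
CATEGORIES = {
    "general_health": ["Poor", "Fair", "Good", "Very Good", "Excellent"],
    "smoking_history": ["No", "Yes"],
}


def validate(record: dict) -> None:
    """Reject out-of-range numbers and unknown categories."""
    for field, (low, high) in NUMERIC_RANGES.items():
        value = record[field]
        if not low <= value <= high:
            raise ValueError(f"{field}={value} is outside [{low}, {high}]")
    for field, allowed in CATEGORIES.items():
        if record[field] not in allowed:
            raise ValueError(f"{field}={record[field]!r} is not one of {allowed}")


def engineer_features(record: dict) -> dict[str, float]:
    """Validate, standardize numeric values and one-hot encode categories."""
    validate(record)
    features: dict[str, float] = {}
    for name, (mean, std) in SCALING.items():
        features[name] = (record[name] - mean) / std  # z-score scaling
    for field, allowed in CATEGORIES.items():
        for category in allowed:
            features[f"{field}={category}"] = 1.0 if record[field] == category else 0.0
    return features


if __name__ == "__main__":
    patient = {"height_cm": 180.0, "weight_kg": 81.0, "bmi": 25.0,
               "general_health": "Good", "smoking_history": "No"}
    print(engineer_features(patient))
```

One-hot encoding is what makes the feature count grow beyond the input count: each categorical input expands into one column per category.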

3. Ensemble Prediction

6 trained models generate predictions:

Logistic Regression - Baseline linear model
Decision Tree - Interpretable tree-based model
Random Forest - Ensemble of trees
XGBoost - Gradient boosting
Neural Network - Deep learning (Keras)
Hybrid Ensemble - Weighted combination
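One common way to build a weighted combination is soft voting: a weighted average of each model's predicted probability of disease. The sketch below assumes that scheme and uses made-up probabilities and weights; the actual Hybrid Ensemble may weight or stack its members differently.

```python
def weighted_ensemble(probabilities: dict[str, float], weights: dict[str, float]) -> float:
    """Soft voting: weighted mean of each model's probability of the positive class."""
    total_weight = sum(weights[name] for name in probabilities)
    if total_weight <= 0:
        raise ValueError("model weights must sum to a positive number")
    return sum(weights[name] * p for name, p in probabilities.items()) / total_weight


if __name__ == "__main__":
    # Hypothetical per-model probabilities for one patient
    probabilities = {
        "logistic_regression": 0.62,
        "decision_tree": 0.55,
        "random_forest": 0.70,
        "xgboost": 0.81,
        "neural_network": 0.66,
    }
    # Hypothetical weights: XGBoost counts twice as much as each other model
    weights = {name: 1.0 for name in probabilities}
    weights["xgboost"] = 2.0
    print(f"Ensemble probability: {weighted_ensemble(probabilities, weights):.3f}")
```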

4. Risk Assessment

The system provides:

Risk Percentage (0-100%)
Risk Category (Low, Moderate, High)
Confidence Score
Individual Model Predictions
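A sketch of turning an ensemble probability into these outputs. The 30% and 60% category cut-offs, and the definition of confidence as the share of models whose vote matches the ensemble's vote, are assumptions for illustration; the app's own thresholds and confidence formula are not documented here.

```python
def risk_category(probability: float, low: float = 0.30, high: float = 0.60) -> str:
    """Map a probability in [0, 1] to a category using hypothetical cut-offs."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    if probability < low:
        return "Low"
    if probability < high:
        return "Moderate"
    return "High"


def model_agreement(model_probabilities: list[float], ensemble_probability: float,
                    threshold: float = 0.5) -> float:
    """Hypothetical confidence score: share of models voting the same class as the ensemble."""
    ensemble_vote = ensemble_probability >= threshold
    votes = [p >= threshold for p in model_probabilities]
    return sum(vote == ensemble_vote for vote in votes) / len(votes)


if __name__ == "__main__":
    individual = [0.62, 0.55, 0.70, 0.81, 0.66]  # hypothetical per-model outputs
    ensemble = 0.69
    print(f"Risk: {ensemble:.0%} ({risk_category(ensemble)})")
    print(f"Confidence: {model_agreement(individual, ensemble):.0%}")
```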

5. Explainability

SHAP analysis reveals:

Top risk-increasing factors
Top risk-decreasing factors
Feature importance scores
Actionable health recommendations
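For a linear model such as logistic regression, and treating features as independent, SHAP values have a closed form in log-odds space: phi_i = w_i * (x_i - E[x_i]), and they sum to the model's output minus its average output. The sketch below applies that formula and splits the result into risk-increasing and risk-decreasing factors. The coefficients and feature values are hypothetical, and the tree-based and neural models in this project would need other SHAP explainers.

```python
def linear_shap(weights: dict[str, float], x: dict[str, float],
                background_mean: dict[str, float]) -> dict[str, float]:
    """Exact SHAP values (log-odds) for a linear model with independent features."""
    return {name: w * (x[name] - background_mean[name]) for name, w in weights.items()}


def top_factors(shap_values: dict[str, float],
                k: int = 3) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k strongest risk-increasing and risk-decreasing factors."""
    ranked = sorted(shap_values.items(), key=lambda item: item[1], reverse=True)
    increasing = [(name, v) for name, v in ranked if v > 0][:k]
    decreasing = [(name, v) for name, v in reversed(ranked) if v < 0][:k]
    return increasing, decreasing


if __name__ == "__main__":
    # Hypothetical coefficients and already-encoded/scaled feature values
    weights = {"age": 0.8, "bmi": 0.3, "exercise": -0.5, "smoking": 0.6}
    patient = {"age": 1.5, "bmi": 0.5, "exercise": 1.0, "smoking": 0.0}
    average = {"age": 0.0, "bmi": 0.0, "exercise": 0.7, "smoking": 0.4}
    increasing, decreasing = top_factors(linear_shap(weights, patient, average))
    print("Raises risk:", increasing)
    print("Lowers risk:", decreasing)
```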

Streamlit Cloud Deployment

Current Deployment

Live URL: https://cardio-7ju34z7mh8sbn8fddyj2p8.streamlit.app
Platform: Streamlit Community Cloud
Python Version: 3.13
Auto-deploy: Enabled on push to main branch

Deploy Your Own Instance

Step 1: Fork & Clone

# Fork the repository on GitHub
# Clone your fork
git clone https://github.com/YOUR_USERNAME/Cardio.git
cd Cardio

Step 2: Deploy to Streamlit Cloud

Visit https://share.streamlit.io
Sign in with GitHub
Click "New app"
Configure deployment:
- Repository: YOUR_USERNAME/Cardio
- Branch: main
- Main file path: src/app.py
- Python version: 3.13 (select it under "Advanced settings")
Click "Deploy!"

Step 3: Wait for Deployment

Initial deployment: ~5-8 minutes
Subsequent updates: ~3-5 minutes
Models and data download automatically via Git LFS

Step 4: Access Your App

Your app will be live at: https://your-app-name.streamlit.app

Deployment Configuration

Files Used:

requirements.txt - Python dependencies (TensorFlow, scikit-learn, SHAP, etc.)
packages.txt - System packages (libgomp1 for OpenMP support)
.streamlit/config.toml - Streamlit configuration
.gitattributes - Git LFS configuration for large files

Git LFS Files (Auto-downloaded):

All model files in models/ (~49 MB)
All data files in data/ (~257 MB)
Total: ~306 MB
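If Git LFS is not set up when the repository is cloned, the files under models/ and data/ are small text pointer files instead of the real models and data, and loading them fails. The helper below is a hypothetical diagnostic, not part of the repository: it flags files that still begin with the standard Git LFS pointer header.

```python
from pathlib import Path

# First line of every Git LFS pointer file
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if the file is an un-fetched Git LFS pointer rather than real content."""
    with path.open("rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_unfetched(root: Path) -> list[Path]:
    """Recursively list pointer files under root."""
    return sorted(p for p in root.rglob("*") if p.is_file() and is_lfs_pointer(p))


if __name__ == "__main__":
    for directory in ("models", "data"):
        root = Path(directory)
        if root.is_dir():
            for pointer in find_unfetched(root):
                print(f"not downloaded (run `git lfs pull`): {pointer}")
```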

Automatic Updates

Every push to main branch triggers automatic redeployment:

# Make changes locally
git add .
git commit -m "Update: description"
git push origin main

# Streamlit Cloud automatically redeploys (~3-5 minutes)

Resource Limits (Free Tier)

Memory: 1 GB (app uses ~800 MB)
CPU: 2 cores
Storage: 50 GB (app uses ~306 MB)
Concurrent users: 100+
Monthly hours: Unlimited
Private apps per account: 1 (public apps are not capped)

Development

Local Setup with Jupyter

# Install development dependencies
pip install -r requirements.txt
pip install jupyter ipykernel

# Start Jupyter Lab
jupyter lab

Available Notebooks:

data_preprocessing.ipynb - Data cleaning and feature engineering
baseline_models.ipynb - Traditional ML model training
advanced_models.ipynb - Advanced models and ensemble

Code Quality

# Install code-quality tools
pip install black mypy flake8

# Format code
black src/

# Type checking
mypy src/

# Linting
flake8 src/

Documentation

README.md (this file) - Quick start and deployment guide
data/README.md - Data directory documentation
models/README.md - Models directory documentation
notebooks/ - Interactive Jupyter notebooks with detailed workflows

Privacy & Security

No Data Storage: Patient data is processed in-memory only
Session Isolation: Each user session is independent
Input Validation: All inputs validated and sanitized
HTTPS: Encrypted communication via Streamlit Cloud
No Logging: Predictions are not logged or tracked
Open Source: Code is fully transparent and auditable

Medical Disclaimer: This tool is for educational and informational purposes only. It does not provide medical advice, diagnosis, or treatment. Always consult with qualified healthcare professionals for medical decisions.

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Authors

Ayush Chhoker - Developer - Apc0015
Taylor Hunter - Developer - Taylor-Hunter
Manaswi Thudi - Developer - thudimanaswi

Acknowledgments

Cardiovascular Disease Dataset from Kaggle
Built with Streamlit, scikit-learn, XGBoost, and TensorFlow
SHAP library for model explainability
CDC for cardiovascular health indicators and research

Support

Issues: GitHub Issues
Discussions: GitHub Discussions

Roadmap

Completed

Hybrid ensemble model development
SHAP explainability integration
Professional Streamlit interface
Streamlit Cloud deployment
Git LFS for model storage
Comprehensive testing suite

Planned

Multi-language support (i18n)
Batch prediction via CSV upload
REST API endpoint
User authentication for healthcare providers
Prediction history tracking
Mobile-responsive improvements
FHIR healthcare data integration
Automated model retraining pipeline

Made for better cardiovascular health outcomes
