SWASTHYA (meaning health in Hindi) is an open-source Machine Learning & Deep Learning project for early detection and prediction of six common cancer types:
✔ Blood Cancer
✔ Breast Cancer
✔ Cervical Cancer
✔ Colorectal Cancer
✔ Gastric Cancer
✔ Lung Cancer
This project bundles ML/DL models, notebooks, and Streamlit apps for experimentation, training, and prediction. Models are trained on tabular and/or image datasets, with accuracy typically in the ~80–99% range depending on the module.
⚠️ Note: This project is for research and educational purposes only and is not intended for clinical diagnosis.
- Modular design with dedicated modules for each cancer type
- Support for both tabular & image-based models
- Pre-trained models and example notebooks
- User-friendly Streamlit apps for quick predictions
- GPU/TPU compatible workflows
| Cancer Type | Data Modality | Model(s) Used |
|---|---|---|
| Blood Cancer | Image | CNN (EfficientNetB0) |
| Breast Cancer | Tabular | XGBoost, RandomForest |
| Breast Cancer | Image | CNN (TPU optimized) |
| Cervical Cancer | Tabular | Random Forest |
| Colorectal Cancer | Image | CNN (ResNet50) |
| Colorectal Cancer | Tabular | XGBoost + SMOTE |
| Gastric Cancer | Image | CNN (EfficientNetB0) |
| Lung Cancer | Tabular | XGBoost + RFE |
| Lung Cancer | Image | CNN (VGG16) |
Accuracy varies with the data used and the training configuration.
✔ Python
✔ TensorFlow / Keras
✔ Scikit-learn
✔ XGBoost
✔ Streamlit
✔ Pandas, NumPy, OpenCV
✔ Matplotlib, Seaborn
- Clone the Repository
git clone https://github.com/ANUBprad/Project-SWASTHYA.git
cd Project-SWASTHYA
- Install Dependencies
pip install -r requirements.txt
Make sure you have Python 3.7 or newer installed.
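The version requirement can also be checked from Python itself before installing dependencies. The following is a minimal sketch; the helper name `meets_minimum` is illustrative and not part of the project.

```python
import sys

# Minimum version stated in this README.
MIN_PYTHON = (3, 7)


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = sys.version.split()[0]
    if meets_minimum():
        print("Python version OK:", found)
    else:
        print("Python 3.7 or newer is required, found", found)
```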
- Organize Your Data
Download the required datasets (e.g., from Kaggle) and place them in a structured folder like:
/data
    /blood_cancer
    /breast_cancer
    ...
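The folder layout above could be created and checked with a short script. This is a sketch under assumptions: the root is taken to be a `data` folder relative to the repository, only the two module folders named above are listed, and the helper names `ensure_data_layout` and `empty_modules` are hypothetical.

```python
from pathlib import Path

# Only the two module folders named in this README are listed here;
# extend the tuple with the folder names you actually use.
EXAMPLE_MODULES = ("blood_cancer", "breast_cancer")


def ensure_data_layout(root="data", modules=EXAMPLE_MODULES):
    """Create root/<module> folders if missing and return their paths."""
    root_path = Path(root)
    created = []
    for name in modules:
        module_dir = root_path / name
        module_dir.mkdir(parents=True, exist_ok=True)
        created.append(module_dir)
    return created


def empty_modules(root="data", modules=EXAMPLE_MODULES):
    """Return module names whose folder is missing or contains no files."""
    root_path = Path(root)
    missing = []
    for name in modules:
        module_dir = root_path / name
        has_files = module_dir.is_dir() and any(
            p.is_file() for p in module_dir.rglob("*")
        )
        if not has_files:
            missing.append(name)
    return missing
```

Calling `ensure_data_layout()` once and then `empty_modules()` after downloading shows which module folders still need a dataset.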
- Train or Evaluate Models
Each module has a Jupyter notebook that demonstrates training or evaluation:
jupyter notebook blood_cancer_detection.ipynb
Follow similar steps for other modules.
- Run Streamlit App
Inside a module directory:
streamlit run app.py
📊 Performance Summary
Model performance varies across modules, typically achieving:
✔ ~80–99% accuracy
✔ ROC-AUC improvements with proper balancing & preprocessing
Metrics include:
- Accuracy
- ROC-AUC
- Precision, Recall, F1-Score
Results depend on dataset, augmentation strategies, and training environment.
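For readers unfamiliar with these metrics, the sketch below computes them for a binary classifier in plain Python. It is illustrative only: the function names are hypothetical, the labels and scores in the usage note are made-up toy values rather than project results, and the project notebooks may well use Scikit-learn's metric functions instead.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1; undefined ratios become 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores, positive=1):
    """ROC-AUC as the chance a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With the toy inputs `y_true = [1, 0, 1, 1, 0, 0, 1, 0]` and `y_pred = [1, 0, 0, 1, 0, 1, 1, 0]`, there are 3 true positives, 1 false positive, 3 true negatives, and 1 false negative, so accuracy, precision, recall, and F1 all come out to 0.75.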
📦 Pretrained Models
Each module includes trained models (when available):
/models
├─ blood_cancer_detection_model.h5
├─ breast_tabular_model.joblib
├─ cervical_cancer_model.pkl
...
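The three file extensions in this listing suggest three different loaders. The sketch below picks one by extension; it assumes that `.h5` files are Keras models, `.joblib` files were saved with joblib, and `.pkl` files are standard pickles, none of which is confirmed here. The names `loader_for` and `load_model` are hypothetical helpers, not part of the project.

```python
import pickle
from pathlib import Path

# Extensions shown in the /models listing and the loader assumed for each.
LOADERS = {".h5": "keras", ".joblib": "joblib", ".pkl": "pickle"}


def loader_for(model_path):
    """Return the assumed loader name for a model file, based on its extension."""
    suffix = Path(model_path).suffix.lower()
    if suffix not in LOADERS:
        raise ValueError(f"Unsupported model format: {suffix!r}")
    return LOADERS[suffix]


def load_model(model_path):
    """Load a saved model; heavy libraries are imported only when needed."""
    kind = loader_for(model_path)
    if kind == "keras":
        from tensorflow import keras  # requires TensorFlow to be installed
        return keras.models.load_model(str(model_path))
    if kind == "joblib":
        import joblib  # installed as a dependency of scikit-learn
        return joblib.load(model_path)
    # Only unpickle files you trust: pickle can run arbitrary code on load.
    with open(model_path, "rb") as fh:
        return pickle.load(fh)
```

Importing TensorFlow and joblib inside the branches keeps the helper usable for a tabular module even when TensorFlow is not installed.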
🤝 Contribution
Contributions are welcome! You can help by:
✔ Adding new cancer modules
✔ Improving model performance
✔ Enhancing documentation & examples
✔ Fixing bugs or improving UI
How to contribute:
- Fork the repository
- Create a feature branch
- Commit your changes
- Open a Pull Request
📄 License
This project is licensed under the MIT License — see the LICENSE file for details.
📬 Contact
Maintained by ANUBprad.
Have questions? Open an issue or reach out on GitHub.