An AI-powered web application for lung cancer risk assessment using machine learning and deep learning techniques. This system combines clinical data analysis, CT scan image processing, and bulk dataset analysis to provide comprehensive cancer risk predictions.
- Clinical Risk Assessment: Comprehensive form-based evaluation using patient symptoms, medical history, and lifestyle factors
- CT Scan Analysis: AI-powered chest CT image interpretation with nodule detection
- Dataset Processing: Bulk analysis of patient datasets with statistical insights and batch predictions
- Real-time Predictions: Instant results with confidence scores and risk levels
- Responsive Design: Modern, mobile-friendly web interface
- Automated Setup: One-command installation and deployment
Watch the demo video (`Lung-Cancer-Prediction.mp4`).
Quick start with the automated setup script:

```bash
# Clone the repository
git clone https://github.com/your-username/lung-cancer-prediction.git
cd lung-cancer-prediction

# Run automated setup and launch
python setup_and_run.py
```

Or set up manually:

```bash
# Clone and navigate
git clone https://github.com/your-username/lung-cancer-prediction.git
cd lung-cancer-prediction

# Install dependencies
pip install -r requirements.txt

# Run the application
python app.py
```

Access the application at http://localhost:5000.
- Python: 3.8 or higher
- Operating System: Windows, macOS, or Linux
- Memory: At least 4GB RAM
- Storage: 2GB free space
- Internet: Required for dataset downloads

```
lung-cancer-prediction/
├── app.py                              # Main Flask application
├── data_preparation_and_training.py    # ML model training pipeline
├── setup_and_run.py                    # Automated setup script
├── requirements.txt                    # Python dependencies
├── README.md                           # Project documentation
├── INSTALLATION_GUIDE.md               # Detailed setup guide
├── sample_patient_data.csv             # Sample dataset for testing
├── .gitignore                          # Git ignore rules
│
├── templates/                          # HTML templates
│   ├── index.html                      # Homepage
│   ├── clinical.html                   # Clinical prediction
│   ├── ct_scan.html                    # CT scan analysis
│   ├── dataset.html                    # Dataset analysis
│   └── layout.html                     # Base template
│
├── data/                               # Downloaded datasets (auto-created)
├── models/                             # Trained ML models (auto-created)
├── uploads/                            # User uploads (auto-created)
└── static/                             # Static assets (auto-created)
```

Clinical model:

- Algorithm: Random Forest classifier
- Features: 15 clinical parameters
- Accuracy: ~85-95%
- Output: Risk level (High/Low) with a confidence score
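
The clinical model's output can be pictured as a post-processing step on the forest's predicted probability. The sketch below assumes a 0.5 decision threshold and defines confidence as the probability of the predicted class; neither choice is documented for this project, and `risk_from_probability` is a hypothetical helper, not a function in `app.py`. With scikit-learn, the input would come from `RandomForestClassifier.predict_proba`, which returns one column per class in the order of `classes_`.

```python
def risk_from_probability(p_cancer: float, threshold: float = 0.5) -> dict:
    """Map a positive-class probability to a High/Low risk level and a confidence score.

    Hypothetical helper: the 0.5 threshold and the confidence definition are
    assumptions. With a fitted scikit-learn RandomForestClassifier whose
    classes_ are [0, 1], p_cancer could be model.predict_proba(X)[0, 1].
    """
    if not 0.0 <= p_cancer <= 1.0:
        raise ValueError("p_cancer must be a probability in [0, 1]")
    level = "High" if p_cancer >= threshold else "Low"
    # Confidence = probability assigned to the predicted class.
    confidence = p_cancer if level == "High" else 1.0 - p_cancer
    return {"risk_level": level, "confidence": round(confidence, 3)}
```

For example, `risk_from_probability(0.82)` gives `{"risk_level": "High", "confidence": 0.82}`, while a probability of 0.3 is reported as Low risk with confidence 0.7.
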
CT scan model:

- Algorithm: Convolutional neural network (CNN)
- Input: 224×224 grayscale CT images
- Architecture: Deep CNN with batch normalization
- Output: Cancer probability with nodule detection
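
The CT model is described only as a deep CNN with batch normalization over 224×224 grayscale input. The NumPy sketch below makes those terms concrete with one building block (convolution, then batch normalization, then ReLU); the filter count, kernel size, and random weights are illustrative assumptions, not the project's architecture. A real model would stack many such blocks in a framework such as TensorFlow and would use running statistics, rather than batch statistics, for batch normalization at inference time.

```python
import numpy as np


def conv2d_valid(images: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Valid 2D cross-correlation (what CNN layers call convolution).

    images: (N, H, W) batch of grayscale images; kernels: (K, kh, kw).
    Returns feature maps of shape (N, K, H - kh + 1, W - kw + 1).
    """
    n, h, w = images.shape
    k, kh, kw = kernels.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    out = np.zeros((n, k, out_h, out_w))
    # Accumulate one kernel tap at a time over all positions, images, and filters.
    for i in range(kh):
        for j in range(kw):
            patch = images[:, i:i + out_h, j:j + out_w]  # (N, out_h, out_w)
            # Broadcast (N, 1, out_h, out_w) * (1, K, 1, 1) -> (N, K, out_h, out_w)
            out += patch[:, None] * kernels[None, :, i, j, None, None]
    return out


def batch_norm(x: np.ndarray, gamma: np.ndarray, beta: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Training-mode batch normalization per channel over (batch, height, width)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]


def conv_bn_relu(images, kernels, gamma, beta):
    """One conv -> batch norm -> ReLU block."""
    return np.maximum(batch_norm(conv2d_valid(images, kernels), gamma, beta), 0.0)


# Usage: two random 224x224 "scans" through 8 random 3x3 filters.
rng = np.random.default_rng(0)
images = rng.random((2, 224, 224))
kernels = rng.standard_normal((8, 3, 3))
features = conv_bn_relu(images, kernels, np.ones(8), np.zeros(8))
print(features.shape)  # (2, 8, 222, 222)
```

Each unpadded 3×3 layer trims one pixel from every border, which is why the 224×224 input yields 222×222 feature maps.
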
The system integrates with three Kaggle datasets:

- Clinical dataset: clinical data and air pollution correlation, with patient demographics and symptoms
- Imaging dataset: medical images with normal vs. abnormal classifications
- Supplementary dataset: additional medical information in text-based medical records

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Homepage |
| `/clinical` | GET | Clinical prediction form |
| `/ct-scan` | GET | CT scan upload interface |
| `/dataset` | GET | Dataset analysis interface |
| `/predict_clinical` | POST | Clinical risk prediction |
| `/predict_ct_scan` | POST | CT scan image analysis |
| `/analyze_dataset` | POST | Bulk dataset processing |
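
A minimal sketch of calling the clinical prediction endpoint from Python using only the standard library. It assumes `/predict_clinical` accepts a JSON body using the field names from the example patient data and returns JSON; neither the request format nor the response schema is documented here, so check `app.py` (if the route reads HTML form fields, send `urllib.parse.urlencode(patient).encode()` with `Content-Type: application/x-www-form-urlencoded` instead). `build_clinical_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # default address after `python app.py`


def build_clinical_request(patient: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request to /predict_clinical with a JSON body (assumed format)."""
    return urllib.request.Request(
        f"{base_url}/predict_clinical",
        data=json.dumps(patient).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage, with the app running locally:
#   req = build_clinical_request({"age": 65, "gender": "M", "smoking": 2})
#   with urllib.request.urlopen(req, timeout=10) as resp:
#       print(json.load(resp))
```

Building the request separately from sending it keeps the payload easy to inspect or log before it reaches the server.
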

Example clinical input (Python):

```python
# Example patient data
patient_data = {
    "age": 65,
    "gender": "M",
    "smoking": 2,  # Current smoker
    "chest_pain": 1,
    "coughing": 1,
    # ... other parameters
}
```

Example dataset format (CSV):

```csv
Age,Gender,Smoking,Yellow_Fingers,Anxiety,Chest_Pain,Lung_Cancer
65,M,2,1,0,1,1
45,F,1,0,1,0,0
70,M,2,1,1,1,1
```

Development setup:

```bash
# Clone repository
git clone https://github.com/your-username/lung-cancer-prediction.git
cd lung-cancer-prediction

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run in debug mode (FLASK_ENV is deprecated since Flask 2.2; use FLASK_DEBUG)
export FLASK_DEBUG=1  # On Windows: set FLASK_DEBUG=1
python app.py
```

To contribute:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

- Clinical Model Accuracy: 85-95%
- Response Time: <2 seconds
- Supported File Formats: CSV, JPG, PNG, DICOM
- Maximum File Size: 16MB
- Concurrent Users: 100+
- No patient data is stored permanently
- All uploads are processed locally
- HIPAA-compliant design principles
- Secure file handling and validation
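
The supported formats and the 16MB limit listed above can be enforced before any processing. In Flask, setting `app.config["MAX_CONTENT_LENGTH"] = 16 * 1024 * 1024` makes the server reject larger request bodies with HTTP 413, and Werkzeug's `secure_filename` sanitizes client-supplied names. The standard-library sketch below shows the extension and size checks on their own; `validate_upload` is hypothetical, and mapping DICOM to `.dcm` and accepting `.jpeg` alongside `.jpg` are assumptions.

```python
import os

# Assumed extensions for the documented formats: CSV, JPG, PNG, DICOM.
ALLOWED_EXTENSIONS = {".csv", ".jpg", ".jpeg", ".png", ".dcm"}
MAX_FILE_SIZE = 16 * 1024 * 1024  # 16 MB, the documented maximum


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return a bare filename if the upload passes the checks, else raise ValueError."""
    # Drop any client-supplied directory parts (both / and \ separators).
    name = os.path.basename(filename.replace("\\", "/"))
    ext = os.path.splitext(name)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_FILE_SIZE:
        raise ValueError("file exceeds the 16 MB limit")
    return name
```

Checking the extension case-insensitively accepts names like `scan.PNG`, while stripping directory parts keeps a name such as `../../tmp/patients.csv` from escaping the `uploads/` folder.
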
- Educational Purpose Only: This system is designed for educational and research purposes
- Not for Medical Diagnosis: Always consult healthcare professionals for medical advice
- Research Tool: Predictions are based on limited training data
- No Medical Liability: This tool should not replace professional medical consultation
Contributions are welcome! Please read our Contributing Guidelines for details on our code of conduct and the process for submitting pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.
- Kaggle community for providing datasets
- TensorFlow and scikit-learn teams
- Flask development team
- Bootstrap for UI components
- Medical professionals who provided guidance