-
This project explores survival patterns in the Titanic dataset, revealing key insights:
- Gender Disparity: Women had a 74.20% survival rate, compared to 18.89% for men, reflecting the "women and children first" protocol.
- Age Impact: Children (under 13) survived at 57.97%, while teenagers had a 41.05% rate.
- Model Performance: The ensemble model achieved a validation accuracy of 84.17% and a training accuracy of 90.35%, with a Kaggle score of 0.77990.
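Group-level survival rates like those above can be computed with a pandas group-by. The following is a minimal sketch, not the notebook's actual code: the helper name `survival_rate_by` is hypothetical, the tiny DataFrame is made-up illustration data rather than the Kaggle file, and the `Sex` and `Survived` column names are assumed to match `train.csv`.

```python
import pandas as pd


def survival_rate_by(df, column):
    """Return the survival rate in percent for each value of `column`."""
    # The mean of a 0/1 column is the fraction of survivors in each group.
    return (df.groupby(column)["Survived"].mean() * 100).round(2)


# Made-up sample for illustration only (not the Kaggle data).
sample = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male", "female"],
    "Survived": [1, 1, 0, 0, 1, 0],
})

rates = survival_rate_by(sample, "Sex")
print(rates)
```

Applied to the real training data, the same call with `"Sex"` or with a binned age column would give per-group rates comparable to the figures listed above.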
-
- Python: Core programming language.
- Libraries: numpy, pandas, matplotlib, seaborn, scikit-learn.
- Jupyter Notebook: Analysis in TitanicSurvivalPrediction.ipynb
-
- The stacking ensemble combines RandomForestClassifier, GradientBoostingClassifier, and SVC as base learners, with LogisticRegression as the meta-classifier.
- Feature engineering: Added FamilySize, IsAlone, Title, and binned Age and Fare.
- Hyperparameter tuning with RandomizedSearchCV and StratifiedKFold cross-validation.
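The steps above can be sketched with scikit-learn as follows. This is an illustrative outline under stated assumptions, not the notebook's exact code: the helper names `add_family_features` and `build_stack` are hypothetical, the `FamilySize = SibSp + Parch + 1` definition is a common convention assumed here, the search grid values are arbitrary, and the data is synthetic so the example runs without `train.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.svm import SVC


def add_family_features(df):
    """Add FamilySize and IsAlone (assumed definitions) from SibSp and Parch."""
    out = df.copy()
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1  # +1 counts the passenger
    out["IsAlone"] = (out["FamilySize"] == 1).astype(int)
    return out


def build_stack(cv):
    """Stack RF, GB, and SVC under a LogisticRegression meta-classifier."""
    return StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(random_state=0)),
            ("gb", GradientBoostingClassifier(random_state=0)),
            ("svc", SVC(random_state=0)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=cv,
    )


# Synthetic stand-in for the Titanic features (illustration only).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "SibSp": rng.integers(0, 3, size=120),
    "Parch": rng.integers(0, 3, size=120),
    "Fare": rng.uniform(5, 100, size=120),
})
demo = add_family_features(demo)
y = (demo["Fare"] > 50).astype(int)  # made-up target

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Assumed, deliberately small search space; parameters of the base
# learners are addressed as "<estimator name>__<parameter>".
param_distributions = {
    "rf__n_estimators": [20, 50],
    "gb__learning_rate": [0.05, 0.1],
}
search = RandomizedSearchCV(
    build_stack(cv),
    param_distributions=param_distributions,
    n_iter=2,
    cv=cv,
    random_state=0,
)
search.fit(demo, y)
print(search.best_params_, round(search.best_score_, 3))
```

The Title extraction and the Age and Fare binning are omitted here to keep the sketch short; they would be further columns added before fitting.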
-
- To get a local copy up and running, follow these steps.
- Prerequisites
- Python 3.x
- Jupyter Notebook
- Git
- Setup
- Clone this repository:
git clone https://github.com/Marlyn-Mayienga/TitanicSurvivalPrediction.git
- Install
- Install dependencies:
pip install numpy pandas matplotlib seaborn scikit-learn
- Usage
- Ensure train.csv and test.csv from the Kaggle Titanic competition are in the project directory.
- Open TitanicSurvivalPrediction.ipynb in Jupyter Notebook.
- Run all cells to reproduce the analysis, train the model, and generate submission predictions.
-
- 👤 Marlyn Mayienga
- GitHub: @Marlyn_Mayienga
- Twitter: @Merl_Mayienga
- LinkedIn: Marlyn_Mayienga
-
- Incorporate additional ensemble techniques (e.g., XGBoost).
- Explore advanced feature interactions for improved accuracy.
- Add visualizations of survival patterns by class and embarkation port.
Contributions, issues, and feature requests are welcome!
-
- Feel free to fork the repository, submit pull requests, or open issues for bugs and feature suggestions. Check the issues page for ongoing tasks.
-
- Give a ⭐️ if you like this project! Your support motivates further development.
-
- Kaggle: For providing the Titanic dataset and competition platform.
- Scikit-learn Team: For robust machine learning tools.
- References:
- Zhou, Zhi-Hua. Ensemble Methods: Foundations and Algorithms.
- Elinder, M., & Erixson, O. (2012). Gender, social norms, and survival in maritime disasters. PNAS.
-
- This project is MIT licensed.