
Predicting passenger survival on the Titanic using an ensemble machine learning approach, achieving a Kaggle score of 0.77990. This project leverages stacking with Random Forest, Gradient Boosting, and SVM, enhanced by feature engineering and hyperparameter tuning, to model survival patterns effectively.




📗 Table of Contents

  • 📖 Machine Learning Insights into Titanic Survival

  • This project explores survival patterns in the Titanic dataset, revealing key insights (see the computation sketch after this list):

        - Gender Disparity: Women had a 74.20% survival rate, compared to 18.89% for men, reflecting the "women and children first" protocol.
        - Age Impact: Children (under 13) survived at 57.97%, while teenagers had a 41.05% rate.
        - Model Performance: The ensemble model achieved a validation accuracy of 84.17% and a training accuracy of 90.35%, with a Kaggle score of 0.77990.
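The rates above can be reproduced directly from the Kaggle training data. A minimal sketch, assuming the standard train.csv columns (the exact age bins here are illustrative and may differ from the notebook's):

```python
import pandas as pd

# Standard Kaggle training set: columns include Survived, Sex, and Age.
train = pd.read_csv("train.csv")

# Survival rate by gender, as a percentage.
print(train.groupby("Sex")["Survived"].mean().mul(100).round(2))

# Survival rate by age group (illustrative bins; the notebook may bin differently).
age_group = pd.cut(train["Age"], bins=[0, 12, 19, 60, 120],
                   labels=["Child", "Teenager", "Adult", "Senior"])
print(train.groupby(age_group)["Survived"].mean().mul(100).round(2))
```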
    
  • 🛠 Built With

    • Python: Core programming language.
    • Libraries: numpy, pandas, matplotlib, seaborn, scikit-learn.
    • Jupyter Notebook: Analysis lives in TitanicSurvivalPrediction.ipynb.
  • Key Features

    • Stacking ensemble: RandomForestClassifier, GradientBoostingClassifier, and SVC as base learners, with LogisticRegression as the meta-classifier (see the sketch after this list).
    • Feature engineering: Added FamilySize, IsAlone, Title, and binned Age and Fare.
    • Hyperparameter tuning with RandomizedSearchCV and StratifiedKFold cross-validation.
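A minimal sketch of this kind of pipeline, assuming the standard Kaggle columns; the feature subset, bin edges, and parameter grids below are illustrative placeholders, not the notebook's tuned settings:

```python
import pandas as pd
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.svm import SVC


def engineer_features(df):
    """Add the engineered columns listed above (illustrative implementation)."""
    df = df.copy()
    df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
    df["IsAlone"] = (df["FamilySize"] == 1).astype(int)
    df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    df["Sex"] = (df["Sex"] == "female").astype(int)
    df["AgeBin"] = pd.cut(df["Age"], bins=[0, 12, 19, 40, 60, 120], labels=False)
    df["FareBin"] = pd.qcut(df["Fare"], q=4, labels=False, duplicates="drop")
    return df


train = engineer_features(pd.read_csv("train.csv"))
# Illustrative numeric subset; Title would need encoding before use.
features = ["Pclass", "Sex", "FamilySize", "IsAlone", "AgeBin", "FareBin"]
X, y = train[features].fillna(-1), train["Survived"]

# Base learners stacked under a logistic-regression meta-classifier.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("gb", GradientBoostingClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Randomized search over a small illustrative grid, with stratified folds.
param_distributions = {
    "rf__n_estimators": [100, 200, 400],
    "gb__learning_rate": [0.01, 0.05, 0.1],
    "svc__C": [0.1, 1, 10],
}
search = RandomizedSearchCV(
    stack,
    param_distributions,
    n_iter=10,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    scoring="accuracy",
    random_state=42,
)
search.fit(X, y)
print("Best CV accuracy:", round(search.best_score_, 4))
```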
  • 💻 Getting Started

    • To get a local copy up and running, follow these steps.
    • Prerequisites
      • Python 3.x
      • Jupyter Notebook
      • Git
    • Setup
      • Clone this repository: git clone https://github.com/Marlyn-Mayienga/Titanic-Survival-Prediction.git
    • Install
      • Install dependencies: pip install numpy pandas matplotlib seaborn scikit-learn
    • Usage
      • Ensure train.csv and test.csv from the Kaggle Titanic competition are in the project directory.
      • Open TitanicSurvivalPrediction.ipynb in Jupyter Notebook.
      • Run all cells to reproduce the analysis, train the model, and generate the submission file (see the format sketch below).
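The expected output is a two-column Kaggle submission file. A minimal, self-contained sketch of the format, using a placeholder model rather than the notebook's tuned ensemble:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Placeholder features with no missing values; the notebook uses the engineered set.
features = ["Pclass", "SibSp", "Parch"]
model = LogisticRegression(max_iter=1000).fit(train[features], train["Survived"])

# Kaggle expects exactly two columns: PassengerId and Survived (0 or 1).
submission = pd.DataFrame({
    "PassengerId": test["PassengerId"],
    "Survived": model.predict(test[features]).astype(int),
})
submission.to_csv("submission.csv", index=False)
```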
  • 👥 Authors

    • 👤 Marlyn Mayienga
      • GitHub: @Marlyn-Mayienga
      • Twitter: @Merl_Mayienga
      • LinkedIn: Marlyn_Mayienga

  • 🔭 Future Features

    • Incorporate additional ensemble techniques (e.g., XGBoost).
    • Explore advanced feature interactions for improved accuracy.
    • Add visualizations of survival patterns by class and embarkation port.


  • 🤝 Contributing

    • Contributions, issues, and feature requests are welcome! Feel free to fork the repository, submit pull requests, or open issues for bugs and feature suggestions. Check the issues page for ongoing tasks.
  • ⭐️ Show your support

    • Give a ⭐️ if you like this project! Your support motivates further development.
  • 🙏 Acknowledgements

    • Kaggle: For providing the Titanic dataset and competition platform.
    • Scikit-learn Team: For robust machine learning tools.
    • References:
      • Zhou, Zhi-Hua. Ensemble Methods: Foundations and Algorithms.
      • Elinder, M., & Erixson, O. (2012). Gender, social norms, and survival in maritime disasters. PNAS.

  • 📝 License

    • This project is MIT licensed.


(back to top)
