🧠 Data Science Portfolio — Lewis

Turning raw data into insight, one notebook at a time.


👋 About This Repository

Welcome to my personal Data Science learning and project hub. This repository documents my journey through data analysis, machine learning, statistical modeling, and real-world problem-solving with data.

Whether you're here to learn, collaborate, or explore — make yourself at home.


📂 Repository Structure

Data-Science/
│
├── 📊 EDA/                    # Exploratory Data Analysis notebooks
│   ├── titanic_eda.ipynb
│   ├── world_happiness_eda.ipynb
│   └── retail_sales_eda.ipynb
│
├── 🤖 Machine-Learning/       # Supervised & unsupervised ML projects
│   ├── house_price_prediction/
│   ├── customer_churn/
│   └── spam_classifier/
│
├── 📈 Visualization/          # Charts, dashboards, and storytelling
│   ├── matplotlib_showcase.ipynb
│   └── plotly_interactive.ipynb
│
├── 🧹 Data-Cleaning/          # Messy data → clean data pipelines
│   └── cleaning_pipeline.ipynb
│
├── 📝 Notes/                  # Study notes & reference sheets
│   ├── statistics_101.md
│   ├── pandas_cheatsheet.md
│   └── sklearn_reference.md
│
└── README.md

⚠️ This structure is a roadmap — projects are added progressively.


🔥 Featured Projects

1. 🏠 House Price Prediction

Goal: Predict housing prices using regression models.
Tools: pandas, scikit-learn, XGBoost, matplotlib, seaborn
Highlights:

  • Feature engineering on 80+ columns
  • Compared Linear Regression, Ridge, Lasso, and XGBoost
  • Final RMSE: ~18,000 (top 15% Kaggle score)
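For readers new to the metric, RMSE (root mean squared error) is the square root of the average squared prediction error, so it is expressed in the same units as the target. The sketch below is a minimal, dependency-free illustration. The function name `rmse` and the sample prices are assumptions made up for this example and are not taken from the project's notebook.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical sale prices and model predictions, for illustration only.
actual = [200_000, 150_000, 320_000, 275_000]
predicted = [210_000, 140_000, 300_000, 290_000]

score = rmse(actual, predicted)
print(f"RMSE: {score:,.0f}")
```

Because errors are squared before averaging, one large miss raises RMSE more than several small ones, which is worth keeping in mind when comparing models on price data with outliers.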

2. 📉 Customer Churn Analysis

Goal: Identify customers likely to cancel their subscription.
Tools: pandas, scikit-learn, imbalanced-learn, SHAP
Highlights:

  • Handled severe class imbalance with SMOTE
  • Random Forest + SHAP for explainability
  • Precision: 87% | Recall: 82%
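On an imbalanced churn dataset, accuracy alone can look good while the model misses most churners, which is why precision and recall are reported instead. The sketch below shows how both are derived from true positives, false positives, and false negatives. The helper `precision_recall` and the label lists are hypothetical and only illustrate the arithmetic, not the project's actual predictions.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels: 1 = churned, 0 = stayed.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

precision, recall = precision_recall(y_true, y_pred)
print(f"Precision: {precision:.2f} | Recall: {recall:.2f}")
```

Precision answers "of the customers flagged as churners, how many really churned?", while recall answers "of the customers who churned, how many were flagged?".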

3. 🌍 World Happiness EDA

Goal: Deep dive into factors driving happiness across nations.
Tools: pandas, plotly, seaborn, statsmodels
Highlights:

  • Correlation heatmaps and regression analysis
  • Interactive choropleth world map
  • Insight: GDP per capita explains ~63% of happiness variance
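A note on reading the "explains ~63%" figure: if it comes from a regression with GDP per capita as the only predictor (an assumption, since the notebook's exact model is not described here), then the share of variance explained is R², which equals the squared Pearson correlation, so r would be roughly 0.79. The sketch below demonstrates that relationship on made-up numbers. The function `pearson_r` and both data lists are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up, unitless values standing in for GDP per capita and happiness score.
gdp = [1, 2, 3, 4, 5]
happiness = [2, 1, 4, 3, 5]

r = pearson_r(gdp, happiness)
r_squared = r ** 2  # share of variance explained in a one-predictor linear fit
print(f"r = {r:.2f}, R^2 = {r_squared:.2f}")
```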

4. 📧 Spam Classifier

Goal: Binary classification of emails as spam or not spam.
Tools: scikit-learn, NLTK (TF-IDF features, Naive Bayes classifier)
Highlights:

  • Full NLP pipeline: tokenization → vectorization → classification
  • Accuracy: 98.4% on test set
  • False positive rate kept below 1%
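The three pipeline stages can be sketched in plain Python. This is a teaching sketch under stated assumptions, not the project's implementation: it uses raw word counts instead of TF-IDF and a hand-written multinomial Naive Bayes with add-one smoothing instead of scikit-learn, so that it runs without dependencies. The functions `tokenize`, `train`, and `predict` and the four training messages are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Tokenization: lowercase the text and extract word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(messages, labels):
    """Vectorization: build per-class word counts (a bag-of-words model)."""
    word_counts = {label: Counter() for label in set(labels)}
    class_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocab = set().union(*word_counts.values())
    return word_counts, class_counts, vocab


def predict(text, model):
    """Classification: pick the class with the highest log-probability."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        n_words = sum(counts.values())
        score = math.log(class_counts[label] / total_docs)  # class prior
        for token in tokenize(text):
            if token in vocab:  # ignore words never seen in training
                # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
                score += math.log((counts[token] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Tiny made-up training set, for illustration only.
messages = [
    "win a free prize now",
    "free cash claim your prize",
    "meeting moved to monday",
    "lunch at noon tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

model = train(messages, labels)
print(predict("claim your free prize", model))
print(predict("meeting at noon", model))
```

Working in log-probabilities keeps the product of many small word probabilities from underflowing to zero on longer messages.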

🛠️ Tech Stack

| Category | Tools |
| --- | --- |
| Languages | Python 3.x |
| Data Manipulation | pandas, NumPy |
| Visualization | matplotlib, seaborn, Plotly |
| Machine Learning | scikit-learn, XGBoost, LightGBM |
| NLP | NLTK, spaCy, TF-IDF |
| Statistics | statsmodels, SciPy |
| Notebooks | Jupyter, Google Colab |
| Version Control | Git, GitHub |

📚 Learning Path

Here's the roadmap I'm following to level up:

  • Python fundamentals & OOP
  • NumPy & pandas for data manipulation
  • Data visualization with matplotlib & seaborn
  • Exploratory Data Analysis (EDA)
  • Statistics: distributions, hypothesis testing, regression
  • Supervised learning (regression + classification)
  • Unsupervised learning (clustering, dimensionality reduction)
  • Model evaluation, tuning, and deployment
  • Deep Learning with TensorFlow / PyTorch
  • MLOps & model monitoring

🚀 Getting Started

Clone the repo and install dependencies:

git clone https://github.com/CreepyLewis/Data-Science.git
cd Data-Science
pip install -r requirements.txt

Open any notebook with Jupyter:

jupyter notebook

Or open directly in Google Colab by clicking the badge at the top of each notebook.


📦 requirements.txt

numpy
pandas
matplotlib
seaborn
plotly
scikit-learn
xgboost
lightgbm
nltk
spacy
statsmodels
scipy
imbalanced-learn
shap
jupyter
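After installing, it can be handy to confirm that the packages are importable, since two of them are imported under a different name than they are installed with (`scikit-learn` as `sklearn`, `imbalanced-learn` as `imblearn`). The sketch below is an optional helper and not part of the repository. The names `IMPORT_NAMES` and `missing_packages` are made up for this example, and `jupyter` is left out because it is a launcher rather than a library the notebooks import.

```python
from importlib.util import find_spec

# Maps pip package names from requirements.txt to their import names.
IMPORT_NAMES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "plotly": "plotly",
    "scikit-learn": "sklearn",
    "xgboost": "xgboost",
    "lightgbm": "lightgbm",
    "nltk": "nltk",
    "spacy": "spacy",
    "statsmodels": "statsmodels",
    "scipy": "scipy",
    "imbalanced-learn": "imblearn",
    "shap": "shap",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the pip names whose import module cannot be found."""
    return sorted(
        pkg for pkg, module in import_names.items() if find_spec(module) is None
    )


missing = missing_packages()
print("Missing packages:", ", ".join(missing) if missing else "none")
```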

🤝 Contributing

This is a personal portfolio repo, but PRs and issues are very welcome!

  • 🐛 Found a bug in a notebook? Open an issue.
  • 💡 Have a dataset or project idea? Drop it in Discussions.
  • 🌟 Liked the work? Give it a star — it helps a lot!

📬 Contact

| Platform | Link |
| --- | --- |
| GitHub | @CreepyLewis |
| Email | coming soon |
| LinkedIn | coming soon |

📄 License

This project is licensed under the MIT License — feel free to use, remix, and build on it.


Made with 💻, ☕, and a lot of .head() calls.

Star this repo if you find it useful!
