Explore my diverse collection of projects showcasing machine learning, data analysis, and more. Organized by project, each directory contains code, datasets, documentation, and resources.

devika-be/Data-Science-and-Machine-Learning-Projects

Data-Science-Projects

Welcome to my Data Science and Machine Learning Projects Repository! This repository contains a collection of my data science projects, showcasing my skills and expertise in the field. Each project demonstrates different aspects of data analysis, machine learning, and visualization.

Project Details:

  1. Breast Cancer Prediction

    • Description: The project predicts the breast cancer diagnosis (M = malignant, B = benign) for each sample.
    • Technologies Used: The notebooks use a Decision Tree Classifier and Logistic Regression.
    • Results: Logistic Regression achieved 97% accuracy, and the Decision Tree achieved 93.5% accuracy.
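For project 1, here is a minimal sketch of the Logistic Regression vs. Decision Tree comparison. It assumes the data matches scikit-learn's bundled copy of the Breast Cancer Wisconsin (Diagnostic) dataset, loaded with `load_breast_cancer`. In that copy the target is coded 0 = malignant, 1 = benign instead of M/B. The 80/20 split, random seeds, and scaling step are illustrative choices, not the notebook's settings, so the printed accuracies will not necessarily match the figures above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's copy of the Wisconsin diagnostic data (0 = malignant, 1 = benign).
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the samples, keeping the malignant/benign ratio equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    # Standardizing features helps the logistic regression solver converge; trees do not need it.
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)  # mean accuracy on the held-out split
    print(f"{name}: {accuracies[name]:.3f}")
```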
  2. E-Commerce Product Delivery Prediction

    • Description: The aim of this project is to predict whether products from an international e-commerce company will reach customers on time or not. Additionally, the project analyzes various factors influencing product delivery and studies customer behavior. The company primarily sells electronic products.
    • Technologies Used: The notebooks use Exploratory Data Analysis, Decision Tree Classifier, Random Forest Classifier, K Nearest Neighbors, and Logistic Regression.
    • Results: The Decision Tree Classifier achieved the highest accuracy of all the models, at 69%. The Random Forest Classifier and Logistic Regression reached 68% and 67%, respectively, and K Nearest Neighbors had the lowest accuracy, at 65%.
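For project 2, the model comparison in the Results above can be sketched as fitting several classifiers on one train/test split and ranking them by held-out accuracy. The README does not list the dataset's columns, so this sketch uses synthetic data from `make_classification` as a hypothetical stand-in for the on-time (1) / late (0) label. The hyperparameters (`max_depth=6`, `n_estimators=200`, `n_neighbors=7`) are arbitrary illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the shipment table: 10 numeric features and a binary
# label (1 = reached on time, 0 = late). Replace with the real features and target.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# The synthetic features already share a similar scale; real data would usually
# need scaling before KNN and Logistic Regression.
models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=6, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K Nearest Neighbors": KNeighborsClassifier(n_neighbors=7),
}

# Fit every model on the same split (fit returns the estimator), then rank by accuracy.
ranking = sorted(
    ((name, model.fit(X_train, y_train).score(X_test, y_test)) for name, model in models.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, acc in ranking:
    print(f"{name}: {acc:.1%}")
```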
  3. Diamond Price Prediction

    • Description: The aim of this analysis is to predict the price of diamonds based on their characteristics. The dataset used for this analysis is the Diamonds dataset from Kaggle. The dataset contains 53940 observations and 10 variables.
    • Technologies Used: The notebooks use Exploratory Data Analysis, Decision Tree Regressor, and Random Forest Regressor.
    • Results: Both models achieve almost the same accuracy, with the Random Forest Regressor slightly outperforming the Decision Tree Regressor. One counterintuitive pattern stands out in the data: diamonds with J color and I1 clarity are priced higher than diamonds with D color and IF clarity, which the models could not explain. This may be due to other factors that also affect a diamond's price.
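For project 3, a minimal sketch of the Decision Tree vs. Random Forest regressor comparison follows, scored with R² on a held-out split. The diamond data is not bundled here, so scikit-learn's synthetic `make_friedman1` nonlinear regression problem stands in for the diamond features and price. The split, seeds, and `n_estimators=200` are assumptions, not the notebook's settings.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic nonlinear regression data as a stand-in for diamond features and price.
X, y = make_friedman1(n_samples=2000, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A single fully grown tree tends to overfit the training data.
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
# A random forest averages many trees grown on bootstrap samples, which reduces variance.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

scores = {
    "Decision Tree": r2_score(y_test, tree.predict(X_test)),
    "Random Forest": r2_score(y_test, forest.predict(X_test)),
}
for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.3f}")
```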
  4. Heart Stroke Prediction

    • Description: The aim is to predict the likelihood of a patient experiencing a stroke based on various input parameters such as gender, age, presence of diseases, and smoking status. The dataset provides relevant information about each patient, enabling the development of a predictive model.
    • Technologies Used: The notebooks use Exploratory Data Analysis, Logistic Regression, Support Vector Machine (SVM), Decision Tree Classifier, and K-Nearest Neighbors (KNN).
    • Results: Logistic Regression, SVM, and KNN reached nearly identical accuracy of about 93.8%, while the Decision Tree Classifier reached 91.8%. Any of these models could therefore be used to predict stroke.
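For project 4, the inputs mix categorical values (gender, smoking status) with numeric ones (age, presence of diseases), so they have to be encoded before a model such as Logistic Regression can use them. The sketch below shows one common way to do this with a scikit-learn `ColumnTransformer`. The column names, category labels, and randomly generated values are hypothetical placeholders, not the real dataset. The label is deliberately tied to age only so the toy model has a signal to learn.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 500
# Hypothetical patient table: column names and random values are placeholders only.
df = pd.DataFrame({
    "gender": rng.choice(["Male", "Female"], size=n),
    "age": rng.uniform(18, 90, size=n),
    "hypertension": rng.integers(0, 2, size=n),
    "smoking_status": rng.choice(["never smoked", "formerly smoked", "smokes"], size=n),
})
# Placeholder label loosely tied to age so the toy model has something to learn.
df["stroke"] = (df["age"] + rng.normal(0, 15, size=n) > 70).astype(int)

categorical = ["gender", "smoking_status"]
numeric = ["age", "hypertension"]

preprocess = ColumnTransformer([
    # One-hot encode text categories; categories unseen during training are ignored.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    # Standardize numeric inputs so regularization treats them on the same scale.
    ("num", StandardScaler(), numeric),
])
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])

X, y = df[categorical + numeric], df["stroke"]
model.fit(X, y)

# Estimated stroke probability for one hypothetical new patient (same column order as X).
new_patient = pd.DataFrame([{
    "gender": "Female", "smoking_status": "never smoked", "age": 67, "hypertension": 1,
}])[categorical + numeric]
risk = model.predict_proba(new_patient)[0, 1]
print(f"Estimated stroke probability: {risk:.2f}")
```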
  5. Breast Cancer Prediction

    • Description: This project aims to predict whether a breast mass is malignant or benign using classification algorithms. The dataset consists of features derived from digitized images of fine needle aspirates (FNA) of breast masses. These features capture various characteristics of the cell nuclei in the images, such as radius, texture, smoothness, and symmetry, which are critical indicators in breast cancer diagnosis.
    • Technologies Used: The notebook utilizes Exploratory Data Analysis (EDA), Logistic Regression, Support Vector Machine (SVM), Decision Tree Classifier, and K-Nearest Neighbors (KNN) for model training and evaluation.
    • Results: All four models performed well. SVM and Logistic Regression achieved the highest accuracy at 96.5%, followed closely by KNN at 95.6% and the Decision Tree at 93.4%. Based on this performance, SVM or Logistic Regression is the preferred choice for reliable breast cancer prediction.
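For project 5, the features described (radius, texture, smoothness, and symmetry of cell nuclei from FNA images) appear to match scikit-learn's bundled Wisconsin diagnostic dataset, so this sketch assumes that copy. It compares the four listed models with 5-fold stratified cross-validation instead of a single split. SVM, KNN, and Logistic Regression are sensitive to feature scale, so each is wrapped in a pipeline that standardizes features inside every fold and avoids leaking test-fold statistics. The fold count, RBF kernel, and `n_neighbors=5` are assumptions, not the notebook's settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: 30 nucleus features (mean/error/worst radius, texture, smoothness, symmetry, ...).
X, y = load_breast_cancer(return_X_y=True)

# The scaler is fit on the training folds only, inside each pipeline.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
mean_accuracy = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    mean_accuracy[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```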
  6. Hospital Cost Prediction

    • Description: This project aims to predict the total hospital costs for a patient based on various factors such as type of disease, hospital type, and patient demographics. It addresses real-world healthcare challenges like identifying overcharging hospitals, estimating patient expenses, and aiding in hospital selection.

    • Technologies Used: The project uses Exploratory Data Analysis, Linear Regression, Ridge Regression, and Lasso Regression. Linear and Ridge Regression are implemented from scratch using the Moore-Penrose pseudoinverse. The workflow also includes feature engineering, plus random search and k-fold cross-validation for hyperparameter tuning.

    • Results: The initial Ridge Regression model, without feature engineering, achieved a 10-fold cross-validated R² score of 0.56. After feature engineering, the R² score improved to 0.778, a relative improvement of roughly 39%. Lasso Regression was then used to analyze feature importance on the engineered features.
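For project 6, the from-scratch workflow described above (Linear and Ridge Regression via the Moore-Penrose pseudoinverse, tuned with random search and 10-fold cross-validation) can be sketched in NumPy as follows. This is an illustration, not the notebook's code. The synthetic data, the log-uniform `alpha` range, and the helper names (`fit_linear`, `fit_ridge`, `kfold_r2`, and so on) are hypothetical. Ridge uses the closed form w = (XᵀX + αI)⁻¹Xᵀy with the pseudoinverse in place of the inverse. Leaving the intercept unpenalized is a common choice the README does not specify.

```python
import numpy as np

def add_bias(X):
    """Prepend a column of ones so the first weight acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_linear(X, y):
    """Ordinary least squares via the Moore-Penrose pseudoinverse: w = pinv(Xb) @ y."""
    return np.linalg.pinv(add_bias(X)) @ y

def fit_ridge(X, y, alpha):
    """Ridge regression: w = pinv(Xb^T Xb + alpha * I) @ Xb^T y, intercept unpenalized."""
    Xb = add_bias(X)
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.pinv(Xb.T @ Xb + penalty) @ Xb.T @ y

def predict(w, X):
    return add_bias(X) @ w

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def kfold_r2(X, y, fit, k=10, seed=0):
    """Mean R² over k folds; `fit` maps (X_train, y_train) to a weight vector."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = fit(X[train], y[train])
        scores.append(r2(y[test], predict(w, X[test])))
    return float(np.mean(scores))

# Hypothetical stand-in data: 200 patients, 5 numeric cost drivers, noisy linear costs.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.5]) + 10.0 + rng.normal(scale=1.0, size=200)

linear_score = kfold_r2(X, y, fit_linear)

# Random search: sample alpha log-uniformly and keep the one with the best 10-fold R².
alphas = 10 ** rng.uniform(-3, 3, size=20)
best_alpha = max(alphas, key=lambda a: kfold_r2(X, y, lambda X_tr, y_tr: fit_ridge(X_tr, y_tr, a)))
best_score = kfold_r2(X, y, lambda X_tr, y_tr: fit_ridge(X_tr, y_tr, best_alpha))

print(f"Linear 10-fold R²: {linear_score:.3f}")
print(f"Ridge  10-fold R²: {best_score:.3f} (alpha = {best_alpha:.4g})")
```

The same `kfold_r2` helper scores both models, so the random search and the final comparison use identical folds (fixed `seed=0`).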
