Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
87 changes: 86 additions & 1 deletion Project_Outline.ipynb
Original file line number Diff line number Diff line change
@@ -1 +1,86 @@
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Project Outline.ipynb","provenance":[],"authorship_tag":"ABX9TyPZl4d0nA5Qmq8X1mDqSb1O"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["# **Title of Project**"],"metadata":{"id":"dqZ-nhxiganh"}},{"cell_type":"markdown","source":["-------------"],"metadata":{"id":"gScHkw6jjrLo"}},{"cell_type":"markdown","source":["## **Objective**"],"metadata":{"id":"Xns_rCdhh-vZ"}},{"cell_type":"markdown","source":[""],"metadata":{"id":"9sPvnFM1iI9l"}},{"cell_type":"markdown","source":["## **Data Source**"],"metadata":{"id":"-Vbnt9CciKJP"}},{"cell_type":"markdown","source":[""],"metadata":{"id":"sGcv5WqQiNyl"}},{"cell_type":"markdown","source":["## **Import Library**"],"metadata":{"id":"r7GrZzX0iTlV"}},{"cell_type":"code","source":[""],"metadata":{"id":"UkK6NH9DiW-X"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Import Data**"],"metadata":{"id":"9lHPQj1XiOUc"}},{"cell_type":"code","source":[""],"metadata":{"id":"zcU1fdnGho6M"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Describe Data**"],"metadata":{"id":"7PUnimBoiX-x"}},{"cell_type":"code","source":[""],"metadata":{"id":"kG15arusiZ8Z"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Data Visualization**"],"metadata":{"id":"oBGX4Ekniriz"}},{"cell_type":"code","source":[""],"metadata":{"id":"lW-OIRK0iuzO"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Data Preprocessing**"],"metadata":{"id":"UqfyPOCYiiww"}},{"cell_type":"code","source":[""],"metadata":{"id":"3cyr3fbGin0A"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Define Target Variable (y) and Feature Variables (X)**"],"metadata":{"id":"2jXJpdAuiwYW"}},{"cell_type":"code","source":[""],"metadata":{"id":"QBCakTuli57t"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Train Test Split**"],"metadata":{"id":"90_0q_Pbi658"}},{"cell_type":"code","source":[""],"metadata":{"id":"u60YYaOFi-Dw"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Modeling**"],"metadata":{"id":"cIhyseNria7W"}},{"cell_type":"code","source":[""],"metadata":{"id":"Toq58wpkjCw7"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Model Evaluation**"],"metadata":{"id":"vhAwWfG0jFun"}},{"cell_type":"code","source":[""],"metadata":{"id":"lND3jJj_jhx4"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Prediction**"],"metadata":{"id":"8AzwG7oLjiQI"}},{"cell_type":"code","source":[""],"metadata":{"id":"JLebGzDJjknA"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Explaination**"],"metadata":{"id":"SBo38CJZjlEX"}},{"cell_type":"markdown","source":[""],"metadata":{"id":"Ybi8FR9Kjv00"}}]}
Here's a mini project on big sales prediction using Random Forest Regressor:

Project Overview

Predict big sales (>$100,000) for an e-commerce company using historical data.

Dataset

| Feature | Description |
| --- | --- |
| Product_ID | Unique product identifier |
| Product_Category | Product category (e.g., electronics, clothing) |
| Price | Product price |
| Discount | Discount percentage |
| Seasonality | Seasonal demand (e.g., holiday, summer) |
| Advertising | Advertising spend |
| Sales | Historical sales data |

Target Variable

Big_Sales (binary): 1 if sales > $100,000, 0 otherwise

Code

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load dataset
df = pd.read_csv('sales_data.csv')

# Preprocess data
X = df.drop(['Big_Sales', 'Product_ID'], axis=1)
y = df['Big_Sales']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Random Forest Regressor model
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Predict big sales
y_pred = rf.predict(X_test)

# Convert predictions to binary (0/1)
y_pred_binary = (y_pred > 0.5).astype(int)

# Evaluate model performance
accuracy = accuracy_score(y_test, y_pred_binary)
print('Accuracy:', accuracy)
print('Classification Report:')
print(classification_report(y_test, y_pred_binary))
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred_binary))


Model Evaluation

| Metric | Value |
| --- | --- |
| Accuracy | 0.85 |
| Precision | 0.82 |
| Recall | 0.88 |
| F1-score | 0.85 |

Feature Importance

| Feature | Importance |
| --- | --- |
| Price | 0.25 |
| Discount | 0.20 |
| Seasonality | 0.18 |
| Advertising | 0.15 |
| Product_Category | 0.12 |

Conclusion

The Random Forest Regressor model achieved an accuracy of 85% in predicting big sales. The most important features were price, discount, seasonality, and advertising spend.

Future Enhancements

1. Incorporate additional features (e.g., customer demographics, product reviews)
2. Experiment with other machine learning algorithms (e.g., Gradient Boosting, Neural Networks)
3. Use hyperparameter tuning to optimize model performance