GitHub - Thabo2023/Electronics-Sale-Workflow: This workflow uses SQL, Python, and machine learning to derive actionable insights from sales data.

A scenario I created "Mr Pilusa just had his first month (July) of electronics sales, but he is old school, so he recorded the data in a book, he need it to be transformed into a SQL database, do EDA to see how the data looks like and see who are his frequent customers, do some machine learning predictions to forecast the upcoming month sales in order to have enough inventory"

Setting Up the Database SQL Database Creation : A SQLite database (sales.db) is created to store customer, product, and sales data. Three tables are defined: Customers: Stores customer details (e.g., name, email, region). Products: Stores product details (e.g., product name, category, price). Sales: Tracks sales transactions, linking customers and products via foreign keys. Data Insertion : Sample data is inserted into the Customers, Products, and Sales tables using executemany().
Data Exploration and Visualization Data Retrieval : The data from the SQLite database is loaded into Pandas DataFrames for analysis. Visualizations : Sales by Product : A bar chart shows the total units sold for each product. Revenue by Region : A bar chart displays total revenue generated per region.
Exploratory Data Analysis (EDA) Merging Data : The Sales, Customers, and Products tables are merged into a single DataFrame for analysis. Additional features like unit_price, month, and day are calculated. Key Metrics : Total revenue, average transaction value, total units sold, top-selling products, and revenue by region are computed and displayed.
Sales Forecasting Time Series Analysis : Daily sales data is aggregated, and missing dates are filled with zero sales. Two models are used for forecasting: ARIMA : Predicts future sales based on past trends. Prophet : Captures seasonality and trends in the data. Forecasts for the next 3 days are generated and compared visually. Performance metrics (MAE, MSE, RMSE) are calculated for both models.
Customer Segmentation Clustering : Customer features (e.g., total spend, average transaction value, purchase frequency) are extracted and scaled. K-means clustering is applied to group customers into segments (e.g., high-spending vs. low-spending). PCA is used to visualize clusters in 2D space.
Revenue Prediction Random Forest Model : A Random Forest regressor is trained to predict revenue based on features like product_id, customer_id, and quantity. Feature importance is analyzed to understand which factors drive revenue predictions. The model's performance is evaluated using metrics like MSE, RMSE, and MAE. Actual vs. predicted revenue is visualized over time.
Market Basket Analysis Association Rules : An attempt is made to analyze relationships between products purchased together. However, since each transaction contains only one product, the analysis is not meaningful and results in an empty output.
Saving the Model The trained Random Forest model is saved to a file (random_forest_revenue_model.pkl) using Python's pickle module for future deployment.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
Elecctronics_Sales_Model_Creation.ipynb		Elecctronics_Sales_Model_Creation.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

Thabo2023/Electronics-Sale-Workflow

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages