πππππ’π₯ππ²π§π ππ: ππ-ππ¨π°ππ«ππ πππ₯ππ¬ π π¨π«ππππ¬ππ’π§π & ππ ππ§π¬π’π π‘ππ¬
RetailSync AI is a cutting-edge data science project that forecasts retail sales and delivers actionable business intelligence (BI) using a Kaggle dataset (100k records, India stores, 2019-2023). Built with Python, it combines machine learning (ML) models and large language model (LLM)-powered queries to empower retail decision-making. The project features exploratory data analysis (EDA), model training (Linear Regression, XGBoost, Random Forest, ARIMA, LSTM), and a dynamic Streamlit dashboard for real-time predictions and insights.
This repository showcases end-to-end skillsβdata preprocessing, feature engineering, ML pipelines, and interactive deploymentβperfect for retail analytics innovation. A huge thank you to Abu Humza Khan for the Kaggle dataset (Store Sales Data), which fueled this project!
- Exploratory Data Analysis (EDA):
- Cleans and preprocesses retail sales data.
- Engineers features like Customer Lifetime Value (CLV) and Discount Impact.
- Visualizes trends, customer segments, and discount-profit correlations.
- Machine Learning Models:
- Trains Linear Regression, XGBoost, Random Forest (RΒ² ~0.5β0.7), ARIMA, and LSTM (RΒ² ~0.7).
- Evaluates with MSE, RΒ², and visualizes feature importance and forecasts.
- Interactive Streamlit Dashboard:
- ML Predictions: Sliders for Quantity, Discount, and more to forecast sales.
- BI Insights: LLM-powered queries (Mistral-7B via OpenRouter) like βTop 5 sales in East.β
- Visualizations: Dynamic Altair charts for top metrics and predictions.
- Deployment: Runs locally or via Google Colab with Ngrok for public access.
RetailSync-AI/
βββ data/
β βββ pp_df_data.csv # Preprocessed retail sales data
βββ notebooks/
β βββ pp_fe_eda.ipynb # Data preprocessing, feature engineering, EDA
β βββ ml_model_training.ipynb # Train and save ML models
β βββ llm_ml_integrated_dashboard.ipynb # Build and launch Streamlit dashboard
βββ models/
β βββ linear_regression_model.pkl # Trained Linear Regression model
β βββ scaler_lr.pkl # Scaler for Linear Regression
β βββ xgboost_model.pkl # Trained XGBoost model
β βββ scaler_xgb.pkl # Scaler for XGBoost
β βββ random_forest_model.pkl # Trained Random Forest model
β βββ arima_model.pkl # Trained ARIMA model
β βββ lstm_model.h5 # Trained LSTM model
βββ app.py # Streamlit dashboard code
βββ requirements.txt # Python dependencies
βββ README.md # Project documentation
Install Dependencies:
pip install -r requirements.txt
- Python: 3.8+
- Dependencies: Listed in
requirements.txt - Accounts:
- OpenRouter for LLM API key (free tier available).
- Ngrok for public URL (free account sufficient).
- Dataset: Kaggle retail sales data (provided as
pp_df_data.csvafter preprocessing).
Create a .env file or set secrets in Colab:
OPENROUTER_API_KEY=your-openrouter-key
NGROK_TOKEN=your-ngrok-token
- Clone the Repository:
git clone https://github.com/your-username/Retail-Sales-Prediction-Dashboard.git cd Retail-Sales-Prediction-Dashboard - Launch Dashboard:
streamlit run app.py
