GitHub - yoismail/yinc_ecommerce_case_study: This case study presents the design and implementation of a production-style ETL pipeline that transforms raw e-commerce data into a structured, analytics-ready PostgreSQL warehouse. Using Python, Pandas, and SQL, the project focuses on data cleaning, transformation, and relational modeling.

📦 YINC E-Commerce Case Study - ETL Pipeline & PostgreSQL Data Warehouse

This project implements a complete ETL (Extract, Transform, Load) pipeline for an e-commerce dataset. It extracts raw CSV data, cleans and transforms it using Pandas, models it into relational tables, and loads it into a PostgreSQL database under a dedicated schema (yinc).

The pipeline is modular, reproducible, and structured to reflect real-world data engineering workflows.

🚀 Project Overview

This case study demonstrates:

Data extraction from raw CSV files
Data cleaning and transformation using Pandas
Creation of relational tables
Automated table creation in PostgreSQL
Loading cleaned data into relational tables
Environment-variable-based DB connection handling
A fully reproducible ETL workflow

🧱 Final Data Model

The final dataset is organized into 5 tables:

customer
product
shipping
order
payment_method

📁 Project Structure

yinc_ecommerce_case_study/
│
├── dataset/
│   ├── raw_data/
│   │   └── yinc_ecommerce.csv
│   └── cleaned_data/
│       ├── customer.csv
│       ├── product.csv
│       ├── shipping.csv
│       ├── order.csv
│       └── payment_method.csv
│
├── yinc_etl.ipynb
├── .gitignore
└── README.md

🧪 1. Extraction Layer

The raw dataset is loaded using Pandas:

import pandas as pd

path = r'dataset\\raw_data\\yinc_ecommerce.csv'
yinc_df = pd.read_csv(path)

🔧 Column Standardization

yinc_df.columns = (
    yinc_df.columns
    .str.strip()
    .str.lower()
    .str.replace(" ", "_")
)

🧹 2. Data Cleaning & Transformation

Key Steps

Dropping rows missing critical identifiers
Converting order_date to datetime
Generating synthetic email_address
Splitting dataset into normalized tables

Example Transformation

customer_df = yinc_df[['customer_id','customer_name', 'email', 'phone_number']] \
    .drop_duplicates().reset_index(drop=True)

del customer_df["email"]

customer_df['email_address'] = (
    customer_df['customer_name']
    .str.lower()
    .str.replace(" ","_")
    + '@' + customer_df['customer_id'] + '.com'
)

🗂️ 3. Table Creation (PostgreSQL)

🔌 Database Connection

from dotenv import load_dotenv
import os
import psycopg2

def get_db_connection():
    load_dotenv()
    db_url = os.getenv("DB_URL")
    return psycopg2.connect(db_url)

🏗️ Schema & Tables

CREATE SCHEMA IF NOT EXISTS yinc;

CREATE TABLE IF NOT EXISTS yinc.customer (...);
CREATE TABLE IF NOT EXISTS yinc.product (...);
CREATE TABLE IF NOT EXISTS yinc.shipping (...);
CREATE TABLE IF NOT EXISTS yinc.order (...);
CREATE TABLE IF NOT EXISTS yinc.payment_method (...);

📥 4. Data Loading

def load_data_from_csv(csv_path):
    conn = get_db_connection()
    cursor = conn.cursor()

    cursor.execute(
        "INSERT INTO yinc.customer (...) VALUES (%s, %s, %s, %s)",
        row
    )

    conn.commit()
    cursor.close()
    conn.close()

🔁 Tables Loaded

customer.csv
product.csv
shipping.csv
order.csv
payment_method.csv

🔐 Environment Variables

DB_URL=postgresql://username:password@localhost:5432/database

⚠️ Ensure .env is in .gitignore

🛠️ Requirements

pip install pandas numpy psycopg2 python-dotenv

▶️ Running the Pipeline

Place raw data in:

dataset/raw_data/

Run:

yinc_etl.ipynb

Output:

Cleaned CSVs
PostgreSQL tables
Loaded data

📊 Final Output

yinc.customer
yinc.product
yinc.shipping
yinc.order
yinc.payment_method

🧑‍💻 Author

Yomi Ismail Data Engineer & Product Operations Specialist

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
dataset		dataset
sql		sql
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
data_model-Page-2.drawio.png		data_model-Page-2.drawio.png
data_model.drawio		data_model.drawio
project_structure.png		project_structure.png
yinc_etl.ipynb		yinc_etl.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

📦 YINC E-Commerce Case Study - ETL Pipeline & PostgreSQL Data Warehouse

🚀 Project Overview

🧱 Final Data Model

📁 Project Structure

🧪 1. Extraction Layer

🔧 Column Standardization

🧹 2. Data Cleaning & Transformation

Key Steps

Example Transformation

🗂️ 3. Table Creation (PostgreSQL)

🔌 Database Connection

🏗️ Schema & Tables

📥 4. Data Loading

🔁 Tables Loaded

🔐 Environment Variables

🛠️ Requirements

▶️ Running the Pipeline

📊 Final Output

🧑‍💻 Author

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

📦 YINC E-Commerce Case Study - ETL Pipeline & PostgreSQL Data Warehouse

🚀 Project Overview

🧱 Final Data Model

📁 Project Structure

🧪 1. Extraction Layer

🔧 Column Standardization

🧹 2. Data Cleaning & Transformation

Key Steps

Example Transformation

🗂️ 3. Table Creation (PostgreSQL)

🔌 Database Connection

🏗️ Schema & Tables

📥 4. Data Loading

🔁 Tables Loaded

🔐 Environment Variables

🛠️ Requirements

▶️ Running the Pipeline

📊 Final Output

🧑‍💻 Author

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages