Dev-Parmar17/End-To-End-CAR-RENTAL-PIPELINE

🚗💨 Car Rental Batch Data Ingestion with SCD2 Merge in Snowflake ❄️

Overview

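This project implements a batch ingestion pipeline for car rental data. Customer records are loaded into the customer dimension table in Snowflake with an SCD2 (Slowly Changing Dimension Type 2) merge: when a customer's attributes change, the current row is expired and a new current row is inserted, so the full history is preserved. The pipeline is built with Python, PySpark, GCP Dataproc, Airflow, and Snowflake.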
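The SCD2 merge described above can be illustrated with a small in-memory sketch. It is not the project's code: the `CustomerVersion` fields (`name`, `email`, `city`), the `effective_date` / `end_date` / `is_current` bookkeeping columns, and the `scd2_merge` helper are all hypothetical. In the real pipeline the same logic would run against the Snowflake table rather than a Python list.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional

# Hypothetical tracked (SCD2) attributes of the customer dimension.
TRACKED = ("name", "email", "city")


@dataclass(frozen=True)
class CustomerVersion:
    customer_id: str
    name: str
    email: str
    city: str
    effective_date: date
    end_date: Optional[date]  # None means "still valid"
    is_current: bool


def scd2_merge(
    dim: List[CustomerVersion], incoming: List[Dict[str, str]], load_date: date
) -> List[CustomerVersion]:
    """Return the dimension rows after applying one batch of customer records."""
    history = [row for row in dim if not row.is_current]  # expired versions are never touched
    current = {row.customer_id: row for row in dim if row.is_current}

    for rec in incoming:
        cid = rec["customer_id"]
        old = current.get(cid)
        if old is not None and all(getattr(old, col) == rec[col] for col in TRACKED):
            continue  # no change: keep the existing current version
        if old is not None:
            # Attribute change: close out the old version instead of overwriting it.
            history.append(replace(old, end_date=load_date, is_current=False))
        # New or changed customer: insert a fresh current version.
        current[cid] = CustomerVersion(
            customer_id=cid,
            effective_date=load_date,
            end_date=None,
            is_current=True,
            **{col: rec[col] for col in TRACKED},
        )
    return history + list(current.values())
```

Setting `end_date` to the load date gives half-open validity intervals `[effective_date, end_date)`; some teams store the day before instead. Re-running the merge with unchanged input leaves the rows as they are, which keeps reruns of a batch idempotent.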

Architecture Diagram

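Data flows from Google Cloud Storage through PySpark jobs on GCP Dataproc into Snowflake tables, with Airflow orchestrating each run.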

Tech Stack

  • Python 🐍
  • PySpark 🚀
  • GCP Dataproc ☁️
  • Airflow ✈️
  • Snowflake ❄️

Key Features

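  • SCD2 Implementation: Tracks changes to customer data over time by expiring the previous version of a customer record and inserting a new current version, so history is never overwritten.
  • Data Ingestion: Reads data from Google Cloud Storage and loads it into Snowflake tables ❄️.
  • Data Processing: Uses PySpark to run distributed transformations and aggregations.
  • Orchestration: Airflow schedules the pipeline and manages the dependencies between its tasks, automating each run.
  • Scalability: Runs the PySpark processing on GCP Dataproc, a managed Spark service whose clusters can be scaled to the data volume.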
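Because SCD2 keeps every version of a customer row, a point-in-time ("as-of") lookup can recover what a customer looked like on any date, for example to match a historical rental with the customer attributes that were valid when it was booked. The sketch below is an illustration rather than part of the repository. It assumes hypothetical `effective_date` / `end_date` columns with half-open validity `[effective_date, end_date)`, where `None` marks the open current row.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class CustomerRow:
    customer_id: str
    city: str  # hypothetical tracked attribute
    effective_date: date
    end_date: Optional[date]  # None marks the current version


def as_of(history: Iterable[CustomerRow], customer_id: str, when: date) -> Optional[CustomerRow]:
    """Return the version of `customer_id` valid on `when`, or None if there was none."""
    for row in history:
        if row.customer_id != customer_id:
            continue
        # Half-open interval: valid from effective_date up to, but excluding, end_date.
        if row.effective_date <= when and (row.end_date is None or when < row.end_date):
            return row
    return None


history = [
    CustomerRow("C1", "Pune", date(2024, 1, 1), date(2024, 6, 1)),
    CustomerRow("C1", "Mumbai", date(2024, 6, 1), None),
]
print(as_of(history, "C1", date(2024, 3, 15)).city)  # Pune
print(as_of(history, "C1", date(2024, 6, 1)).city)   # Mumbai
```

With half-open intervals, a date that equals a version's `end_date` belongs to the next version, so two versions never both match the same day.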

Airflow DAG Structure

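An Airflow DAG is a set of tasks plus upstream/downstream dependencies (declared in Airflow with the `>>` operator). By default, a task runs only after all of its upstream tasks have succeeded. The sketch below mimics that ordering with the standard library's `graphlib`. The task names and edges are hypothetical, chosen to match the stages described in this README (GCS input, SCD2 merge of the customer dimension, PySpark job on Dataproc). The repository's actual DAG may differ.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical tasks: each maps to the set of upstream tasks it waits for.
PIPELINE: Dict[str, Set[str]] = {
    "wait_for_customer_file": set(),
    "wait_for_rental_file": set(),
    "merge_customer_dim_scd2": {"wait_for_customer_file"},
    "run_dataproc_pyspark_job": {"wait_for_rental_file", "merge_customer_dim_scd2"},
}


def execution_waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; tasks in one wave do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose upstream tasks are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


for i, wave in enumerate(execution_waves(PIPELINE), start=1):
    print(i, wave)
```

Tasks in the same wave (here, the two hypothetical file sensors) have no dependency on each other, so a scheduler can run them in parallel.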
