This project implements a data ingestion pipeline for car rental data that applies an SCD2 (Slowly Changing Dimension Type 2) merge to the customer dimension table in Snowflake. The pipeline is built with Python, PySpark, GCP Dataproc, Airflow, and Snowflake.
- Python
- PySpark
- GCP Dataproc
- Airflow
- Snowflake
- SCD2 Implementation: Tracks changes in customer data over time by keeping earlier versions of each customer record alongside the current one.
- Data Ingestion: Reads data from Google Cloud Storage and loads it into Snowflake tables.
- Data Processing: Utilizes PySpark for efficient data transformations and aggregations.
- Orchestration: Airflow schedules and manages the pipeline for automation.
- Scalability: Leverages GCP Dataproc for scalable data processing.
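
To make the SCD2 behaviour listed above concrete, here is a minimal pure-Python sketch of the merge logic. It is an illustration only, not this project's actual PySpark or Snowflake code: the `CustomerRow` fields (`email`, `effective_date`, `end_date`, `is_current`) and the `scd2_merge` helper are hypothetical names chosen for the example, and the real customer dimension may use different columns.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CustomerRow:
    # Hypothetical columns; the real dimension table may differ.
    customer_id: int
    email: str
    effective_date: date
    end_date: Optional[date] = None
    is_current: bool = True


def scd2_merge(
    dimension: List[CustomerRow],
    incoming: Dict[int, str],
    load_date: date,
) -> List[CustomerRow]:
    """Apply an SCD2 merge and return the new dimension rows.

    `incoming` maps customer_id to the latest email seen in the source data.
    """
    result: List[CustomerRow] = []
    current_ids = set()

    for row in dimension:
        if row.is_current:
            current_ids.add(row.customer_id)
        new_email = incoming.get(row.customer_id)
        if row.is_current and new_email is not None and new_email != row.email:
            # Changed attribute: close the old version and open a new one.
            result.append(replace(row, end_date=load_date, is_current=False))
            result.append(CustomerRow(row.customer_id, new_email, load_date))
        else:
            # Unchanged or already historical: keep the row as it is.
            result.append(row)

    # Customers with no current row yet get their first current version.
    for customer_id, email in incoming.items():
        if customer_id not in current_ids:
            result.append(CustomerRow(customer_id, email, load_date))

    return result


if __name__ == "__main__":
    dim = [CustomerRow(1, "a@example.com", date(2024, 1, 1))]
    updates = {1: "b@example.com", 2: "c@example.com"}
    for row in scd2_merge(dim, updates, date(2024, 6, 1)):
        print(row)
```

In a warehouse the same idea is usually expressed as an update that closes the changed rows followed by an insert of the new versions; the sketch keeps everything in memory so the three cases (changed, unchanged, new) are easy to follow.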

