GitHub - robbfox/totesys

Data-Dynamics ETL Platform

A data engineering application built in Python that moves structured data from an operational database into a data warehouse. The primary data source is a moderately complex (but not very large) database called 'totesys', which simulates the back-end data of a commercial application. Data is inserted into and updated in this database several times a day. The pipeline extracts the data into an S3 bucket as .csv files, transforms it into Parquet files formatted for easy insertion into a data warehouse, stores those files in a second S3 bucket, and finally uses them to populate a star-schema data warehouse.

Authors:

https://github.com/jampen

https://github.com/SpinyKarma

https://github.com/JK-A2023

https://github.com/abumohamedfanan

https://github.com/robbfox

Deployment

To deploy this project:

Clone:

  gh repo clone jampen/data-dynamics-de-project
  cd data-dynamics-de-project

Install requirements:

make create-environment

Move to the terraform folder and initialise Terraform:

cd terraform
terraform init

Review the planned changes, then apply them:

terraform plan
terraform apply

Tech Stack

Languages: Python, PostgreSQL

Data Tools: pg8000, boto3, pyarrow, pandas

Cloud Platform: Amazon Web Services

Data Pipelines

Three main processes:

Ingestion
Uses AWS EventBridge to trigger the ingestion process every three minutes. pg8000 reads from the Postgres database, and boto3 writes the data to an S3 bucket as .csv files.
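The ingestion step could look roughly like the sketch below. It is not taken from the repository: the function names, the key layout and the bucket and table arguments are all hypothetical. It assumes the caller supplies a `pg8000.native.Connection` and a boto3 S3 client, which keeps the CSV logic testable without AWS or a database.

```python
import csv
import io
from datetime import datetime, timezone


def rows_to_csv(columns, rows):
    """Serialise a header and a list of row sequences to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def make_key(table, now=None):
    """Build a timestamped object key (hypothetical layout: <table>/<time>.csv)."""
    now = now or datetime.now(timezone.utc)
    return f"{table}/{now.strftime('%Y-%m-%dT%H-%M-%S')}.csv"


def ingest_table(conn, s3_client, bucket, table, now=None):
    """Read one table and upload it to S3 as a .csv object.

    Assumes `conn` behaves like pg8000.native.Connection (run() returns
    rows, .columns describes the last result) and `s3_client` like a
    boto3 S3 client.
    """
    # Table names cannot be bound as query parameters, so reject anything
    # that is not a plain identifier before interpolating it.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    rows = conn.run(f'SELECT * FROM "{table}"')
    columns = [column["name"] for column in conn.columns]
    key = make_key(table, now)
    body = rows_to_csv(columns, rows).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key
```

A scheduled handler would call `ingest_table` once per source table. Selecting every row on each run is the simplest choice for a small database; filtering on a last-updated column would reduce the volume, if the source tables have one.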

Transform
Reads the .csv files into pandas DataFrames, reshapes the data to match the warehouse table specifications, and writes the result as .parquet files to a second S3 bucket.
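As an illustration of the transform step, the sketch below joins two source tables into one dimension table and serialises it to Parquet. The table and column names (`staff`, `department`, `dim_staff` and their fields) are assumptions made for the example, not the project's actual schema, and the function names are hypothetical.

```python
import io

import pandas as pd


def csv_to_frame(csv_text):
    """Parse CSV text (as stored by the ingestion step) into a DataFrame."""
    return pd.read_csv(io.StringIO(csv_text))


def build_dim_staff(staff, department):
    """Denormalise staff and department rows into one dimension table.

    Column names here are illustrative assumptions only.
    """
    merged = staff.merge(department, on="department_id", how="left")
    # Keep only the columns the warehouse table expects, in its order.
    return merged[["staff_id", "first_name", "department_name"]]


def frame_to_parquet_bytes(frame):
    """Serialise a DataFrame to Parquet in memory (requires pyarrow)."""
    buffer = io.BytesIO()
    frame.to_parquet(buffer, engine="pyarrow", index=False)
    return buffer.getvalue()
```

The bytes returned by `frame_to_parquet_bytes` can be passed as the `Body` of a boto3 `put_object` call to write to the second bucket. A left join keeps staff rows whose department is missing, leaving the department columns empty rather than dropping the row.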

Load
Reads the .parquet files and inserts the data into a star-schema data warehouse.
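A minimal sketch of the load step follows, assuming the warehouse is reached through a `pg8000.native.Connection`, whose `run` method accepts `:name` placeholders with keyword arguments. The helper names and the `dim_staff` table used in the usage note are hypothetical.

```python
import io


def read_parquet_records(data):
    """Decode Parquet bytes into a list of row dictionaries.

    pandas (with pyarrow) is imported lazily so the SQL helpers below
    can be used and tested without it.
    """
    import pandas as pd

    return pd.read_parquet(io.BytesIO(data)).to_dict(orient="records")


def build_insert(table, columns):
    """Build a parameterised INSERT statement using :name placeholders."""
    # Identifiers cannot be bound as parameters, so validate them first.
    for name in (table, *columns):
        if not name.isidentifier():
            raise ValueError(f"unexpected identifier: {name!r}")
    column_list = ", ".join(columns)
    placeholders = ", ".join(f":{column}" for column in columns)
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"


def load_records(conn, table, records):
    """Insert each record and return the number of rows sent."""
    if not records:
        return 0
    sql = build_insert(table, list(records[0]))
    for record in records:
        # Values are bound by the driver rather than formatted into the SQL.
        conn.run(sql, **record)
    return len(records)
```

For example, `load_records(conn, "dim_staff", read_parquet_records(body))` would insert one row per Parquet record. Plain inserts will fail on rows that already exist, so a real loader probably needs an upsert or a truncate-and-reload strategy.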

