ETL workflow that picks up two datasets from an S3 bucket when they are uploaded, applies transformations to each dataset separately in parallel, and joins the transformed versions after all of the separate transformation processes complete.
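
The parallel-transform-then-join shape looks roughly like the Amazon States Language skeleton below. This is only an illustration of the pattern, not the state machine generated by this repo's templates; the state names, file path, and reuse of the GlueRunner activity ARN for every task are placeholder assumptions.

```bash
# Illustration only: write a minimal ASL skeleton with two parallel transform
# branches followed by a join step. All names and ARNs are placeholders.
cat > /tmp/etl-skeleton.asl.json <<'EOF'
{
  "StartAt": "TransformInParallel",
  "States": {
    "TransformInParallel": {
      "Type": "Parallel",
      "Branches": [
        {"StartAt": "TransformDatasetA",
         "States": {"TransformDatasetA": {"Type": "Task",
           "Resource": "arn:aws:states:<region-name>:<account-ID>:activity:GlueRunnerActivity",
           "End": true}}},
        {"StartAt": "TransformDatasetB",
         "States": {"TransformDatasetB": {"Type": "Task",
           "Resource": "arn:aws:states:<region-name>:<account-ID>:activity:GlueRunnerActivity",
           "End": true}}}
      ],
      "Next": "JoinTransformedDatasets"
    },
    "JoinTransformedDatasets": {
      "Type": "Task",
      "Resource": "arn:aws:states:<region-name>:<account-ID>:activity:GlueRunnerActivity",
      "End": true
    }
  }
}
EOF
```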

- Clone this repo into an environment with the AWS CLI installed and configured.
- Make sure the user has full access to CloudFormation, IAM role creation, Step Functions, Lambda, Glue, DynamoDB, S3, CloudWatch, and CloudWatch Logs (a sketch of one way to grant this follows the list).
- Create and activate a virtual environment, then run `make install` (see the setup sketch after this list).
- Update the resource name `"sfn_activity_arn"` in `./lambda/gluerunner/gluerunner-config.json` to include your region and account ID in the correct format: `arn:aws:states:<region-name>:<account-ID>:activity:GlueRunnerActivity` (see the ARN sketch after this list).
- Run `make build`.
- After the build succeeds, execute the state machine from either the Step Functions console or the CLI (see the execution sketch after this list). Make sure the data S3 bucket does not exist prior to this execution.
- To remove the cloud resources after the ETL run completes and any necessary outputs have been backed up, first delete the data S3 bucket and then run `make delete_stacks` (see the cleanup sketch after this list).
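
One way to satisfy the access requirements above is to attach AWS managed policies to the deploying IAM user. This is a sketch for a throwaway tutorial environment, assuming an IAM user named `etl-deployer`; the policies are far broader than a production setup should use.

```bash
# Sketch: attach broad AWS managed policies to a hypothetical IAM user "etl-deployer".
# Verify the policy names in your account and scope them down for real workloads.
for policy in \
  AWSCloudFormationFullAccess \
  IAMFullAccess \
  AWSStepFunctionsFullAccess \
  AWSLambda_FullAccess \
  AWSGlueConsoleFullAccess \
  AmazonDynamoDBFullAccess \
  AmazonS3FullAccess \
  CloudWatchFullAccess \
  CloudWatchLogsFullAccess
do
  aws iam attach-user-policy \
    --user-name etl-deployer \
    --policy-arn "arn:aws:iam::aws:policy/${policy}"
done
```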
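
Setup sketch for the clone, virtual environment, and install steps, assuming Python 3.8 and a Unix-like shell; substitute your fork's URL for the upstream one used here.

```bash
# Sketch: clone, create and activate a virtual environment, then install dependencies.
git clone https://github.com/aws-samples/aws-etl-orchestrator.git  # or your fork of this repo
cd aws-etl-orchestrator
python3.8 -m venv .venv
source .venv/bin/activate
make install
```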
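
ARN sketch: the region and account ID can be pulled from the CLI and pasted into the config. The `jq` edit assumes `sfn_activity_arn` is a top-level key in `gluerunner-config.json`; check the file's actual structure before using it.

```bash
# Sketch: derive the GlueRunnerActivity ARN from the current CLI identity and region.
REGION=$(aws configure get region)
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
ARN="arn:aws:states:${REGION}:${ACCOUNT_ID}:activity:GlueRunnerActivity"
echo "${ARN}"

# Optional: write the value with jq, assuming "sfn_activity_arn" is a top-level key.
jq --arg arn "${ARN}" '.sfn_activity_arn = $arn' \
  ./lambda/gluerunner/gluerunner-config.json > /tmp/gluerunner-config.json \
  && mv /tmp/gluerunner-config.json ./lambda/gluerunner/gluerunner-config.json
```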
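
Execution sketch: build, confirm the data bucket is absent, and start the state machine from the CLI. The bucket name, state machine ARN, and input file are placeholders to replace with your own values.

```bash
# Sketch: build the stacks, check the data bucket is absent, then start an execution.
make build

# Replace these placeholder values with your own.
DATA_BUCKET="your-data-bucket-name"
STATE_MACHINE_ARN="arn:aws:states:<region-name>:<account-ID>:stateMachine:<state-machine-name>"

# head-bucket returns a 404 error when the bucket does not exist yet.
aws s3api head-bucket --bucket "${DATA_BUCKET}" 2>&1 | grep -q 404 \
  && echo "Data bucket does not exist - safe to proceed."

aws stepfunctions start-execution \
  --state-machine-arn "${STATE_MACHINE_ARN}" \
  --input file://execution-input.json
```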
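
Cleanup sketch: back up whatever outputs you need, delete the data bucket, then tear down the stacks. The bucket name and backup destination are placeholders.

```bash
# Sketch: back up outputs, delete the data bucket, then delete the CloudFormation stacks.
DATA_BUCKET="your-data-bucket-name"                       # replace with your bucket
aws s3 sync "s3://${DATA_BUCKET}/" ./etl-output-backup/   # copy anything worth keeping
aws s3 rb "s3://${DATA_BUCKET}" --force                   # empties and deletes the bucket
make delete_stacks
```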
- https://aws.amazon.com/blogs/big-data/orchestrate-multiple-etl-jobs-using-aws-step-functions-and-aws-lambda/
- https://github.com/aws-samples/aws-etl-orchestrator: code from the tutorial repo, modified to exclude Athena and to use Python 3.8 instead of 2.7