Sabab080

Sabab Sabab080

Data Engineer with a passion for AI/ML

Pinned

Dockerized_Airflow Public

An ETL pipeline project that summarizes daily/monthly sales, written in Python and orchestrated with Apache Airflow in a customized Docker container (custom Docker build with specific packages…

Python
pipeline_from_Transformers_library_NLP Public

Pipeline is a tool from the Transformers library, a popular natural language processing library that offers more than 32 pre-trained models in 100+ languages.

Jupyter Notebook
Simulated_data_with_noise Public

Simulated data (or fake data) are non-realistic data generated to test a tool's features and performance when real-world data is unsuitable or unavailable. Generating fake data hel…

Jupyter Notebook
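As a rough illustration of the idea of simulated data with noise, the sketch below generates points along a straight line and perturbs them with Gaussian noise. It is not taken from the repository; the function name `simulate_noisy_line` and its parameters (`slope`, `intercept`, `noise_std`, `seed`) are hypothetical, and only Python's standard library is assumed.

```python
import random


def simulate_noisy_line(n, slope=2.0, intercept=1.0, noise_std=0.5, seed=0):
    """Return n (x, y) pairs where y = slope * x + intercept + Gaussian noise.

    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    points = []
    for i in range(n):
        x = float(i)
        noise = rng.gauss(0.0, noise_std)  # zero-mean noise with the given std
        points.append((x, slope * x + intercept + noise))
    return points


data = simulate_noisy_line(100)
```

Fixing the seed keeps the fake data identical between runs, which makes it easier to compare how a tool behaves before and after a change.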
Titanic_disaster_prediction Public

A Kaggle competition where the goal was to predict the survival of passengers based on gender. It's not much, but it's honest work.

Jupyter Notebook
api-fetch-etl-pipeline Public

This project fetches data from an API endpoint and feeds it into a data pipeline that in turn feeds a data model.

Shell 1
pyspark-etl-customer-sales Public

PySpark-based ETL pipeline that extracts transaction data from a MySQL database, cleans and transforms it, aggregates monthly sales per customer, and writes the processed data to an S3 bucket in Pa…

Python 1
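The clean-then-aggregate step described for pyspark-etl-customer-sales can be sketched in plain Python. This is not the repository's PySpark code; the sample rows, the field names (`customer_id`, `date`, `amount`), and the function `monthly_sales_per_customer` are hypothetical and only show the shape of a monthly per-customer aggregation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample transactions; the real pipeline reads from MySQL.
transactions = [
    {"customer_id": "C1", "date": date(2024, 1, 5), "amount": 120.0},
    {"customer_id": "C1", "date": date(2024, 1, 20), "amount": 30.0},
    {"customer_id": "C2", "date": date(2024, 1, 7), "amount": 75.5},
    {"customer_id": "C1", "date": date(2024, 2, 2), "amount": 50.0},
    {"customer_id": "C2", "date": date(2024, 2, 9), "amount": None},  # dirty row
]


def monthly_sales_per_customer(rows):
    """Drop rows with a missing amount, then sum amounts per (customer, month)."""
    totals = defaultdict(float)
    for row in rows:
        if row["amount"] is None:
            continue  # cleaning step: skip incomplete records
        month = row["date"].strftime("%Y-%m")  # e.g. "2024-01"
        totals[(row["customer_id"], month)] += row["amount"]
    return dict(totals)


summary = monthly_sales_per_customer(transactions)
```

In PySpark the same step would typically be a filter followed by a group-by on customer and month; the dictionary keyed by `(customer_id, month)` plays that role here.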