Pinned
-
Dockerized_Airflow (Public)
An ETL pipeline project that transforms daily/monthly sales into a summary, written in Python and orchestrated with Apache Airflow in a customized Docker container (custom Docker build with specific packages…
Python
-
pipeline_from_Transformers_library_NLP (Public)
Pipeline is a tool from the Transformers library, a popular natural language processing library that offers more than 32 pre-trained models covering 100+ languages.
Jupyter Notebook
-
Simulated_data_with_noise (Public)
Simulated data (or fake data) are non-realistic data generated to test a tool's features and performance when real-world data is unsuitable or unavailable. Generating fake data hel…
Jupyter Notebook
-
Titanic_disaster_prediction (Public)
A Kaggle competition where the goal was to predict the survival of passengers based on gender. It's not much, but it's honest work.
Jupyter Notebook
-
api-fetch-etl-pipeline (Public)
This project fetches data from an API endpoint and feeds it into a data pipeline built to populate a data model.
Shell 1
-
pyspark-etl-customer-sales (Public)
PySpark-based ETL pipeline that extracts transaction data from a MySQL database, cleans and transforms it, aggregates monthly sales per customer, and writes the processed data to an S3 bucket in Pa…
Python 1
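
As a rough illustration of the "aggregate monthly sales per customer" step mentioned for pyspark-etl-customer-sales, here is a minimal sketch in plain Python rather than PySpark. It is not taken from the repository: the sample rows, the tuple layout (customer_id, transaction_date, amount), and the function name monthly_sales_per_customer are all hypothetical, and the cleaning rule (dropping rows with a missing amount) is an assumption.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (customer_id, transaction_date, amount).
transactions = [
    ("c1", date(2024, 1, 5), 120.0),
    ("c1", date(2024, 1, 20), 30.0),
    ("c2", date(2024, 1, 7), 75.5),
    ("c1", date(2024, 2, 2), 10.0),
    ("c2", date(2024, 2, 9), None),  # missing amount, dropped during cleaning
]


def monthly_sales_per_customer(rows):
    """Sum amounts per (customer_id, 'YYYY-MM') key, skipping rows with no amount."""
    totals = defaultdict(float)
    for customer_id, tx_date, amount in rows:
        if amount is None:
            # Assumed cleaning rule: ignore transactions without an amount.
            continue
        totals[(customer_id, tx_date.strftime("%Y-%m"))] += amount
    return dict(totals)


summary = monthly_sales_per_customer(transactions)
for (customer_id, month), total in sorted(summary.items()):
    print(customer_id, month, total)
```

In a PySpark version the same grouping would typically be expressed as a group-by on the customer and a month column derived from the transaction date, followed by a sum.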