This project focuses on building a robust data pipeline for Alkasba Store, an e-commerce platform specializing in herbal and plant-based products. The pipeline extracts, processes, and analyzes data to enhance sales and uncover valuable insights for the business.
- Data Extraction: Extracts data from various sources including APIs, CSV, JSON, and databases.
- Data Transformation: Merges data frames, removes duplicates, handles null values, and standardizes column names.
- Loading into Data Warehouse: Designs a fact constellation schema, creates the data warehouse, defines tables, and establishes relationships.
- Power BI Report: Establishes DirectQuery connections, creates calculated columns, and develops three dashboards.
- Business Intelligence (Decisions): Analyzes sales and inventory data, optimizing warehousing for cost reduction.
- Orchestration with Airflow: Configures Airflow on Docker, defines DAG scripts, and schedules monthly DAG runs.
- Producer Kafka: Imports necessary libraries, configures Kafka producer, defines API URL, and continuously fetches and produces data.
- Consumer Kafka: Imports required libraries, configures Kafka consumer, sets up Elasticsearch, processes data, defines Elasticsearch mapping, and indexes data.
- Indexation in Elasticsearch: Data is stored in the `index_products` index for analysis.
- Kibana Dashboard: Comprehensive dashboard with visualizations for a thorough analysis.
- Product Recommendation API: Incorporates ALS machine learning model for personalized product recommendations.
- GDPR Compliance: Strict adherence to General Data Protection Regulation (GDPR) guidelines.
- Data Governance Strategy: Emphasizes compliance, protection against unauthorized access, regular data purging, transparency, data quality, and continuous improvement.
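
The Data Transformation step listed above can be illustrated with a small sketch. It covers three of the listed operations (standardizing column names, handling null values, removing duplicates) and leaves merging out. It uses plain Python dictionaries from the standard library rather than data frames so that it runs without extra dependencies. The function names, the sample rows, and the `product_id` key are hypothetical and are not taken from the project code.

```python
import re


def standardize_column_name(name: str) -> str:
    """Turn a header such as ' Product Name ' into 'product_name'."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


def clean_rows(rows, key, defaults):
    """Standardize keys, fill nulls from `defaults`, drop repeated `key` values."""
    seen = set()
    cleaned = []
    for row in rows:
        record = {standardize_column_name(k): v for k, v in row.items()}
        # Replace missing or empty values with the configured default.
        for column, default in defaults.items():
            if record.get(column) in (None, ""):
                record[column] = default
        # Keep only the first row seen for each key value.
        if record[key] in seen:
            continue
        seen.add(record[key])
        cleaned.append(record)
    return cleaned


# Hypothetical sample data, for illustration only.
raw_rows = [
    {"Product ID": 1, "Product Name": "Mint Tea", "Unit Price": 4.5},
    {"Product ID": 1, "Product Name": "Mint Tea", "Unit Price": 4.5},
    {"Product ID": 2, "Product Name": "Argan Oil", "Unit Price": None},
]
cleaned_rows = clean_rows(raw_rows, key="product_id", defaults={"unit_price": 0.0})
print(cleaned_rows)
```

Keeping the first occurrence of each key is one possible deduplication policy; the project may apply a different rule.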
- Clone the repository.
- Set up the Python environment using `requirements.txt`.
- Configure Airflow, Kafka, and other dependencies.
- Run data pipeline workflows.
- Detailed usage instructions can be found in the respective workflow directories.
- Refer to documentation for API endpoints and data recommendations.
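
As a rough illustration of the Kafka consumer step described earlier (process each message, define an Elasticsearch mapping, index into `index_products`), the sketch below converts one raw message payload into an index action. Only the index name comes from this README. The mapping fields (`product_id`, `name`, `price`), the sample payload, and the function name are assumptions made for the example. The returned dictionary follows the `_index` / `_id` / `_source` shape accepted by the `elasticsearch.helpers.bulk` helper of the Python client; no Kafka or Elasticsearch connection is opened here.

```python
import json

INDEX_NAME = "index_products"  # index name taken from this README

# Hypothetical mapping: field names and types are assumptions.
PRODUCT_MAPPING = {
    "mappings": {
        "properties": {
            "product_id": {"type": "keyword"},
            "name": {"type": "text"},
            "price": {"type": "float"},
        }
    }
}


def to_index_action(message_value: bytes) -> dict:
    """Decode one message payload and build a bulk-style index action."""
    product = json.loads(message_value.decode("utf-8"))
    # Keep only the fields declared in the mapping.
    fields = PRODUCT_MAPPING["mappings"]["properties"]
    document = {field: product.get(field) for field in fields}
    # Normalize types so they match the mapping.
    document["product_id"] = str(product["product_id"])
    if document["price"] is not None:
        document["price"] = float(document["price"])
    return {"_index": INDEX_NAME, "_id": document["product_id"], "_source": document}


# Hypothetical payload, shaped like a JSON message read from a Kafka topic.
sample = json.dumps(
    {"product_id": 7, "name": "Dried Thyme", "price": "3.20", "extra": "ignored"}
).encode("utf-8")
action = to_index_action(sample)
print(action)
```

Using the product identifier as the document `_id` means a re-consumed message overwrites the existing document instead of creating a duplicate, which is one reasonable choice for a continuously running consumer.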
Contributions are welcome! Please follow our contribution guidelines.
This project is licensed under the MIT License.