This module handles the Extract, Transform, Load (ETL) processes for the Arkaid project, managing data flow between different databases and creating materialized views for efficient analytics.
The ETL module is responsible for:
- Extracting data from multiple sources
- Transforming data into standardized formats
- Loading data into appropriate databases
- Creating and maintaining materialized views
- Managing database connections and configurations
- db_connection.py: Manages database connections
- db_config.yaml: Configuration file for database connections
- .env.example: Template for environment variables
- etl_steam_games.py: ETL process for Steam games data
- etl_epic_games.py: ETL process for Epic games data
- etl_steam_players.py: ETL process for Steam player data
- etl_epic_players.py: ETL process for Epic player data
- etl_content_creators.py: ETL process for content creator data
- etl_developers.py: ETL process for developer data
- etl_modders.py: ETL process for modder data
- etl_publishers.py: ETL process for publisher data
- mv_games_warehouse.py: Manages game-related materialized views
- mv_players_warehouse.py: Manages player-related materialized views
- schema_matcher.py: Handles schema matching between different data sources
- other_schema_matcher.py: Additional schema matching functionality
- csv_data_sources.py: Manages CSV data source configurations
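To illustrate what schema matching between sources can involve, here is a minimal sketch that maps source column names onto a target schema by name similarity. It is not the code in schema_matcher.py: the function names `normalize` and `match_columns`, the 0.6 cutoff, and the sample column names are all assumptions made for this example.

```python
import difflib
import re


def normalize(name: str) -> str:
    """Lowercase a column name and collapse non-alphanumerics to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def match_columns(source_cols, target_cols, cutoff=0.6):
    """Map each source column to its most similar target column, or None.

    Hypothetical helper: similarity is difflib's ratio on normalized names.
    """
    normalized_targets = {normalize(t): t for t in target_cols}
    mapping = {}
    for col in source_cols:
        hits = difflib.get_close_matches(
            normalize(col), list(normalized_targets), n=1, cutoff=cutoff
        )
        mapping[col] = normalized_targets[hits[0]] if hits else None
    return mapping


if __name__ == "__main__":
    # Made-up column names, for illustration only.
    source = ["AppID", "Game Name", "releaseDate", "xyz"]
    target = ["app_id", "game_name", "release_date"]
    print(match_columns(source, target))
```

Columns that map to None have no sufficiently similar target and would need a manual entry, for example in the mappings/ directory.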
ETL/
├── data/ # Data files directory
├── mappings/ # Schema mapping configurations
├── .env.example # Environment variables template
├── db_config.yaml # Database configuration
├── data_types.txt # Data type definitions
└── ETL process files # Individual ETL scripts
- Copy .env.example to .env and fill in database credentials:
    cp .env.example .env
- Update db_config.yaml with appropriate database configurations:
    databases:
      - name: DB1
        host: your_host
        port: "5432"
        dbname: your_db
        username: your_username
        password: your_password
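As a rough sketch of how one `databases` entry could be turned into a SQLAlchemy-style connection URL, consider the helper below. How db_connection.py actually combines db_config.yaml with .env is not documented here, so the function name `build_url` and the `<NAME>_USERNAME` / `<NAME>_PASSWORD` override variables are assumptions. The dictionary has the shape that `yaml.safe_load` would return for one entry of the file above.

```python
import os
from urllib.parse import quote_plus


def build_url(entry: dict, env=os.environ) -> str:
    """Build a PostgreSQL URL from one `databases` entry.

    Assumption: credentials may be overridden by <NAME>_USERNAME and
    <NAME>_PASSWORD environment variables (hypothetical naming).
    """
    name = entry["name"]
    user = env.get(f"{name}_USERNAME", entry["username"])
    password = env.get(f"{name}_PASSWORD", entry["password"])
    # quote_plus escapes characters such as '@' and '/' in credentials.
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{entry['host']}:{entry['port']}/{entry['dbname']}"
    )


if __name__ == "__main__":
    # Placeholder values only; same keys as an entry in db_config.yaml.
    entry = {
        "name": "DB1",
        "host": "localhost",
        "port": "5432",
        "dbname": "your_db",
        "username": "your_username",
        "password": "p@ss/word",
    }
    print(build_url(entry, env={}))
```

Keeping real credentials in the environment rather than in the YAML file means db_config.yaml can be committed with placeholder values.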
- For Steam data:
    python etl_steam_games.py
    python etl_steam_players.py
- For Epic data:
    python etl_epic_games.py
    python etl_epic_players.py
- For other entities:
    python etl_content_creators.py
    python etl_developers.py
    python etl_modders.py
    python etl_publishers.py
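If you prefer to run all of the ETL scripts in one go, a small wrapper along these lines is one option. This is a sketch, not part of the module: the `run_all` function and the ordering of the scripts are assumptions, and only the script names come from this README.

```python
import subprocess
import sys

# Script names as listed above; the ordering is an assumption.
ETL_SCRIPTS = [
    "etl_steam_games.py",
    "etl_steam_players.py",
    "etl_epic_games.py",
    "etl_epic_players.py",
    "etl_content_creators.py",
    "etl_developers.py",
    "etl_modders.py",
    "etl_publishers.py",
]


def run_all(scripts=ETL_SCRIPTS, runner=subprocess.run):
    """Run each script with the current interpreter; stop at the first failure.

    `runner` is injectable so the loop can be exercised without real scripts.
    """
    completed = []
    for script in scripts:
        result = runner([sys.executable, script])
        if result.returncode != 0:
            raise RuntimeError(f"{script} exited with code {result.returncode}")
        completed.append(script)
    return completed
```

Calling `run_all()` from the ETL/ directory runs the scripts in order and raises as soon as one exits with a non-zero code, so a failed extract does not go unnoticed before later steps run.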
- Create/update game-related views:
    python mv_games_warehouse.py
- Create/update player-related views:
    python mv_players_warehouse.py
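For orientation, creating or updating a materialized view in PostgreSQL comes down to two statements: `CREATE MATERIALIZED VIEW IF NOT EXISTS` and `REFRESH MATERIALIZED VIEW`. The sketch below only builds those statements. The helper name `materialized_view_sql`, the view name `mv_games`, and the query are hypothetical; the SQL that the mv_*_warehouse.py scripts actually issue may differ.

```python
import re

# Accept only plain identifiers, since the view name is interpolated into SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def materialized_view_sql(view_name: str, select_sql: str):
    """Return the PostgreSQL statements to create (if missing) and refresh a view."""
    if not _IDENT.match(view_name):
        raise ValueError(f"unsafe view name: {view_name!r}")
    return [
        f"CREATE MATERIALIZED VIEW IF NOT EXISTS {view_name} AS {select_sql}",
        f"REFRESH MATERIALIZED VIEW {view_name}",
    ]


if __name__ == "__main__":
    # Hypothetical view name and query, for illustration only.
    for stmt in materialized_view_sql("mv_games", "SELECT game_id, title FROM games"):
        print(stmt + ";")
```

Running both statements on every invocation keeps the script safe to repeat: the first is a no-op once the view exists, and the second brings its contents up to date.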
The ETL processes work with three main databases:
- DB1: Epic Games data
- DB2: Steam data
- DB3: Materialized views and consolidated data
- Python 3.x
- psycopg2
- pandas
- PyYAML
- python-dotenv
- SQLAlchemy
- ETL processes include error handling and logging
- Materialized views are automatically refreshed when source data changes
- Schema matching ensures data consistency across different sources
- All processes are designed to be idempotent
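One common way to make a load step idempotent is an upsert keyed on a stable identifier, so that re-running the step leaves the table unchanged. The sketch below shows the idea with an in-memory SQLite database as a stand-in so that it runs anywhere. The `games` table, its columns, and the `load_games` function are assumptions, not the module's actual schema or code.

```python
import sqlite3


def load_games(conn, rows):
    """Upsert rows keyed on game_id so repeated loads do not duplicate data."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS games (game_id INTEGER PRIMARY KEY, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO games (game_id, title) VALUES (?, ?) "
        "ON CONFLICT(game_id) DO UPDATE SET title = excluded.title",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rows = [(1, "Game A"), (2, "Game B")]  # made-up sample rows
    load_games(conn, rows)
    load_games(conn, rows)  # second run leaves the table unchanged
    print(conn.execute("SELECT COUNT(*) FROM games").fetchone()[0])
```

PostgreSQL supports the same `INSERT ... ON CONFLICT ... DO UPDATE` form; with psycopg2 the placeholders would be `%s` instead of `?`.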