Draft
srnnkls reviewed Dec 1, 2025
@@ -0,0 +1,56 @@
-- Create weekly snapshots per store
CREATE TABLE PREPARED.weekly_stores AS
WITH store_activity AS (
    SELECT
        s.id as store_id,
        s.name as store_name,
        s.opened_at,
        DATE_TRUNC('week', s.opened_at) + INTERVAL '7 days' as first_full_week,
        MIN(o.ordered_at) as first_order_date,
        MAX(o.ordered_at) as last_order_date,
        DATE_TRUNC('week', MIN(o.ordered_at)) as first_order_week,
        DATE_TRUNC('week', MAX(o.ordered_at)) as last_order_week
    FROM RAW.raw_stores s
    LEFT JOIN RAW.raw_orders o ON o.store_id = s.id
    GROUP BY s.id, s.name, s.opened_at
),
all_weeks AS (
    SELECT DISTINCT
        DATE_TRUNC('week', ordered_at) as reference_date
    FROM RAW.raw_orders
    WHERE ordered_at IS NOT NULL
),
store_weeks AS (
    SELECT
        sa.store_id,
        sa.store_name,
        w.reference_date,
        sa.opened_at,
        sa.first_full_week,
        sa.first_order_week,
        sa.last_order_week
    FROM store_activity sa
    CROSS JOIN all_weeks w
    WHERE w.reference_date >= sa.opened_at
        AND w.reference_date < sa.last_order_week
)
SELECT
    store_id,
    store_name,
    reference_date,
    EXTRACT(year FROM reference_date) as year,
    EXTRACT(month FROM reference_date) as month,
    EXTRACT(week FROM reference_date) as week_number,
    DATEDIFF('day', opened_at, reference_date) as days_since_open,
    reference_date >= first_full_week as is_full_week_after_opening,
    first_order_week IS NOT NULL
        AND reference_date >= first_order_week
        AND reference_date < last_order_week as has_order_activity,
    DATEDIFF('day', opened_at, reference_date) >= 7 as has_min_history
FROM store_weeks
ORDER BY reference_date, store_id;

…sion management

- Added bootstrap functionality to ensure the existence of Snowflake warehouse and database.
- Created SnowflakeSettings class for managing authentication and connection settings.
- Implemented session management for Snowflake using Snowpark.
- Developed SQL loading utilities for dynamic SQL execution.
- Added SQL scripts for creating schemas, tables, and stages, as well as data ingestion processes.
- Prepared a script to load and prepare Jaffle Shop data for integration with getML Feature Store.
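For intuition, the `first_full_week` expression in the query above (truncate the opening date to the start of its week, then add seven days) can be mirrored in plain Python. This is a sketch, assuming Snowflake's default Monday-based week truncation; the function names are illustrative and not part of the PR:

```python
from datetime import date, timedelta

def week_start(d: date) -> date:
    """Monday of the week containing d, mirroring DATE_TRUNC('week', ...)
    under Snowflake's default week-start setting."""
    return d - timedelta(days=d.weekday())

def first_full_week(opened_at: date) -> date:
    """First week boundary by which the store has a full week of history:
    DATE_TRUNC('week', opened_at) + INTERVAL '7 days' in the SQL above."""
    return week_start(opened_at) + timedelta(days=7)
```

So a store opened on Wednesday 2025-01-15 truncates to Monday 2025-01-13, and its first full week starts on 2025-01-20.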
This pull request introduces a new Snowflake integration module for the Jaffle Shop dataset. It provides a robust, idempotent workflow for bootstrapping Snowflake infrastructure, ingesting CSV data from S3, and preparing weekly sales forecasting data for use with getML. The implementation features modular Python scripts, externalized SQL queries, and comprehensive logging and error handling. The workflow is automated via a new GitHub Actions CI pipeline.
Infrastructure and Workflow Automation

Added a GitHub Actions CI workflow (.github/workflows/snowflake-test.yml) to automate Python linting, formatting, type checking, and testing for the Snowflake integration, including coverage reporting and support for multiple Python versions.

Snowflake Infrastructure Bootstrapping

Added bootstrap.py to create Snowflake warehouses and databases if they do not exist, using idempotent SQL and externalized queries (create_warehouse.sql, create_database.sql).

Data Ingestion Pipeline

Added data/ingestion.py to ingest Jaffle Shop CSV data from S3 into Snowflake's RAW schema using external stages and native COPY INTO commands, with transaction management and error handling.

Data Preparation for Feature Store

Added data/preparation.py to create weekly sales population tables per store, calculate forecasting targets, perform schema validation, and run data quality checks, leveraging externalized SQL for maintainability.

Modular Utilities and API

Added data/__init__.py and _sql_loader.py to provide a clean API and internal SQL file loading utilities for the integration package.
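The idempotent-DDL idea behind bootstrap.py can be sketched as a pure statement builder. The function name and the inlined SQL are illustrative assumptions; the PR itself keeps the statements externalized in create_warehouse.sql and create_database.sql:

```python
def bootstrap_statements(warehouse: str, database: str) -> list[str]:
    """Build DDL that is safe to re-run: IF NOT EXISTS turns each
    statement into a no-op when the object already exists."""
    return [
        f"CREATE WAREHOUSE IF NOT EXISTS {warehouse} "
        "WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60",
        f"CREATE DATABASE IF NOT EXISTS {database}",
    ]
```

In the real module each statement would then be executed through the Snowpark session; how that session is obtained and error-handled is the bootstrap script's concern, not shown here.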
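The stage-plus-COPY INTO pattern described for data/ingestion.py might reduce to statements like the following; the table, stage, path, and file-format details here are assumptions rather than values taken from the diff:

```python
def copy_into_statements(table: str, stage: str, prefix: str) -> list[str]:
    """COPY INTO wrapped in an explicit transaction so a failed load can
    be rolled back, leaving RAW.<table> unchanged."""
    return [
        "BEGIN",
        f"COPY INTO RAW.{table} "
        f"FROM @{stage}/{prefix} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1) "
        "ON_ERROR = 'ABORT_STATEMENT'",
        "COMMIT",
    ]
```

The ingestion module would issue ROLLBACK instead of COMMIT when the COPY fails, which is the transaction-management behavior the PR description mentions.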
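The data-quality checks in data/preparation.py could be sketched as row-level assertions over the snapshot table; the rule set below is hypothetical, chosen to match the flags computed by the weekly_stores query:

```python
def quality_issues(rows: list[dict]) -> list[str]:
    """Return human-readable findings for rows shaped like
    PREPARED.weekly_stores (store_id, days_since_open, has_min_history)."""
    issues = []
    for row in rows:
        if row["days_since_open"] < 0:
            issues.append(f"store {row['store_id']}: snapshot before opening")
        if row["has_min_history"] and row["days_since_open"] < 7:
            issues.append(f"store {row['store_id']}: has_min_history inconsistent")
    return issues
```

An empty result means the checks passed; anything else would be logged and could fail the pipeline run.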
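The SQL-loading utility in _sql_loader.py likely amounts to reading named .sql files from a package directory; the exact signature below is a guess at what "externalized SQL" implies:

```python
from pathlib import Path

def load_sql(name: str, sql_dir: Path) -> str:
    """Read <name>.sql from sql_dir, keeping query text out of Python code."""
    path = sql_dir / f"{name}.sql"
    if not path.is_file():
        raise FileNotFoundError(f"unknown SQL file: {path}")
    return path.read_text(encoding="utf-8")
```

Callers such as bootstrap.py would then do something like load_sql("create_database", sql_dir) and pass the text to the Snowpark session for execution.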