Draft
srnnkls reviewed Dec 1, 2025
@@ -0,0 +1,56 @@
-- Create weekly snapshots per store
CREATE TABLE PREPARED.weekly_stores AS
WITH store_activity AS (
    SELECT
        s.id as store_id,
        s.name as store_name,
        s.opened_at,
        DATE_TRUNC('week', s.opened_at) + INTERVAL '7 days' as first_full_week,
        MIN(o.ordered_at) as first_order_date,
        MAX(o.ordered_at) as last_order_date,
        DATE_TRUNC('week', MIN(o.ordered_at)) as first_order_week,
        DATE_TRUNC('week', MAX(o.ordered_at)) as last_order_week
    FROM RAW.raw_stores s
    LEFT JOIN RAW.raw_orders o ON o.store_id = s.id
    GROUP BY s.id, s.name, s.opened_at
),
all_weeks AS (
    SELECT DISTINCT
        DATE_TRUNC('week', ordered_at) as reference_date
    FROM RAW.raw_orders
    WHERE ordered_at IS NOT NULL
),
store_weeks AS (
    SELECT
        sa.store_id,
        sa.store_name,
        w.reference_date,
        sa.opened_at,
        sa.first_full_week,
        sa.first_order_week,
        sa.last_order_week
    FROM store_activity sa
    CROSS JOIN all_weeks w
    WHERE w.reference_date >= sa.opened_at
        AND w.reference_date < sa.last_order_week
)
SELECT
    store_id,
    store_name,
    reference_date,
    EXTRACT(year FROM reference_date) as year,
    EXTRACT(month FROM reference_date) as month,
    EXTRACT(week FROM reference_date) as week_number,
    DATEDIFF('day', opened_at, reference_date) as days_since_open,
    reference_date >= first_full_week as is_full_week_after_opening,
    first_order_week IS NOT NULL
        AND reference_date >= first_order_week
        AND reference_date < last_order_week as has_order_activity,
    DATEDIFF('day', opened_at, reference_date) >= 7 as has_min_history
FROM store_weeks
ORDER BY reference_date, store_id;

…sion management

- Added bootstrap functionality to ensure the existence of Snowflake warehouse and database.
- Created SnowflakeSettings class for managing authentication and connection settings.
- Implemented session management for Snowflake using Snowpark.
- Developed SQL loading utilities for dynamic SQL execution.
- Added SQL scripts for creating schemas, tables, and stages, as well as data ingestion processes.
- Prepared a script to load and prepare Jaffle Shop data for integration with getML Feature Store.
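For intuition, the `first_full_week` expression in the query above (truncate the opening date to the start of its week, then add seven days) can be mirrored in plain Python. This is a sketch, assuming Snowflake's default Monday-based week truncation; the function names are illustrative and not part of the PR:

```python
from datetime import date, timedelta

def week_start(d: date) -> date:
    """Monday of the week containing d, mirroring DATE_TRUNC('week', ...)
    under Snowflake's default week-start setting."""
    return d - timedelta(days=d.weekday())

def first_full_week(opened_at: date) -> date:
    """First week boundary by which the store has a full week of history:
    DATE_TRUNC('week', opened_at) + INTERVAL '7 days' in the SQL above."""
    return week_start(opened_at) + timedelta(days=7)
```

So a store opened on Wednesday 2025-01-15 truncates to Monday 2025-01-13, and its first full week starts on 2025-01-20.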
This pull request introduces a new Snowflake integration module for the Jaffle Shop dataset. It provides a robust, idempotent workflow for bootstrapping Snowflake infrastructure, ingesting CSV data from S3, and preparing weekly sales forecasting data for use with getML. The implementation features modular Python scripts, externalized SQL queries, and comprehensive logging and error handling. The workflow is automated via a new GitHub Actions CI pipeline.
Infrastructure and Workflow Automation

Added a GitHub Actions CI workflow (.github/workflows/snowflake-test.yml) to automate Python linting, formatting, type checking, and testing for the Snowflake integration, including coverage reporting and support for multiple Python versions.

Snowflake Infrastructure Bootstrapping

Added bootstrap.py to create Snowflake warehouses and databases if they do not exist, using idempotent SQL and externalized queries (create_warehouse.sql, create_database.sql).

Data Ingestion Pipeline

Added data/ingestion.py to ingest Jaffle Shop CSV data from S3 into Snowflake's RAW schema using external stages and native COPY INTO commands, with transaction management and error handling.

Data Preparation for Feature Store

Added data/preparation.py to create weekly sales population tables per store, calculate forecasting targets, perform schema validation, and run data quality checks, leveraging externalized SQL for maintainability.

Modular Utilities and API

Added data/__init__.py and _sql_loader.py to provide a clean API and internal SQL file loading utilities for the integration package.
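The idempotent-DDL idea behind bootstrap.py can be sketched as a pure statement builder. The function name and the inlined SQL are illustrative assumptions; the PR itself keeps the statements externalized in create_warehouse.sql and create_database.sql:

```python
def bootstrap_statements(warehouse: str, database: str) -> list[str]:
    """Build DDL that is safe to re-run: IF NOT EXISTS turns each
    statement into a no-op when the object already exists."""
    return [
        f"CREATE WAREHOUSE IF NOT EXISTS {warehouse} "
        "WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60",
        f"CREATE DATABASE IF NOT EXISTS {database}",
    ]
```

In the real module each statement would then be executed through the Snowpark session; how that session is obtained and error-handled is the bootstrap script's concern, not shown here.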
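The stage-plus-COPY INTO pattern described for data/ingestion.py might reduce to statements like the following; the table, stage, path, and file-format details here are assumptions rather than values taken from the diff:

```python
def copy_into_statements(table: str, stage: str, prefix: str) -> list[str]:
    """COPY INTO wrapped in an explicit transaction so a failed load can
    be rolled back, leaving RAW.<table> unchanged."""
    return [
        "BEGIN",
        f"COPY INTO RAW.{table} "
        f"FROM @{stage}/{prefix} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1) "
        "ON_ERROR = 'ABORT_STATEMENT'",
        "COMMIT",
    ]
```

The ingestion module would issue ROLLBACK instead of COMMIT when the COPY fails, which is the transaction-management behavior the PR description mentions.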
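The data-quality checks in data/preparation.py could be sketched as row-level assertions over the snapshot table; the rule set below is hypothetical, chosen to match the flags computed by the weekly_stores query:

```python
def quality_issues(rows: list[dict]) -> list[str]:
    """Return human-readable findings for rows shaped like
    PREPARED.weekly_stores (store_id, days_since_open, has_min_history)."""
    issues = []
    for row in rows:
        if row["days_since_open"] < 0:
            issues.append(f"store {row['store_id']}: snapshot before opening")
        if row["has_min_history"] and row["days_since_open"] < 7:
            issues.append(f"store {row['store_id']}: has_min_history inconsistent")
    return issues
```

An empty result means the checks passed; anything else would be logged and could fail the pipeline run.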
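The SQL-loading utility in _sql_loader.py likely amounts to reading named .sql files from a package directory; the exact signature below is a guess at what "externalized SQL" implies:

```python
from pathlib import Path

def load_sql(name: str, sql_dir: Path) -> str:
    """Read <name>.sql from sql_dir, keeping query text out of Python code."""
    path = sql_dir / f"{name}.sql"
    if not path.is_file():
        raise FileNotFoundError(f"unknown SQL file: {path}")
    return path.read_text(encoding="utf-8")
```

Callers such as bootstrap.py would then do something like load_sql("create_database", sql_dir) and pass the text to the Snowpark session for execution.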