Sample Mode Alpha #11247

QMalcolm · 2025-01-28T06:24:08Z

Resolves #11227
Resolves #11230
Resolves #11231
Resolves #11248
Resolves #11252
Resolves #11254
Resolves #11258

Problem

Solution

Checklist

I have read the contributing guide and understand what's expected of me.
I have run this code in development, and it appears to resolve the stated issue.
This PR includes tests, or tests are not required or relevant for this PR.
This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
This PR includes type annotations for new and modified functions.

codecov · 2025-01-28T06:30:23Z

Codecov Report

Attention: Patch coverage is 96.80851% with 3 lines in your changes missing coverage. Please review.

Project coverage is 88.96%. Comparing base (fdabe95) to head (1ef8b3f).
Report is 1 commits behind head on main.

❌ Your patch status has failed because the patch coverage (79.78%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #11247      +/-   ##
==========================================
+ Coverage   88.93%   88.96%   +0.02%     
==========================================
  Files         187      189       +2     
  Lines       24072    24161      +89     
==========================================
+ Hits        21408    21494      +86     
- Misses       2664     2667       +3

Flag	Coverage Δ
integration	`86.30% <79.78%> (-0.01%)`	⬇️
unit	`62.53% <87.23%> (+0.08%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
Unit Tests	`62.53% <87.23%> (+0.08%)`	⬆️
Integration Tests	`86.30% <79.78%> (-0.01%)`	⬇️

…n for microbatch models Upon the initial implementation of microbatch models, the the `start` for a batch was _optional_. However, in c3d87b8 they became guaranteed. Thus the if statement guarding when `start/end` isn't present for microbatch models was no longer actually doing anything. Hence, the if statement was safe to remove.

This is temporary as a POC. In the end, sample mode can't depend on the arguments `--event-time-start/end` and will need to be split into their own CLI args / project config, something like `--sample-window`. The issue with using `--event-time-start/end` is that if people set those in the project configs, then their microbatch models would _always_ run with those values even outside of sample mode. Despite that, this is a useful checkpoint even though it will go away.

…e-start/end` Using `--event-time-start/end` for sample mode was conflicting with microbatch models when _not_ running in sample mode. We will have to do _slightly_ more work to plumb this new way of specifying sample time to microbatch models.

…dule This is mostly symbolic. We are going to be adding some utilities for "event_time" type things, which will all live in the `event_time` submodule. Additionally we plan to refactor `/incremental/materializations/microbatch.py` into the sub module as well.

The `MicrobatchBuilder.offset_timestamp` _truncates_ the timestamp before offsetting it. We don't want to do that, we want to offset the "raw" timestamp. We could have split renamed the microbatch builder function name to `truncate_and_offset_timestamp` and separated the offset logic into a separate abstract function. However, the offset logic in the MicrobatchBuilder context depends on the truncation. We might later on be able to refactor the Microbatch provided function by instead truncating _after_ offsetting instead of before. But that is out of scope for this initial work, and we should instead revisit it later.

The previous commit began using a submodule of the dateutil builtin python library. We weren't previously using this library, and thus didn't need the type stubs for it. But now that we do use it, we need to have the type stubs during development.

In most cases people will want to set "relative" sample windows, i.e. "3 days" to sample the last three days. However, there are some cases where people will want to "specific" sample windows for some chunk of historic time, i.e. `{'start': '2024-01-01', 'end': '2024-01-31'}`.

… changes

This was necessary because these aren't _always_ available. I had expected to need to do this after putting the `sample` flag behind an environment variable (which I haven't done yet). However, we needed to add the guards sooner because the `render` logic is called multiple times throughout the dbt process, and earlier on the flags aren't available.

…ODE` At this point sample mode is _alpha_ and should not be depended upon. To make this crystal clear we've gated the functionality behind an environment variable. We'll likely remove this gate in the coming month.

I had updated the `later_input_model.sql` to be easier to test with. However, I didn't correspondingly update the inital `input_model.sql` to match.

…env var isn't present Previously microbatch was creating the _right_ number of batches when: 1. sample mode _wasn't_ being used 2. sample mode _was_ being used AND the env var was present Unfortunately sample mode _wasn't_ creating the right number of batches when: 3. sample mode _was_ being used AND the env var _wasn't_ present. In case (3) sample mode shouldn't be run. Unfortunately we weren't gating sample mode by the environment variable during batch creation. This lead to a situtation where in creating batches it was using sample mode but in the rendering of refs it _wasn't_ using sample mode. Putting it in an inbetween state... This commit fixes that issue. Additionally of note, we currently have duplicate sample mode gating logic in the batch creation as well as in the rendering of refs. We should probably consolidate this logic into a singular importable function, that way any future changes of how sample mode is gated is easier to implement.

QMalcolm · 2025-02-03T23:08:26Z

core/dbt/cli/params.py

+    envvar="DBT_SAMPLE",
+    help="Run in sample mode, creating only samples of models where possible",
+    default=False,
+    is_flag=True,


Hide, and later unhide. I.e. hidden=True

For clarification, we want to hide these while we're in alpha / gated behind an environment variable. Once we remove the environment variable, we'll unhide these

MichelleArk

LGTM for an alpha! The gating looks great, but lets add hidden on the CLI options for good measure for now.

Test coverage looks super solid, especially around incremental + microbatch sample handling.

Artifact change test failure looks unrelated, to do with formatting in that/this repo:

diff --git a/dbt/run-results/v6.json b/dbt/run-results/v6.json
index 76eee1f..53b6491 100644
--- a/dbt/run-results/v6.json
+++ b/dbt/run-results/v6.json
@@ -262,4 +262,4 @@
     "elapsed_time"
   ],
   "$id": "https://schemas.getdbt.com/dbt/run-results/v6.json"
-}
+}
\ No newline at end of file

We are doing this _temporarily_ while sample mode as a feature is in alpha/beta and locked behind an environment variable. When we remove the environment variable we should also unhide these.

QMalcolm added the Skip Changelog Skips GHA to check for changelog file label Jan 28, 2025

cla-bot bot added the cla:yes label Jan 28, 2025

QMalcolm added the artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking label Jan 28, 2025

QMalcolm force-pushed the qmalcolm--sample-mode-alpha branch 3 times, most recently from 7b8775e to 8bb42e2 Compare January 29, 2025 18:23

QMalcolm added 14 commits January 31, 2025 13:03

Add --sample flag to run command

1faa8b7

Begin supporting microbatch models in sample mode

cf5bfd6

Move parsing logic of SampleWindowType to SampleWindow

a8cbec2

Fix tests of BaseResolver.resolve_event_time_filter for sample mode…

10674f7

… changes

Add --no-sample as it's necessary for retry

7e86080

QMalcolm force-pushed the qmalcolm--sample-mode-alpha branch from 8bb42e2 to 1811754 Compare January 31, 2025 19:57

Add sample mode tests for incremental models

719b550

QMalcolm force-pushed the qmalcolm--sample-mode-alpha branch from 9678427 to 01ae64f Compare February 2, 2025 20:01

Add changie doc for sample mode initial implementation

01ae64f

QMalcolm marked this pull request as ready for review February 2, 2025 20:01

QMalcolm requested a review from a team as a code owner February 2, 2025 20:01

QMalcolm removed the Skip Changelog Skips GHA to check for changelog file label Feb 2, 2025

QMalcolm added 2 commits February 3, 2025 10:21

Fixup sample mode functional tests

5c51a6f

I had updated the `later_input_model.sql` to be easier to test with. However, I didn't correspondingly update the inital `input_model.sql` to match.

QMalcolm commented Feb 3, 2025

View reviewed changes

MichelleArk approved these changes Feb 3, 2025

View reviewed changes

QMalcolm added 2 commits February 3, 2025 17:31

Correct comment in SampleWindow post serialization method

6d001d7

Hide CLI sample mode options

1ef8b3f

We are doing this _temporarily_ while sample mode as a feature is in alpha/beta and locked behind an environment variable. When we remove the environment variable we should also unhide these.

QMalcolm merged commit 5f873da into main Feb 4, 2025
54 of 56 checks passed

QMalcolm deleted the qmalcolm--sample-mode-alpha branch February 4, 2025 00:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sample Mode Alpha #11247

Sample Mode Alpha #11247

QMalcolm commented Jan 28, 2025 •

edited

Loading

codecov bot commented Jan 28, 2025 •

edited

Loading

QMalcolm Feb 3, 2025

QMalcolm Feb 3, 2025

MichelleArk left a comment

Sample Mode Alpha #11247

Sample Mode Alpha #11247

Conversation

QMalcolm commented Jan 28, 2025 • edited Loading

Problem

Solution

Checklist

codecov bot commented Jan 28, 2025 • edited Loading

Codecov Report

QMalcolm Feb 3, 2025

Choose a reason for hiding this comment

QMalcolm Feb 3, 2025

Choose a reason for hiding this comment

MichelleArk left a comment

Choose a reason for hiding this comment

QMalcolm commented Jan 28, 2025 •

edited

Loading

codecov bot commented Jan 28, 2025 •

edited

Loading