-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sample Mode Alpha #11247
Sample Mode Alpha #11247
Conversation
Codecov ReportAttention: Patch coverage is
❌ Your patch status has failed because the patch coverage (79.78%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #11247 +/- ##
==========================================
+ Coverage 88.93% 88.96% +0.02%
==========================================
Files 187 189 +2
Lines 24072 24161 +89
==========================================
+ Hits 21408 21494 +86
- Misses 2664 2667 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
|
7b8775e
to
8bb42e2
Compare
…n for microbatch models Upon the initial implementation of microbatch models, the the `start` for a batch was _optional_. However, in c3d87b8 they became guaranteed. Thus the if statement guarding when `start/end` isn't present for microbatch models was no longer actually doing anything. Hence, the if statement was safe to remove.
This is temporary as a POC. In the end, sample mode can't depend on the arguments `--event-time-start/end` and will need to be split into their own CLI args / project config, something like `--sample-window`. The issue with using `--event-time-start/end` is that if people set those in the project configs, then their microbatch models would _always_ run with those values even outside of sample mode. Despite that, this is a useful checkpoint even though it will go away.
…e-start/end` Using `--event-time-start/end` for sample mode was conflicting with microbatch models when _not_ running in sample mode. We will have to do _slightly_ more work to plumb this new way of specifying sample time to microbatch models.
…dule This is mostly symbolic. We are going to be adding some utilities for "event_time" type things, which will all live in the `event_time` submodule. Additionally we plan to refactor `/incremental/materializations/microbatch.py` into the sub module as well.
The `MicrobatchBuilder.offset_timestamp` _truncates_ the timestamp before offsetting it. We don't want to do that, we want to offset the "raw" timestamp. We could have split renamed the microbatch builder function name to `truncate_and_offset_timestamp` and separated the offset logic into a separate abstract function. However, the offset logic in the MicrobatchBuilder context depends on the truncation. We might later on be able to refactor the Microbatch provided function by instead truncating _after_ offsetting instead of before. But that is out of scope for this initial work, and we should instead revisit it later.
The previous commit began using a submodule of the dateutil builtin python library. We weren't previously using this library, and thus didn't need the type stubs for it. But now that we do use it, we need to have the type stubs during development.
In most cases people will want to set "relative" sample windows, i.e. "3 days" to sample the last three days. However, there are some cases where people will want to "specific" sample windows for some chunk of historic time, i.e. `{'start': '2024-01-01', 'end': '2024-01-31'}`.
This was necessary because these aren't _always_ available. I had expected to need to do this after putting the `sample` flag behind an environment variable (which I haven't done yet). However, we needed to add the guards sooner because the `render` logic is called multiple times throughout the dbt process, and earlier on the flags aren't available.
…ODE` At this point sample mode is _alpha_ and should not be depended upon. To make this crystal clear we've gated the functionality behind an environment variable. We'll likely remove this gate in the coming month.
8bb42e2
to
1811754
Compare
9678427
to
01ae64f
Compare
I had updated the `later_input_model.sql` to be easier to test with. However, I didn't correspondingly update the inital `input_model.sql` to match.
…env var isn't present Previously microbatch was creating the _right_ number of batches when: 1. sample mode _wasn't_ being used 2. sample mode _was_ being used AND the env var was present Unfortunately sample mode _wasn't_ creating the right number of batches when: 3. sample mode _was_ being used AND the env var _wasn't_ present. In case (3) sample mode shouldn't be run. Unfortunately we weren't gating sample mode by the environment variable during batch creation. This lead to a situtation where in creating batches it was using sample mode but in the rendering of refs it _wasn't_ using sample mode. Putting it in an inbetween state... This commit fixes that issue. Additionally of note, we currently have duplicate sample mode gating logic in the batch creation as well as in the rendering of refs. We should probably consolidate this logic into a singular importable function, that way any future changes of how sample mode is gated is easier to implement.
envvar="DBT_SAMPLE", | ||
help="Run in sample mode, creating only samples of models where possible", | ||
default=False, | ||
is_flag=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hide, and later unhide. I.e. hidden=True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarification, we want to hide these while we're in alpha / gated behind an environment variable. Once we remove the environment variable, we'll unhide these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for an alpha! The gating looks great, but lets add hidden
on the CLI options for good measure for now.
Test coverage looks super solid, especially around incremental + microbatch sample handling.
Artifact change test failure looks unrelated, to do with formatting in that/this repo:
diff --git a/dbt/run-results/v6.json b/dbt/run-results/v6.json
index 76eee1f..53b6491 100644
--- a/dbt/run-results/v6.json
+++ b/dbt/run-results/v6.json
@@ -262,4 +262,4 @@
"elapsed_time"
],
"$id": "https://schemas.getdbt.com/dbt/run-results/v6.json"
-}
+}
\ No newline at end of file
We are doing this _temporarily_ while sample mode as a feature is in alpha/beta and locked behind an environment variable. When we remove the environment variable we should also unhide these.
Resolves #11227
Resolves #11230
Resolves #11231
Resolves #11248
Resolves #11252
Resolves #11254
Resolves #11258
Problem
Solution
Checklist