Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sample Mode Alpha #11247

Merged
merged 20 commits into from
Feb 4, 2025
Merged

Sample Mode Alpha #11247

merged 20 commits into from
Feb 4, 2025

Conversation

QMalcolm
Copy link
Contributor

@QMalcolm QMalcolm commented Jan 28, 2025

Resolves #11227
Resolves #11230
Resolves #11231
Resolves #11248
Resolves #11252
Resolves #11254
Resolves #11258

Problem

Solution

Checklist

  • I have read the contributing guide and understand what's expected of me.
  • I have run this code in development, and it appears to resolve the stated issue.
  • This PR includes tests, or tests are not required or relevant for this PR.
  • This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
  • This PR includes type annotations for new and modified functions.

@QMalcolm QMalcolm added the Skip Changelog Skips GHA to check for changelog file label Jan 28, 2025
@cla-bot cla-bot bot added the cla:yes label Jan 28, 2025
Copy link

codecov bot commented Jan 28, 2025

Codecov Report

Attention: Patch coverage is 96.80851% with 3 lines in your changes missing coverage. Please review.

Project coverage is 88.96%. Comparing base (fdabe95) to head (1ef8b3f).
Report is 1 commits behind head on main.

❌ Your patch status has failed because the patch coverage (79.78%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #11247      +/-   ##
==========================================
+ Coverage   88.93%   88.96%   +0.02%     
==========================================
  Files         187      189       +2     
  Lines       24072    24161      +89     
==========================================
+ Hits        21408    21494      +86     
- Misses       2664     2667       +3     
Flag Coverage Δ
integration 86.30% <79.78%> (-0.01%) ⬇️
unit 62.53% <87.23%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
Unit Tests 62.53% <87.23%> (+0.08%) ⬆️
Integration Tests 86.30% <79.78%> (-0.01%) ⬇️

@QMalcolm QMalcolm added the artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking label Jan 28, 2025
@QMalcolm QMalcolm force-pushed the qmalcolm--sample-mode-alpha branch 3 times, most recently from 7b8775e to 8bb42e2 Compare January 29, 2025 18:23
…n for microbatch models

Upon the initial implementation of microbatch models, the the `start` for a batch was _optional_.
However, in c3d87b8 they became guaranteed. Thus the if statement
guarding when `start/end` isn't present for microbatch models was no longer actually doing anything.
Hence, the if statement was safe to remove.
This is temporary as a POC. In the end, sample mode can't depend on the arguments
`--event-time-start/end` and will need to be split into their own CLI args / project
config, something like `--sample-window`. The issue with using `--event-time-start/end`
is that if people set those in the project configs, then their microbatch models would
_always_ run with those values even outside of sample mode. Despite that, this is a
useful checkpoint even though it will go away.
…e-start/end`

Using `--event-time-start/end` for sample mode was conflicting with microbatch models
when _not_ running in sample mode. We will have to do _slightly_ more work to plumb
this new way of specifying sample time to microbatch models.
…dule

This is mostly symbolic. We are going to be adding some utilities for "event_time"
type things, which will all live in the `event_time` submodule. Additionally we plan
to refactor `/incremental/materializations/microbatch.py` into the sub module as well.
The `MicrobatchBuilder.offset_timestamp` _truncates_ the timestamp before
offsetting it. We don't want to do that, we want to offset the "raw" timestamp.
We could have split renamed the microbatch builder function name to
`truncate_and_offset_timestamp` and separated the offset logic into a separate
abstract function. However, the offset logic in the MicrobatchBuilder context
depends on the truncation. We might later on be able to refactor the Microbatch
provided function by instead truncating _after_ offsetting instead of before.
But that is out of scope for this initial work, and we should instead revisit it
later.
The previous commit began using a submodule of the dateutil builtin
python library. We weren't previously using this library, and thus didn't
need the type stubs for it. But now that we do use it, we need to have
the type stubs during development.
In most cases people will want to set "relative" sample windows, i.e.
"3 days" to sample the last three days. However, there are some cases
where people will want to "specific" sample windows for some chunk of
historic time, i.e. `{'start': '2024-01-01', 'end': '2024-01-31'}`.
This was necessary because these aren't _always_ available. I had expected
to need to do this after putting the `sample` flag behind an environment
variable (which I haven't done yet). However, we needed to add the guards
sooner because the `render` logic is called multiple times throughout the
dbt process, and earlier on the flags aren't available.
…ODE`

At this point sample mode is _alpha_ and should not be depended upon. To make
this crystal clear we've gated the functionality behind an environment variable.
We'll likely remove this gate in the coming month.
@QMalcolm QMalcolm force-pushed the qmalcolm--sample-mode-alpha branch from 8bb42e2 to 1811754 Compare January 31, 2025 19:57
@QMalcolm QMalcolm force-pushed the qmalcolm--sample-mode-alpha branch from 9678427 to 01ae64f Compare February 2, 2025 20:01
@QMalcolm QMalcolm marked this pull request as ready for review February 2, 2025 20:01
@QMalcolm QMalcolm requested a review from a team as a code owner February 2, 2025 20:01
@QMalcolm QMalcolm removed the Skip Changelog Skips GHA to check for changelog file label Feb 2, 2025
I had updated the `later_input_model.sql` to be easier to test with. However,
I didn't correspondingly update the inital `input_model.sql` to match.
…env var isn't present

Previously microbatch was creating the _right_ number of batches when:
1. sample mode _wasn't_ being used
2. sample mode _was_ being used AND the env var was present

Unfortunately sample mode _wasn't_ creating the right number of batches when:
3. sample mode _was_ being used AND the env var _wasn't_ present.

In case (3) sample mode shouldn't be run. Unfortunately we weren't gating sample
mode by the environment variable during batch creation. This lead to a situtation
where in creating batches it was using sample mode but in the rendering of refs
it _wasn't_ using sample mode. Putting it in an inbetween state... This commit
fixes that issue.

Additionally of note, we currently have duplicate sample mode gating logic in the
batch creation as well as in the rendering of refs. We should probably consolidate
this logic into a singular importable function, that way any future changes of how
sample mode is gated is easier to implement.
envvar="DBT_SAMPLE",
help="Run in sample mode, creating only samples of models where possible",
default=False,
is_flag=True,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide, and later unhide. I.e. hidden=True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarification, we want to hide these while we're in alpha / gated behind an environment variable. Once we remove the environment variable, we'll unhide these

Copy link
Contributor

@MichelleArk MichelleArk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for an alpha! The gating looks great, but lets add hidden on the CLI options for good measure for now.

Test coverage looks super solid, especially around incremental + microbatch sample handling.

Artifact change test failure looks unrelated, to do with formatting in that/this repo:

diff --git a/dbt/run-results/v6.json b/dbt/run-results/v6.json
index 76eee1f..53b6491 100644
--- a/dbt/run-results/v6.json
+++ b/dbt/run-results/v6.json
@@ -262,4 +262,4 @@
     "elapsed_time"
   ],
   "$id": "https://schemas.getdbt.com/dbt/run-results/v6.json"
-}
+}
\ No newline at end of file

We are doing this _temporarily_ while sample mode as a feature is in
alpha/beta and locked behind an environment variable. When we remove the
environment variable we should also unhide these.
@QMalcolm QMalcolm merged commit 5f873da into main Feb 4, 2025
54 of 56 checks passed
@QMalcolm QMalcolm deleted the qmalcolm--sample-mode-alpha branch February 4, 2025 00:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment