Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sample Mode Alpha #11247

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from
Draft

Sample Mode Alpha #11247

wants to merge 11 commits into from

Conversation

QMalcolm
Copy link
Contributor

@QMalcolm QMalcolm commented Jan 28, 2025

Resolves #11230
Resolves #11231
Resolves #11248
Resolves #11252
Resolves #11258

Problem

Solution

Checklist

  • I have read the contributing guide and understand what's expected of me.
  • I have run this code in development, and it appears to resolve the stated issue.
  • This PR includes tests, or tests are not required or relevant for this PR.
  • This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
  • This PR includes type annotations for new and modified functions.

@QMalcolm QMalcolm added the Skip Changelog Skips GHA to check for changelog file label Jan 28, 2025
@cla-bot cla-bot bot added the cla:yes label Jan 28, 2025
Copy link

codecov bot commented Jan 28, 2025

Codecov Report

Attention: Patch coverage is 87.09677% with 12 lines in your changes missing coverage. Please review.

Project coverage is 62.52%. Comparing base (c042370) to head (8bb42e2).

❗ There is a different number of reports uploaded between BASE (c042370) and HEAD (8bb42e2). Click for more details.

HEAD has 17 uploads less than BASE
Flag BASE (c042370) HEAD (8bb42e2)
unit 3 1
integration 15 0
Additional details and impacted files
@@             Coverage Diff             @@
##             main   #11247       +/-   ##
===========================================
- Coverage   88.87%   62.52%   -26.36%     
===========================================
  Files         187      189        +2     
  Lines       24072    24160       +88     
===========================================
- Hits        21395    15107     -6288     
- Misses       2677     9053     +6376     
Flag Coverage Δ
integration ?
unit 62.52% <87.09%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
Unit Tests 62.52% <87.09%> (+0.08%) ⬆️
Integration Tests 62.52% <87.09%> (-26.36%) ⬇️

@QMalcolm QMalcolm added the artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking label Jan 28, 2025
@QMalcolm QMalcolm force-pushed the qmalcolm--sample-mode-alpha branch 3 times, most recently from 7b8775e to 8bb42e2 Compare January 29, 2025 18:23
…n for microbatch models

Upon the initial implementation of microbatch models, the the `start` for a batch was _optional_.
However, in c3d87b8 they became guaranteed. Thus the if statement
guarding when `start/end` isn't present for microbatch models was no longer actually doing anything.
Hence, the if statement was safe to remove.
This is temporary as a POC. In the end, sample mode can't depend on the arguments
`--event-time-start/end` and will need to be split into their own CLI args / project
config, something like `--sample-window`. The issue with using `--event-time-start/end`
is that if people set those in the project configs, then their microbatch models would
_always_ run with those values even outside of sample mode. Despite that, this is a
useful checkpoint even though it will go away.
…e-start/end`

Using `--event-time-start/end` for sample mode was conflicting with microbatch models
when _not_ running in sample mode. We will have to do _slightly_ more work to plumb
this new way of specifying sample time to microbatch models.
…dule

This is mostly symbolic. We are going to be adding some utilities for "event_time"
type things, which will all live in the `event_time` submodule. Additionally we plan
to refactor `/incremental/materializations/microbatch.py` into the sub module as well.
The `MicrobatchBuilder.offset_timestamp` _truncates_ the timestamp before
offsetting it. We don't want to do that, we want to offset the "raw" timestamp.
We could have split renamed the microbatch builder function name to
`truncate_and_offset_timestamp` and separated the offset logic into a separate
abstract function. However, the offset logic in the MicrobatchBuilder context
depends on the truncation. We might later on be able to refactor the Microbatch
provided function by instead truncating _after_ offsetting instead of before.
But that is out of scope for this initial work, and we should instead revisit it
later.
The previous commit began using a submodule of the dateutil builtin
python library. We weren't previously using this library, and thus didn't
need the type stubs for it. But now that we do use it, we need to have
the type stubs during development.
In most cases people will want to set "relative" sample windows, i.e.
"3 days" to sample the last three days. However, there are some cases
where people will want to "specific" sample windows for some chunk of
historic time, i.e. `{'start': '2024-01-01', 'end': '2024-01-31'}`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking cla:yes Skip Changelog Skips GHA to check for changelog file
Projects
None yet
1 participant