[Feature] allow gaps in the lookback range for microbatch #11242

data-blade · 2025-01-27T14:13:09Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

currently, lookback accepts a single integer n, representing the contiguous [today-n : today] range of the incremental run.

in most companies the distribution of delayed data is very skewed towards the newer end of the lookback range [citation needed].

i.e. 90+% of delayed data arrives within 1 day, and the rest forms a long tail.

to improve efficiency, implement a lookback that also accepts a list such as [0, 1, n], where n is the greatest possible delay. when running regularly, each run would skip the days in between (1 < x < n) instead of updating them every time, saving significant compute. each day would still be fully updated once it is n days old, in a rolling fashion.

Describe alternatives you've considered

we implemented our own version of this a while ago, with a date range macro that accepts either an integer (range without gaps) or an array of integers (range with gaps, or just specific days).

simplified:

{# reprocessing specific days -#}
{% if lookback is sequence -%}
	({% for day in lookback %}
		{{ event_date }} between
			current_date - {{ day }}
			and current_date - {{ day }}
		{{ 'or' if not loop.last -}}
	{% endfor -%}
	)
{# reprocessing last x days -#}
{% else -%}
	{{ event_date }} between
		current_date - {{ lookback }}
		and current_date
{% endif -%}

Who will this benefit?

any clients who...

have large datasets, i.e. computation is a significant cost factor
"want all data"
have delayed data with a typical recency skew

Are you interested in contributing this feature?

yes, if it's as easy as our macro ;)

Anything else?

for perspective, this is currently blocking us from adopting microbatches. the advantage of calculating daily batches is completely offset by not being able to skip "plot-irrelevant" days.

data-blade added the enhancement and triage labels on Jan 27, 2025