[Bug/Enhancement] Microbatch models shouldn't block the main thread in multi-threaded dbt runs. #11243
Labels: backport 1.9.latest, bug (Something isn't working), microbatch (Issues related to the microbatch incremental strategy), Housekeeping
Short description
Microbatch models currently block the main thread when running dbt with multiple threads. This affects microbatch models whether their batches run concurrently or sequentially, though the impact is greater in the sequential case. In essence, the issue is that the scheduling of batch execution for a microbatch model happens on the main thread, so the scheduling of other models is blocked until the microbatch model completes.
For example, suppose a dbt project configured with multiple threads contains a microbatch model alongside many sibling/cousin/nibling/etc. nodes. Once the main thread reaches the microbatch model, work already underway on worker threads for other nodes continues, but no new work for other nodes gets scheduled, because that scheduling is handled by the main thread, which is now blocked by the microbatch model.

When the batches for a microbatch model run sequentially, all of them execute on the main thread, effectively reducing the dbt invocation to single-threaded execution until the microbatch model completes. When the batches run concurrently, worker threads are spun up for the batches, but the main thread remains blocked until every batch has finished. In that case the worker threads stay mostly saturated; however, if one long-running batch remains after all the other worker threads' batches have finished, the remaining threads sit idle until that last batch completes.
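The blocking behavior can be illustrated with a minimal Python sketch (this is not dbt-core's actual code; `run_batch` and `run_microbatch_model_blocking` are hypothetical stand-ins). The main thread submits the batches and then waits on all of them, so it cannot dispatch any other nodes in the meantime:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_batch(batch_id):
    """Stand-in for executing one batch of a microbatch model."""
    time.sleep(0.01)
    return batch_id

def run_microbatch_model_blocking(pool, batches):
    # Called from the main thread: it submits every batch and then
    # blocks on all of their results. Until this returns, the main
    # thread cannot schedule any other nodes, even if pool threads
    # free up along the way.
    futures = [pool.submit(run_batch, b) for b in batches]
    return [f.result() for f in futures]

with ThreadPoolExecutor(max_workers=4) as pool:
    # While this call runs, scheduling of sibling/cousin nodes stalls.
    results = run_microbatch_model_blocking(pool, range(8))
```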
We should instead delegate the scheduling of a microbatch model's batches to a worker thread. When batches run sequentially, only one worker thread would be occupied, and the main thread would not be blocked. When batches run concurrently, one worker thread would do the scheduling and saturate the other worker threads with batches. This does mean that one worker thread isn't doing batch work, which isn't ideal. However, if some long-running batches are holding only a subset of the worker threads, the main thread can continue to allocate other nodes to the freed-up worker threads.
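The proposed delegation can be sketched in the same toy terms (again, a hypothetical illustration, not dbt-core's implementation). The batch scheduler itself runs on a worker thread, occupying one worker but leaving the main thread free to keep dispatching other nodes:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_batch(batch_id):
    """Stand-in for executing one batch of a microbatch model."""
    time.sleep(0.01)
    return batch_id

def schedule_batches(pool, batches):
    # Runs ON a worker thread: it fans the batches out to the
    # remaining workers and waits for them there, so the main
    # thread is never blocked by the microbatch model.
    futures = [pool.submit(run_batch, b) for b in batches]
    return [f.result() for f in futures]

with ThreadPoolExecutor(max_workers=4) as pool:
    # The main thread only submits the scheduler and moves on; one
    # worker is tied up scheduling while the other three run batches.
    microbatch_future = pool.submit(schedule_batches, pool, range(8))
    # ... the main thread is free to schedule other nodes here ...
    results = microbatch_future.result()
```

Note that with one scheduler thread and three remaining workers there is no deadlock risk in this sketch; guarding against a pool saturated entirely by schedulers would need separate handling in a real implementation.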
Acceptance criteria
Suggested Tests
Impact to Other Teams
N/A
Will backports be required?
Possibly 1.9, although this might be a large enough change that it isn't safe to do so.
Context
No response