Added support to pause and resume ingestion based on resource utilization #15008
+929
−5
Description
Added support to pause ingestion for OFFLINE and REALTIME tables when the disk utilization of server instances exceeds a configured threshold, and to resume it once utilization drops back below that threshold.
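The threshold decision can be sketched as follows. This is a minimal sketch, not the code in this PR. It makes three assumptions. First, utilization is used bytes divided by total bytes for each server instance. Second, a table counts as over the limit when any instance serving it is above the threshold. Third, instances with no reading yet are treated as within limits. DiskUsageTracker, DiskUsage and isWithinLimits are hypothetical names; the tracker plays a role similar to ResourceUtilizationInfo.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical holder for the latest disk usage reported by each server instance. */
final class DiskUsageTracker {

  /** Used and total disk bytes reported by one server instance. */
  record DiskUsage(long usedBytes, long totalBytes) {
    /** Fraction of the disk in use; 0.0 when the total is reported as 0 or less. */
    double utilization() {
      return totalBytes <= 0 ? 0.0 : (double) usedBytes / totalBytes;
    }
  }

  private final Map<String, DiskUsage> usageByInstance = new ConcurrentHashMap<>();
  private final double threshold; // assumed: a fraction in [0, 1]

  DiskUsageTracker(double threshold) {
    if (threshold < 0.0 || threshold > 1.0) {
      throw new IllegalArgumentException("threshold must be in [0, 1]: " + threshold);
    }
    this.threshold = threshold;
  }

  /** Stores the latest reading for an instance, e.g. after each run of the periodic task. */
  void update(String instanceId, DiskUsage usage) {
    usageByInstance.put(instanceId, usage);
  }

  /**
   * Returns false if any of the given instances is above the threshold.
   * Instances without a reading are treated as within limits (an assumption).
   */
  boolean isWithinLimits(Set<String> instances) {
    for (String instance : instances) {
      DiskUsage usage = usageByInstance.get(instance);
      if (usage != null && usage.utilization() > threshold) {
        return false;
      }
    }
    return true;
  }
}
```

Keeping the readings in a concurrent map lets a periodic collector write readings while other periodic tasks read them.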
Key Components Introduced
ResourceUtilizationChecker – A controller periodic task that collects the disk utilization of server instances.
ResourceUtilizationInfo – A container that holds the disk usage info of Pinot instances.
/instance/diskUtilization – A new API exposed on the Pinot server instances.
How it Works
The periodic task ResourceUtilizationChecker computes the disk usage of the Pinot server instances and captures it in ResourceUtilizationInfo objects. For REALTIME tables, the periodic task RealtimeSegmentValidationManager pauses consumption on a table if disk utilization is above the threshold. For OFFLINE tables, PinotTaskManager ensures that no new ingestion tasks are generated if disk utilization is above the threshold.
Notes on Implementation
– The periodic task ResourceUtilizationChecker runs by default when the controller starts. Currently, it runs on every controller.
– For OFFLINE tables, new ingestion task generation is blocked when resource constraints are violated, rather than pausing the task queue, because the task queue is shared across all tenants in the cluster.
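The two per-table-type behaviors can be sketched as a small gate. This is a minimal sketch under stated assumptions. IngestionGate, RealtimeAction, realtimeAction and offlineTablesToSchedule are hypothetical names. The checkEnabled flag is assumed to mirror controller.enable.resource.utilization.check. The per-table predicate stands in for whatever utilization lookup the controller performs. For REALTIME tables, the sketch only resumes tables it paused itself, so manual pauses are left alone. For OFFLINE tables, it filters tables at task-generation time instead of touching the shared queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical gate combining the enable flag with a per-table utilization check. */
final class IngestionGate {

  enum RealtimeAction { NONE, PAUSE, RESUME }

  private final boolean checkEnabled;           // assumed to mirror controller.enable.resource.utilization.check
  private final Predicate<String> withinLimits; // table name -> is disk utilization at or below the threshold?

  IngestionGate(boolean checkEnabled, Predicate<String> withinLimits) {
    this.checkEnabled = checkEnabled;
    this.withinLimits = withinLimits;
  }

  /**
   * REALTIME: the action a validation-style periodic task would take for one table.
   * pausedForResources means "paused earlier by this check", so manual pauses are never resumed here.
   */
  RealtimeAction realtimeAction(String table, boolean pausedForResources) {
    if (!checkEnabled) {
      return RealtimeAction.NONE; // check disabled: leave consumption state untouched
    }
    boolean ok = withinLimits.test(table);
    if (!ok && !pausedForResources) {
      return RealtimeAction.PAUSE;
    }
    if (ok && pausedForResources) {
      return RealtimeAction.RESUME;
    }
    return RealtimeAction.NONE;
  }

  /**
   * OFFLINE: returns the tables for which new ingestion tasks may be generated.
   * Over-limit tables are skipped; the shared task queue itself is never paused.
   */
  List<String> offlineTablesToSchedule(List<String> tables) {
    if (!checkEnabled) {
      return new ArrayList<>(tables);
    }
    List<String> eligible = new ArrayList<>();
    for (String table : tables) {
      if (withinLimits.test(table)) {
        eligible.add(table);
      }
    }
    return eligible;
  }
}
```

Filtering OFFLINE tables at generation time means only the over-limit tables stop producing tasks. Tasks for other tenants keep flowing through the shared queue.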
Configs
"controller.enable.resource.utilization.check": "true"
--> Enables or disables pausing ingestion when resource constraints are violated.
Testing
Tested that the periodic task runs without any errors.
Will perform additional tests and update the results.