Added support to pause and resume ingestion based on resource utilization #15008

Open: 1 commit to merge into base branch master

rajagopr (Contributor) commented Feb 7, 2025

Description

Added support to pause and resume ingestion for OFFLINE and REALTIME tables when disk utilization exceeds a configured threshold.

Key Components Introduced

ResourceUtilizationChecker – A controller periodic task that collects disk utilization of server instances.
ResourceUtilizationInfo – A container that holds disk usage info of Pinot instances.
/instance/diskUtilization – New API that is exposed on the Pinot server instances.
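
As a rough illustration of the kind of data such an endpoint could report, the sketch below measures disk utilization for a data directory with the JDK's `FileStore` API. The PR description does not show how the server computes the value, so the class name `DiskUsageSnapshot`, its fields, and the measurement approach are all assumptions, not the PR's `DiskUsageInfo` implementation.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical snapshot of disk usage for one path (illustrative only, not the PR's class). */
final class DiskUsageSnapshot {
  final String path;
  final long totalSpaceBytes;
  final long usedSpaceBytes;

  DiskUsageSnapshot(String path, long totalSpaceBytes, long usedSpaceBytes) {
    this.path = path;
    this.totalSpaceBytes = totalSpaceBytes;
    this.usedSpaceBytes = usedSpaceBytes;
  }

  /** Fraction of the volume in use; returns 0.0 when the total size is unknown or zero. */
  double utilization() {
    return totalSpaceBytes <= 0 ? 0.0 : (double) usedSpaceBytes / totalSpaceBytes;
  }

  /** Reads total and usable space from the file store that backs the given directory. */
  static DiskUsageSnapshot of(Path dataDir) throws IOException {
    FileStore store = Files.getFileStore(dataDir);
    long total = store.getTotalSpace();
    // Assumption: "used" is everything not usable by this JVM, including reserved blocks.
    long used = Math.max(0L, total - store.getUsableSpace());
    return new DiskUsageSnapshot(dataDir.toString(), total, used);
  }
}
```

A caller would pass the directory that holds segment data, for example `DiskUsageSnapshot.of(Paths.get(dataDir))`, and serialize the result in the endpoint's response.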

How it Works

The ResourceUtilizationChecker periodic task computes the disk usage of the Pinot server instances and stores it in ResourceUtilizationInfo objects. For REALTIME tables, the RealtimeSegmentValidationManager periodic task pauses consumption on a table when disk utilization is above the threshold. For OFFLINE tables, the PinotTaskManager does not generate new ingestion tasks while disk utilization is above the threshold.
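
The decision described above can be sketched as a small gate that both code paths consult. This is a minimal sketch under stated assumptions, not the PR's code: the class `IngestionGate` and its method names are hypothetical, and the description does not say how multiple servers are combined or how missing data is handled. Here a table is blocked if any of its servers is above the threshold, and a server with no reported data does not block ingestion.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical gate shared by the REALTIME and OFFLINE code paths (illustrative only). */
final class IngestionGate {
  private final boolean checkEnabled;
  private final double diskUtilizationThreshold; // fraction in [0.0, 1.0]

  IngestionGate(boolean checkEnabled, double diskUtilizationThreshold) {
    this.checkEnabled = checkEnabled;
    this.diskUtilizationThreshold = diskUtilizationThreshold;
  }

  /** Returns false if any listed server reports utilization strictly above the threshold. */
  boolean isIngestionAllowed(List<String> serverInstances, Map<String, Double> utilizationByInstance) {
    if (!checkEnabled) {
      return true; // feature switched off: never block ingestion
    }
    for (String instance : serverInstances) {
      Double utilization = utilizationByInstance.get(instance);
      // Assumption: a server with no collected data does not block ingestion.
      if (utilization != null && utilization > diskUtilizationThreshold) {
        return false;
      }
    }
    return true;
  }

  /** REALTIME path: pause consumption when ingestion is not allowed. */
  boolean shouldPauseConsumption(List<String> servers, Map<String, Double> utilizationByInstance) {
    return !isIngestionAllowed(servers, utilizationByInstance);
  }

  /** OFFLINE path: generate new ingestion tasks only when ingestion is allowed. */
  boolean shouldGenerateTasks(List<String> servers, Map<String, Double> utilizationByInstance) {
    return isIngestionAllowed(servers, utilizationByInstance);
  }
}
```

Because the check is evaluated per table against the servers that host it, resuming falls out naturally: once utilization drops back to or below the threshold, the next periodic run sees `isIngestionAllowed` return true again.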

Notes on Implementation

– The periodic task ResourceUtilizationChecker runs by default when the controller starts. For now, it runs on every controller.
– For OFFLINE tables, generation of new ingestion tasks is blocked when resource constraints are violated, rather than pausing the task queue, because the task queue is shared across all tenants in the cluster.

Configs

"controller.enable.resource.utilization.check": "true" --> Config used to enable/disable the behavior that pauses ingestion when resource constraints are violated.
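
For illustration, a flag like this could be read as follows. This is a sketch using plain `java.util.Properties`. The helper class `ResourceCheckConfig` is hypothetical, and the PR description does not state the flag's default, so the caller supplies it.

```java
import java.util.Properties;

/** Hypothetical helper for reading the enable flag (illustrative only). */
final class ResourceCheckConfig {
  static final String ENABLE_KEY = "controller.enable.resource.utilization.check";

  /** Returns the configured value, or defaultValue when the key is absent. */
  static boolean isEnabled(Properties controllerProps, boolean defaultValue) {
    String raw = controllerProps.getProperty(ENABLE_KEY);
    // Boolean.parseBoolean treats anything other than "true" (case-insensitive) as false.
    return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
  }
}
```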

Testing

Tested that the periodic task runs without any errors.

2025/02/07 02:23:01.425 INFO [ResourceUtilizationChecker] [pool-20-thread-1] Running periodic task: ResourceUtilizationChecker
2025/02/07 02:23:01.443 INFO [BasePeriodicTask] [pool-20-thread-1] [TaskRequestId: auto] Finish running task: ResourceUtilizationChecker in 18ms

Will perform additional tests and update the results.

codecov-commenter commented Feb 7, 2025

Codecov Report

Attention: Patch coverage is 0% with 13 lines in your changes missing coverage. Please review.

Project coverage is 56.21%. Comparing base (59551e4) to head (c5eb3a6).
Report is 1679 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| .../pinot/common/restlet/resources/DiskUsageInfo.java | 0.00% | 12 Missing ⚠️ |
| .../org/apache/pinot/spi/config/table/PauseState.java | 0.00% | 1 Missing ⚠️ |

❗ The number of uploaded reports differs between BASE (59551e4) and HEAD (c5eb3a6).

HEAD has 20 fewer uploads than BASE.

| Flag | BASE (59551e4) | HEAD (c5eb3a6) |
| --- | --- | --- |
| integration | 7 | 5 |
| temurin | 12 | 8 |
| java-21 | 7 | 6 |
| skip-bytebuffers-false | 7 | 4 |
| unittests | 5 | 3 |
| java-11 | 5 | 2 |
| unittests2 | 3 | 0 |
| integration1 | 2 | 1 |
| custom-integration1 | 2 | 1 |

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #15008      +/-   ##
============================================
- Coverage     61.75%   56.21%   -5.54%     
- Complexity      207      802     +595     
============================================
  Files          2436     2129     -307     
  Lines        133233   114661   -18572     
  Branches      20636    18455    -2181     
============================================
- Hits          82274    64458   -17816     
- Misses        44911    44941      +30     
+ Partials       6048     5262     -786     

| Flag | Coverage Δ |
| --- | --- |
| custom-integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration | 100.00% <ø> (+99.99%) ⬆️ |
| integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 56.13% <0.00%> (-5.58%) ⬇️ |
| java-21 | 56.07% <0.00%> (-5.56%) ⬇️ |
| skip-bytebuffers-false | 56.16% <0.00%> (-5.58%) ⬇️ |
| skip-bytebuffers-true | 56.04% <0.00%> (+28.31%) ⬆️ |
| temurin | 56.21% <0.00%> (-5.54%) ⬇️ |
| unittests | 56.21% <0.00%> (-5.54%) ⬇️ |
| unittests1 | 56.21% <0.00%> (+9.32%) ⬆️ |
| unittests2 | ? |

Flags with carried forward coverage won't be shown.
