Pauseless Consumption #3: Disaster Recovery with Reingestion #14920
base: master
Conversation
1. Changing FSM
2. Changing the 3 steps performed during the commit protocol to update ZK and Ideal state
1. Changes in the commit protocol to start the segment commit before the build
2. Changes in the BaseTableDataManager to ensure that the locally built segment is replaced by a downloaded one only when the CRC is present in the ZK metadata (a sketch of this CRC gate follows the commit list below)
3. Changes in the download segment method to allow waiting for the download in case of pauseless consumption
…segment commit end metadata call. Refactoring code for readability
… ingestion by moving it out of streamConfigMap
…auseless ingestion in RealtimeSegmentValidationManager
…d by RealtimeSegmentValidationManager to fix commit protocol failures
…g commit protocol
…ption is enabled or not
…eepstore path with fallbacks
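As a rough illustration of the CRC gate referenced in point 2 above, the locally built segment would only be swapped for the downloaded copy once the committed CRC has been written to the segment's ZK metadata. This is a hedged sketch: the method names and the non-positive-CRC convention are assumptions, not the PR's actual code.

// Hypothetical sketch of the CRC gate described in the commit list above.
void maybeReplaceWithDownloadedCopy(String segmentName, SegmentZKMetadata zkMetadata) {
  // During pauseless consumption the commit may still be in flight, in which case the CRC
  // has not yet been written to ZK and the locally built segment must be kept.
  if (zkMetadata.getCrc() <= 0) {
    LOGGER.info("CRC not yet present in ZK metadata for segment: {}, keeping locally built copy", segmentName);
    return;
  }
  // CRC is present, so the committed copy is authoritative: download and replace.
  downloadAndReplaceSegment(segmentName, zkMetadata); // hypothetical helper
}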
So this check is only performed when the RealtimeSegmentValidationManager job runs, right? If yes, then should this logic be part of a separate dedicated job with a higher frequency?
@@ -2195,6 +2198,139 @@ URI createSegmentPath(String rawTableName, String segmentName) {
    return URIUtils.getUri(_controllerConf.getDataDir(), rawTableName, URIUtils.encode(segmentName));
  }

  /**
   * Re-ingests segments that are in DONE status with a missing download URL, but also
Shouldn't this be: "Re-ingests segments that are in ONLINE status with a missing download URL, but also"
LOGGER.info(
    "Segment {} in table {} is in ERROR state with download URL present. Resetting segment to ONLINE state.",
    segmentName, tableNameWithType);
_helixResourceManager.resetSegment(tableNameWithType, segmentName, null);
Reset segment does not work when the SegmentDataManager is missing on the server. Consider the following scenario:
1. A segment has a missing download URL. The server hosting the segment restarts and the segment goes into ERROR state in the EV.
2. Re-ingestion updates the ZK metadata and the reset segment message is sent.
3. The server does not have any SegmentDataManager instance for the segment, hence the reset does not work.
protected void doReplaceSegment(String segmentName)
throws Exception {
SegmentDataManager segmentDataManager = _segmentDataManagerMap.get(segmentName);
if (segmentDataManager != null) {
SegmentZKMetadata zkMetadata = fetchZKMetadata(segmentName);
IndexLoadingConfig indexLoadingConfig = fetchIndexLoadingConfig();
indexLoadingConfig.setSegmentTier(zkMetadata.getTier());
replaceSegmentIfCrcMismatch(segmentDataManager, zkMetadata, indexLoadingConfig);
} else {
_logger.warn("Failed to find segment: {}, skipping replacing it", segmentName);
}
}
I ran the above code and found the following error:
[upsertMeetupRsvp_with_dr_2_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread_40] Failed to find segment: upsertMeetupRsvp_with_dr_2__0__57__20250127T0745Z, skipping replacing it
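One possible direction, shown only as a hypothetical sketch (the download fallback below does not correspond to an existing Pinot method), would be to load the committed copy when no SegmentDataManager is present instead of skipping the replace:

protected void doReplaceSegment(String segmentName)
    throws Exception {
  SegmentZKMetadata zkMetadata = fetchZKMetadata(segmentName);
  IndexLoadingConfig indexLoadingConfig = fetchIndexLoadingConfig();
  indexLoadingConfig.setSegmentTier(zkMetadata.getTier());
  SegmentDataManager segmentDataManager = _segmentDataManagerMap.get(segmentName);
  if (segmentDataManager != null) {
    replaceSegmentIfCrcMismatch(segmentDataManager, zkMetadata, indexLoadingConfig);
  } else {
    // Hypothetical fallback: no SegmentDataManager exists locally (e.g. after a server
    // restart), so load the committed segment instead of silently skipping the replace.
    _logger.warn("No SegmentDataManager for segment: {}, downloading committed copy instead", segmentName);
    downloadAndLoadSegment(segmentName, zkMetadata, indexLoadingConfig); // hypothetical helper
  }
}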
45f6f29 to 609942d
…can only be updated after a fixed time has elapsed. Reduced the time requirements by creating a FakePauselessLLCRealtimeSegmentManager.
Codecov Report
Attention: Patch coverage is

Additional details and impacted files
@@ Coverage Diff @@
##            master   #14920     +/-   ##
============================================
+ Coverage     61.75%   63.36%    +1.61%
- Complexity      207     1376     +1169
============================================
  Files          2436     2714      +278
  Lines        133233   152474    +19241
  Branches      20636    23521     +2885
============================================
+ Hits          82274    96615    +14341
- Misses        44911    48584     +3673
- Partials       6048     7275     +1227

Flags with carried forward coverage won't be shown.
☔ View full report in Codecov by Sentry.
partial review
// Grab start/end offsets
String startOffsetStr = segmentZKMetadata.getStartOffset();
String endOffsetStr = segmentZKMetadata.getEndOffset();
if (startOffsetStr == null || endOffsetStr == null) {
(minor) Maybe let's break up all these similar validations into a separate method (will be useful for unit testing as well).
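For example, the offset check above could move into a small helper along these lines (an illustrative sketch; the method name is an assumption, not the PR's code):

// Illustrative sketch: pulls the offset presence check out of the main flow so it can be
// unit tested in isolation.
private static void validateOffsetsPresent(SegmentZKMetadata segmentZKMetadata) {
  String startOffsetStr = segmentZKMetadata.getStartOffset();
  String endOffsetStr = segmentZKMetadata.getEndOffset();
  if (startOffsetStr == null || endOffsetStr == null) {
    throw new IllegalStateException(
        "Missing start/end offset in ZK metadata for segment: " + segmentZKMetadata.getSegmentName());
  }
}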
} catch (Exception e) {
  LOGGER.error("Error during async re-ingestion for job {} (segment={})", jobId, segmentName, e);
} finally {
  isIngesting.set(false);
Won't there be a race condition here when the same segment is scheduled to be re-ingested again?
No, because it will get returned when trying to set this to true right?
I was talking about the instant just after the flag is set to false. (Just looking at it from a server API perspective, ignoring how frequently this API is called.)
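For reference, the usual way to close that window is to claim the flag atomically before doing any work, roughly like the generic sketch below (not the PR's actual code; reingest() is a hypothetical worker method):

// Generic sketch of the guard being discussed: only the caller that wins compareAndSet
// runs the re-ingestion; everyone else is rejected, and the flag is cleared in finally.
if (!isIngesting.compareAndSet(false, true)) {
  throw new WebApplicationException("Re-ingestion already running for segment: " + segmentName,
      Response.Status.CONFLICT);
}
try {
  reingest(segmentName); // hypothetical worker method
} finally {
  isIngesting.set(false);
}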
pinot-server/src/main/java/org/apache/pinot/server/api/resources/ReIngestionResource.java (outdated comment, resolved)
String tableNameWithType = request.getTableNameWithType();
String segmentName = request.getSegmentName();

if (RUNNING_JOBS.size() >= MAX_PARALLEL_REINGESTIONS) {
These states are in-memory, hence I'm curious about the case when the controller sends a re-ingest request to another server for the same segment.
That's a good point. Currently it will get executed. However, that scenario is unlikely as reingestion runs extremely infrequently (once every few hours).
One possible solution could be ZK.
Another would be to keep track on the controller side instead of the server side.
Will check.
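If the tracking moved to the controller, a minimal sketch could look like this (names and structure are assumptions; a real version would also need to clear entries when jobs finish, e.g. via a callback or TTL):

// Hypothetical controller-side tracking: at most one in-flight re-ingestion per segment,
// regardless of which server the request ends up on.
private final ConcurrentMap<String, Long> _reingestionJobStartTimes = new ConcurrentHashMap<>();

boolean tryTriggerReingestion(String tableNameWithType, String segmentName) {
  // putIfAbsent is atomic, so two validation runs cannot both claim the same segment.
  if (_reingestionJobStartTimes.putIfAbsent(segmentName, System.currentTimeMillis()) != null) {
    return false;
  }
  sendReingestRequest(tableNameWithType, segmentName); // hypothetical helper
  return true;
}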
reingestion runs extremely infrequently (once per few hours)
Isn't this a concern as well? Users might not be OK with data missing for a long time.
/**
 * Simplified Segment Data Manager for ingesting data from a start offset to an end offset.
 */
public class SimpleRealtimeSegmentDataManager extends SegmentDataManager {
Won't this create code debt in the future, maintaining both SimpleRealtimeSegmentDataManager and RealtimeSegmentDataManager?
Maybe we should leverage RealtimeSegmentDataManager methods in this class
Unfortunately that is not possible, as these methods diverge a lot from the main ones.
This PR adds support for disaster recovery for pauseless ingestion, along with reingestion. These changes help solve the scenario where real-time segments permanently fail to transition out of ERROR state, leading to data gaps. With reingestion, Pinot can recover such segments, ensuring availability and correctness of real-time data.
During pauseless ingestion, an ONLINE segment can wind up in an ERROR state if its commit fails due to a server restart and there are no other replicas. Currently in Pinot, there is no way to recover from such failures.
Reingestion Flow
Segments that fail to commit or end up in ERROR state can now be re-ingested by calling a new endpoint (/reingestSegment) on the server. The ReIngestionResource reconstructs the segment from the stream, builds it, and commits it, ensuring that offline peers and the deep store get updated properly.
If successful, the re-ingested segment transitions from ERROR to ONLINE.
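At a high level the server-side flow looks roughly like the sketch below. This is illustrative only; the helper methods are assumptions, not the actual ReIngestionResource code, though the start/end offsets do come from the segment's ZK metadata as shown in the review snippets above.

// Rough outline of the re-ingestion flow described above (all helpers are illustrative).
void reingestSegment(String tableNameWithType, String segmentName) throws Exception {
  SegmentZKMetadata zkMetadata = fetchZKMetadata(tableNameWithType, segmentName);
  // 1. Re-consume exactly the offset range the original segment covered.
  String startOffset = zkMetadata.getStartOffset();
  String endOffset = zkMetadata.getEndOffset();
  File segmentDir = consumeAndBuildSegment(tableNameWithType, segmentName, startOffset, endOffset);
  // 2. Upload the rebuilt segment to the deep store so the missing download URL can be filled in.
  String downloadUrl = uploadToDeepStore(segmentDir);
  // 3. Complete the commit so ZK metadata is updated and the segment can be reset from ERROR to ONLINE.
  commitSegmentEndMetadata(tableNameWithType, segmentName, downloadUrl);
}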
New APIs introduced:

Get Running Re-ingestion Jobs
GET /reingestSegment/jobs
Returns all currently running re-ingestion jobs with their status information.

Re-ingest Segment
POST /reingestSegment
Asynchronously re-ingests a segment with updated configurations.
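For example, a client could trigger a re-ingestion like this (a sketch: the host and port are assumptions, the JSON field names follow the request object seen in the review snippets above, and the table and segment names are taken from the log line quoted earlier):

// Sketch of calling the new server endpoint with Java's built-in HTTP client.
HttpClient client = HttpClient.newHttpClient();
String body = "{\"tableNameWithType\": \"upsertMeetupRsvp_with_dr_2_REALTIME\", "
    + "\"segmentName\": \"upsertMeetupRsvp_with_dr_2__0__57__20250127T0745Z\"}";
HttpRequest request = HttpRequest.newBuilder()
    .uri(URI.create("http://server-host:8097/reingestSegment"))  // assumed server admin port
    .header("Content-Type", "application/json")
    .POST(HttpRequest.BodyPublishers.ofString(body))
    .build();
// The endpoint is asynchronous: the call returns once the job is scheduled, not when it completes.
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
System.out.println(response.statusCode() + " " + response.body());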
Reingestion data flow
Reingestion design diagram