@evan-wall-mysten
Collaborator

@evan-wall-mysten evan-wall-mysten commented Dec 4, 2025

Description

Without the change in this PR, the indexer logs errors like the following when it is started for the first time (no watermark record exists) with a non-zero --first-checkpoint:

2025-12-03T20:23:45.594924Z ERROR sui_indexer_alt_framework::pipeline::concurrent::pruner: Failed to prune data for range: 121926000 to 121928000: No checkpoint mapping found for checkpoint 121926000 pipeline="kv_epoch_starts"

The error occurs because the pruner attempts to prune data that was never indexed: when no watermark record exists, pruner_hi is initialized to 0.
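To make the failure mode concrete, here is a minimal sketch of a pruner walking its range in fixed-size chunks. It assumes, for illustration only, that the pruner deletes checkpoints in the half-open range [pruner_hi, reader_lo); prune_chunks, touches_unindexed, and the chunk size are hypothetical, not the framework's API. With pruner_hi left at 0 on an indexer that started at a high --first-checkpoint, the early chunks cover checkpoints that were never indexed.

```rust
/// Hypothetical helper: split [pruner_hi, reader_lo) into chunks of at most
/// `chunk` checkpoints, in the order a pruner would visit them.
fn prune_chunks(pruner_hi: u64, reader_lo: u64, chunk: u64) -> Vec<(u64, u64)> {
    assert!(chunk > 0, "chunk size must be positive");
    let mut chunks = Vec::new();
    let mut lo = pruner_hi;
    while lo < reader_lo {
        // Each chunk is half-open: [lo, hi).
        let hi = lo.saturating_add(chunk).min(reader_lo);
        chunks.push((lo, hi));
        lo = hi;
    }
    chunks
}

/// True if any chunk starts below `first_checkpoint`, i.e. covers data
/// this indexer never wrote.
fn touches_unindexed(chunks: &[(u64, u64)], first_checkpoint: u64) -> bool {
    chunks.iter().any(|&(lo, _)| lo < first_checkpoint)
}

fn main() {
    let first_checkpoint = 10_000;
    let reader_lo = 10_004;

    // No watermark row: pruner_hi defaults to 0.
    let buggy = prune_chunks(0, reader_lo, 2_000);
    println!("{:?} unindexed={}", buggy[0], touches_unindexed(&buggy, first_checkpoint));

    // pruner_hi seeded from --first-checkpoint.
    let fixed = prune_chunks(first_checkpoint, reader_lo, 2_000);
    println!("{:?} unindexed={}", fixed, touches_unindexed(&fixed, first_checkpoint));
}
```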

This PR sets checkpoint_hi_inclusive, reader_lo, and pruner_hi based on the default_checkpoint, which is the value of --first-checkpoint (or 0 if the flag is not set). When the pruner runs, it uses this pruner_hi value to avoid pruning data that was never indexed.
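A minimal sketch of the seeding rule, mirroring the values in the StoredWatermark snippet under review: checkpoint_hi_inclusive = first_checkpoint - 1, and reader_lo = pruner_hi = first_checkpoint. This InitWatermark is a simplified, hypothetical stand-in that models only the three checkpoint fields; using i64 so that a start at checkpoint 0 yields -1 ("nothing committed yet") follows that snippet's "as i64 - 1".

```rust
/// Hypothetical, simplified watermark row seeded when none exists.
#[derive(Debug, PartialEq, Eq)]
struct InitWatermark {
    /// Highest committed checkpoint; -1 means nothing committed yet.
    checkpoint_hi_inclusive: i64,
    /// Lowest checkpoint readers may rely on.
    reader_lo: i64,
    /// Everything below this checkpoint counts as already pruned.
    pruner_hi: i64,
}

/// Seed from the first checkpoint the indexer will process
/// (--first-checkpoint, or 0 when the flag is not set).
fn init_watermark(first_checkpoint: u64) -> InitWatermark {
    let first = i64::try_from(first_checkpoint).expect("checkpoint fits in i64");
    InitWatermark {
        checkpoint_hi_inclusive: first - 1,
        reader_lo: first,
        pruner_hi: first,
    }
}

fn main() {
    println!("{:?}", init_watermark(0));
    println!("{:?}", init_watermark(121_926_000));
}
```

Assuming the pruner only deletes checkpoints at or above pruner_hi and below reader_lo, these values leave it nothing to prune until reader_lo advances past first_checkpoint.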

Test plan

  1. Added unit tests.
  2. Tested with sui-indexer-alt-benchmark:
    a. Deploy sui-indexer-alt-benchmark with pulumi up (this line in Pulumi.dev.yaml points the benchmark at the branch from this PR: https://github.com/MystenLabs/sui-operations/blob/c3011e3ae777c8b58019c4a83211aefa54f8d372/pulumi/gcp/sui-indexer-alt-benchmark/Pulumi.dev.yaml#L8)
    b. Verify that the error does not appear in the logs:
    kubectl logs sui-indexer-alt-benchmark-indexer-89f9c6965-s6hx9 -n sui-indexer-alt-benchmark

Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • gRPC:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • Indexing Framework: Fix pruning for concurrent pipelines when the indexer is initialized with --first-checkpoint.

@vercel

vercel bot commented Dec 4, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
sui-docs Ready Ready Preview Comment Dec 10, 2025 7:44pm
2 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
multisig-toolkit Ignored Ignored Preview Dec 10, 2025 7:44pm
sui-kiosk Ignored Ignored Preview Dec 10, 2025 7:44pm

Comment on lines 28 to 39
    first_checkpoint: u64,
) -> anyhow::Result<u64> {
    // Create a StoredWatermark directly from CommitterWatermark
    let stored_watermark = StoredWatermark {
        pipeline: pipeline_task.to_string(),
        epoch_hi_inclusive: 0,
        checkpoint_hi_inclusive: first_checkpoint as i64 - 1,
        tx_hi: 0,
        timestamp_ms_hi_inclusive: 0,
        reader_lo: first_checkpoint as i64,
        pruner_timestamp: NaiveDateTime::UNIX_EPOCH,
        pruner_hi: first_checkpoint as i64,
Contributor

I think that, as a trait method, init_watermark should either accept three checkpoint values (for checkpoint_hi_inclusive, reader_lo, and pruner_hi) or have doc comments explaining that init_watermark is supposed to set these values in a particular way.

Also, I think we want pruner_hi <= reader_lo < checkpoint_hi_inclusive, which should be safe as long as we handle the case where checkpoint_hi_inclusive is 0.
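One way to read the first suggestion is a trait method that takes all three checkpoint values explicitly and documents how they relate, so every store writes the same values. The trait SeedWatermark, the toy InMemoryStore, the insert-if-absent behavior, and the String error type are all hypothetical; the snippet under review returns anyhow::Result, which this dependency-free sketch replaces. Only the pruner_hi <= reader_lo half of the proposed ordering is enforced here, as an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical trait: callers supply all three checkpoint values.
trait SeedWatermark {
    /// Seed a row for `pipeline` if none exists (assumed insert-if-absent).
    /// For an indexer starting at checkpoint F, callers pass
    /// checkpoint_hi_inclusive = F - 1 and reader_lo = pruner_hi = F.
    /// Returns the checkpoint_hi_inclusive stored after the call.
    fn init_watermark(
        &mut self,
        pipeline: &str,
        checkpoint_hi_inclusive: i64,
        reader_lo: i64,
        pruner_hi: i64,
    ) -> Result<i64, String>;
}

/// Toy store: pipeline -> (checkpoint_hi_inclusive, reader_lo, pruner_hi).
#[derive(Default)]
struct InMemoryStore {
    rows: HashMap<String, (i64, i64, i64)>,
}

impl SeedWatermark for InMemoryStore {
    fn init_watermark(
        &mut self,
        pipeline: &str,
        checkpoint_hi_inclusive: i64,
        reader_lo: i64,
        pruner_hi: i64,
    ) -> Result<i64, String> {
        if pruner_hi > reader_lo {
            return Err(format!("{pipeline}: pruner_hi {pruner_hi} > reader_lo {reader_lo}"));
        }
        // An existing row is kept unchanged, so re-seeding is a no-op.
        let row = self
            .rows
            .entry(pipeline.to_string())
            .or_insert((checkpoint_hi_inclusive, reader_lo, pruner_hi));
        Ok(row.0)
    }
}

fn main() {
    let mut store = InMemoryStore::default();
    let first = 121_926_000_i64;
    println!("{:?}", store.init_watermark("kv_epoch_starts", first - 1, first, first));
}
```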

Collaborator Author

Doesn't checkpoint_hi_inclusive = 0 mean that checkpoint 0 was indexed, which is not true initially?
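The ambiguity raised here can be sketched by modeling "nothing committed yet" explicitly. In this hypothetical model, checkpoint_hi_inclusive is an Option<u64>: None plays the role of the -1 produced by first_checkpoint as i64 - 1 when first_checkpoint is 0, while Some(0) would mean checkpoint 0 was committed. The consistency check, pruner_hi <= reader_lo <= next checkpoint, is an assumed reading of the ordering discussed in this thread, not a rule taken from the framework.

```rust
/// Hypothetical watermark model; None means no checkpoint committed yet.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Watermark {
    checkpoint_hi_inclusive: Option<u64>,
    reader_lo: u64,
    pruner_hi: u64,
}

impl Watermark {
    /// Seed for an indexer starting at `first_checkpoint`.
    fn seed(first_checkpoint: u64) -> Self {
        Watermark {
            // checked_sub yields None for 0, so genesis needs no special case.
            checkpoint_hi_inclusive: first_checkpoint.checked_sub(1),
            reader_lo: first_checkpoint,
            pruner_hi: first_checkpoint,
        }
    }

    /// The next checkpoint the committer expects to write.
    fn next_checkpoint(&self) -> u64 {
        self.checkpoint_hi_inclusive.map_or(0, |hi| hi + 1)
    }

    /// Assumed ordering: pruner_hi <= reader_lo <= next_checkpoint.
    fn is_consistent(&self) -> bool {
        self.pruner_hi <= self.reader_lo && self.reader_lo <= self.next_checkpoint()
    }
}

fn main() {
    for first in [0, 1, 121_926_000] {
        let w = Watermark::seed(first);
        println!("{w:?} next={} consistent={}", w.next_checkpoint(), w.is_consistent());
    }
}
```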

@evan-wall-mysten evan-wall-mysten force-pushed the indexer_set_reader_lo_pruner_hi branch from e69ea47 to a44ecd9 December 9, 2025 19:15
@evan-wall-mysten evan-wall-mysten temporarily deployed to sui-typescript-aws-kms-test-env December 9, 2025 19:15 with GitHub Actions (inactive)
@evan-wall-mysten evan-wall-mysten force-pushed the indexer_set_reader_lo_pruner_hi branch from a44ecd9 to 5e8baab December 10, 2025 17:58
@evan-wall-mysten evan-wall-mysten temporarily deployed to sui-typescript-aws-kms-test-env December 10, 2025 17:58 with GitHub Actions (inactive)
@evan-wall-mysten evan-wall-mysten force-pushed the indexer_set_reader_lo_pruner_hi branch from 5e8baab to b42f096 December 10, 2025 17:58
@evan-wall-mysten evan-wall-mysten temporarily deployed to sui-typescript-aws-kms-test-env December 10, 2025 17:58 with GitHub Actions (inactive)
@evan-wall-mysten evan-wall-mysten force-pushed the indexer_set_reader_lo_pruner_hi branch from b42f096 to 18274e0 December 10, 2025 17:59
@evan-wall-mysten evan-wall-mysten temporarily deployed to sui-typescript-aws-kms-test-env December 10, 2025 17:59 with GitHub Actions (inactive)
@evan-wall-mysten evan-wall-mysten marked this pull request as ready for review December 10, 2025 18:01
@evan-wall-mysten evan-wall-mysten requested a review from a team as a code owner December 10, 2025 18:01
@evan-wall-mysten evan-wall-mysten temporarily deployed to sui-typescript-aws-kms-test-env December 10, 2025 18:01 with GitHub Actions (inactive)
async fn add_pipeline<P: Processor + 'static>(&mut self) -> Result<Option<u64>> {
async fn add_pipeline<P: Processor + 'static>(
    &mut self,
    init_watermark: bool,
Contributor

Alternatively, I suppose we don't need to add this function parameter and could just check self.task.is_none() in add_pipeline, right? If it's a tasked indexer, we don't need to seed un-watermarked pipelines, because tasked pipelines can't prune.

Collaborator Author

I initially added this parameter to differentiate between concurrent pipelines (need a watermark) and sequential pipelines (do not need a watermark), but needed to modify it to also exclude tasked concurrent pipelines.

Contributor

OK, I see. I think it doesn't really matter what we seed a sequential pipeline's reader_lo and pruner_hi to, which is why I was suggesting that we consolidate the logic into add_pipeline and just check self.task.is_none() && self.default_next_checkpoint > 0 there. That would mean that, for sequential pipelines, an indexer starting at a non-genesis checkpoint would seed them with the same watermark a concurrent pipeline gets when no watermark entry exists.

Contributor

Hmm, then again, from another perspective, this is only needed if a pipeline is capable of pruning, and that means a concurrent pipeline running in a main indexer.
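The two alternatives in this thread can be compared as small predicates. PipelineKind, should_seed_parameter, should_seed_consolidated, and the task value "backfill" are hypothetical; the task is simplified to an Option<&str>. The first predicate mirrors the PR's parameter approach (seed only concurrent pipelines on an indexer without a task), the second mirrors the suggested check inside add_pipeline. Requiring default_next_checkpoint > 0 in both is inferred from the later "checked > 0 earlier" remark and is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PipelineKind {
    Concurrent,
    Sequential,
}

/// PR approach: the caller passes init_watermark = true only for
/// concurrent pipelines on an untasked indexer.
fn should_seed_parameter(kind: PipelineKind, task: Option<&str>, default_next_checkpoint: u64) -> bool {
    let init_watermark = kind == PipelineKind::Concurrent && task.is_none();
    init_watermark && default_next_checkpoint > 0
}

/// Suggested consolidation: ignore the pipeline kind. Per the reviewer,
/// seeding a sequential pipeline is harmless because it never prunes.
fn should_seed_consolidated(task: Option<&str>, default_next_checkpoint: u64) -> bool {
    task.is_none() && default_next_checkpoint > 0
}

fn main() {
    let cases = [
        (PipelineKind::Concurrent, None),
        (PipelineKind::Sequential, None),
        (PipelineKind::Concurrent, Some("backfill")),
    ];
    for (kind, task) in cases {
        println!(
            "{kind:?} task={task:?}: parameter={} consolidated={}",
            should_seed_parameter(kind, task, 100),
            should_seed_consolidated(task, 100)
        );
    }
}
```

The two predicates differ only for sequential pipelines on an untasked indexer, which is exactly the case the reviewer argues does not matter.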

@evan-wall-mysten evan-wall-mysten changed the title [indexer-alt-framework] Set reader_lo, pruner_hi if watermark does not exist [indexer-alt-framework] Set checkpoint_hi_inclusive, reader_lo, pruner_hi if watermark does not exist Dec 10, 2025
@evan-wall-mysten evan-wall-mysten force-pushed the indexer_set_reader_lo_pruner_hi branch from 18274e0 to 7fc131b December 10, 2025 19:27
@evan-wall-mysten evan-wall-mysten temporarily deployed to sui-typescript-aws-kms-test-env December 10, 2025 19:27 with GitHub Actions (inactive)
Contributor

@wlmyng wlmyng left a comment

Approving to unblock

I wanted to circle back and re-raise the discussion on add_pipeline, but I don't feel too strongly about it.

I also had some comments on the tests that would be nice to address.

    .with_context(|| format!("Failed to get watermark for {pipeline_task}"))?;
let watermark = if init_watermark {
    let init_watermark = InitWatermark {
        checkpoint_hi_inclusive: self.default_next_checkpoint as i64 - 1,
Contributor

Maybe add a "// SAFETY: checked > 0 earlier" comment here.
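A sketch of how the suggested annotation could read, written as a hypothetical free function rather than the actual add_pipeline method. The early return for 0 stands in for the earlier "> 0" check the comment refers to; doing the subtraction in u64 before converting to i64 is a choice made here, not necessarily what the PR does.

```rust
/// Hypothetical helper: the seeded checkpoint_hi_inclusive, or None when
/// the indexer starts at genesis and no seeding is needed.
fn seeded_checkpoint_hi(default_next_checkpoint: u64) -> Option<i64> {
    if default_next_checkpoint == 0 {
        return None;
    }
    // SAFETY: checked > 0 above, so this subtraction cannot underflow.
    let hi = default_next_checkpoint - 1;
    Some(i64::try_from(hi).expect("checkpoint fits in i64"))
}

fn main() {
    println!("{:?}", seeded_checkpoint_hi(0));
    println!("{:?}", seeded_checkpoint_hi(121_926_000));
}
```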

        ..Default::default()
    })
    .await;
assert_eq!(committer_watermark, None);
Contributor

I think a comment like "the indexer will not seed the watermark; pipeline tasks will write commit watermarks as normal" would be nice.

#[tokio::test]
async fn test_init_watermark_concurrent_pipeline_first_checkpoint_1() {
    let (committer_watermark, pruner_watermark) =
        test_init_watermark(InitCheckpointArgs::default()).await;
Contributor

@wlmyng wlmyng Dec 10, 2025

To me, default() reads as 0.

I think tests should be explicit, e.g. "InitCheckpointArgs::init(checkpoint 1)".
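A sketch of the more explicit style being asked for. The real InitCheckpointArgs fields are not shown in this review, so the struct below, its first_checkpoint field, and the with_first_checkpoint constructor are hypothetical; the point is that the test name and its arguments state the same starting checkpoint.

```rust
/// Hypothetical stand-in for the test's argument struct.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
struct InitCheckpointArgs {
    first_checkpoint: Option<u64>,
}

impl InitCheckpointArgs {
    /// Explicit constructor, so a test reads as "starts at checkpoint N".
    fn with_first_checkpoint(checkpoint: u64) -> Self {
        InitCheckpointArgs { first_checkpoint: Some(checkpoint) }
    }
}

fn main() {
    // Reads as "no --first-checkpoint", i.e. the indexer starts at 0.
    let implicit = InitCheckpointArgs::default();
    // Matches a test named ..._first_checkpoint_1.
    let explicit = InitCheckpointArgs::with_first_checkpoint(1);
    println!("{implicit:?} vs {explicit:?}");
}
```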


Labels

None yet

Projects

None yet


3 participants