[indexer-alt-framework] Set checkpoint_hi_inclusive, reader_lo, pruner_hi if watermark does not exist #24523

evan-wall-mysten · 2025-12-04T20:18:22Z

Description

Without the change in this PR, the indexer logs these errors when started using a non-zero --first-checkpoint for the first time (no watermark record exists):

2025-12-03T20:23:45.594924Z ERROR sui_indexer_alt_framework::pipeline::concurrent::pruner: Failed to prune data for range: 121926000 to 121928000: No checkpoint mapping found for checkpoint 121926000 pipeline="kv_epoch_starts"

This error is caused by the pruner attempting to prune data that was never indexed because pruner_hi is initialized to 0.

This PR sets checkpoint_hi_inclusive, reader_lo, and pruner_hi based on default_checkpoint, which is the value from --first-checkpoint (defaulting to 0 if not set). When the pruner runs, it uses this value of pruner_hi to avoid trying to prune data that was never indexed.
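As a rough illustration of the seeding described above, the sketch below computes the three values from a starting checkpoint. The field values mirror the diff hunk quoted later in this thread (checkpoint_hi_inclusive is first_checkpoint - 1, reader_lo and pruner_hi are first_checkpoint). The SeedWatermark type and seed_watermark function are hypothetical names used only for this example, not the framework's API.

```rust
/// Hypothetical stand-in for the three watermark fields this PR seeds.
/// The real StoredWatermark has more fields; they are omitted here.
#[derive(Debug, PartialEq)]
struct SeedWatermark {
    checkpoint_hi_inclusive: i64,
    reader_lo: i64,
    pruner_hi: i64,
}

/// Sketch of the seeding rule: nothing has been indexed yet, so the
/// inclusive upper bound sits one below the first checkpoint, while the
/// reader and pruner bounds start at the first checkpoint itself.
fn seed_watermark(first_checkpoint: u64) -> SeedWatermark {
    let first = first_checkpoint as i64;
    SeedWatermark {
        checkpoint_hi_inclusive: first - 1,
        reader_lo: first,
        pruner_hi: first,
    }
}
```

With --first-checkpoint 121926000 (the checkpoint from the error log above), this yields pruner_hi = 121926000, so a pruner that starts from pruner_hi would not look for checkpoints below the first one indexed.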

Test plan

Added unit tests.
Tested with sui-indexer-alt-benchmark:
a. Deploy sui-indexer-alt-benchmark with pulumi up (this line causes the benchmark to use the branch from this PR https://github.com/MystenLabs/sui-operations/blob/c3011e3ae777c8b58019c4a83211aefa54f8d372/pulumi/gcp/sui-indexer-alt-benchmark/Pulumi.dev.yaml#L8)
b. Verify error is not in logs
```
kubectl logs sui-indexer-alt-benchmark-indexer-89f9c6965-s6hx9 -n sui-indexer-alt-benchmark
```

Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

wlmyng · 2025-12-08T17:22:29Z

crates/sui-pg-db/src/store.rs

+        first_checkpoint: u64,
+    ) -> anyhow::Result<u64> {
+        // Create a StoredWatermark directly from CommitterWatermark
+        let stored_watermark = StoredWatermark {
+            pipeline: pipeline_task.to_string(),
+            epoch_hi_inclusive: 0,
+            checkpoint_hi_inclusive: first_checkpoint as i64 - 1,
+            tx_hi: 0,
+            timestamp_ms_hi_inclusive: 0,
+            reader_lo: first_checkpoint as i64,
+            pruner_timestamp: NaiveDateTime::UNIX_EPOCH,
+            pruner_hi: first_checkpoint as i64,


I think that, as a trait method, init_watermark should either accept three checkpoint values (for checkpoint_hi_inclusive, reader_lo, and pruner_hi), or have doc comments explaining that init_watermark is supposed to set these values in a particular way.

Also, I think we want pruner_hi <= reader_lo < checkpoint_hi_inclusive, which should be safe if we handle when checkpoint_hi_inclusive is 0

Doesn't checkpoint_hi_inclusive = 0 mean that checkpoint 0 was indexed which is not true initially?
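To make the question above concrete, here is a minimal sketch that counts how many checkpoints a watermark claims are indexed and readable, treating [reader_lo, checkpoint_hi_inclusive] as an inclusive range. The readable_checkpoints helper is hypothetical and exists only for this illustration.

```rust
/// Hypothetical helper: size of the inclusive range
/// [reader_lo, checkpoint_hi_inclusive]. An upper bound below the lower
/// bound is read as "nothing indexed yet".
fn readable_checkpoints(reader_lo: i64, checkpoint_hi_inclusive: i64) -> u64 {
    if checkpoint_hi_inclusive < reader_lo {
        0
    } else {
        (checkpoint_hi_inclusive - reader_lo + 1) as u64
    }
}
```

Under this reading, readable_checkpoints(0, 0) is 1, which claims checkpoint 0 was indexed, while readable_checkpoints(0, -1) and readable_checkpoints(100, 99) are both 0. That matches seeding checkpoint_hi_inclusive with first_checkpoint - 1, as the hunk above does. It also means a freshly seeded watermark has reader_lo one above checkpoint_hi_inclusive, so a strict reader_lo < checkpoint_hi_inclusive check would not hold until something has been committed.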

wlmyng · 2025-12-10T18:35:05Z

crates/sui-indexer-alt-framework/src/lib.rs

-    async fn add_pipeline<P: Processor + 'static>(&mut self) -> Result<Option<u64>> {
+    async fn add_pipeline<P: Processor + 'static>(
+        &mut self,
+        init_watermark: bool,


I suppose, alternatively, we don't need to add this function parameter and can just check self.task.is_none() in add_pipeline, right? If it's a tasked indexer, we don't need to seed un-watermarked pipelines, because they're tasked pipelines and don't have the ability to prune.

I initially added this parameter to differentiate between concurrent pipelines (need watermark) and sequential pipelines (do not need watermark), but needed to modify it to also exclude concurrent tasked pipelines.

Ok I see. I think it doesn't really matter what we seed a sequential pipeline's reader_lo and pruner_hi to, which is why I was suggesting that we could consolidate the logic into add_pipeline, and just check for self.task.is_none() && self.default_next_checkpoint > 0 there. That would mean that for sequential pipelines, an indexer starting at a non-genesis checkpoint would seed it with the same watermark a concurrent pipeline would get if no watermark entry exists.

Hmm, then again, from another perspective, this is only needed if a pipeline is capable of pruning. And that is a concurrent pipeline running in a main indexer..
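The two positions in this thread can be sketched as a single predicate. This is only an illustration of the narrower rule (seed only for a concurrent pipeline in a main, non-tasked indexer starting past genesis). PipelineKind and should_seed_watermark are hypothetical names, and has_task stands in for self.task.is_some().

```rust
/// Hypothetical pipeline classification for this sketch.
#[derive(Clone, Copy, PartialEq)]
enum PipelineKind {
    Concurrent,
    Sequential,
}

/// Sketch of the narrower rule discussed above: only a pipeline that can
/// prune (concurrent, not tasked) needs a seeded watermark, and only when
/// the indexer starts at a non-genesis checkpoint.
fn should_seed_watermark(
    kind: PipelineKind,
    has_task: bool,
    default_next_checkpoint: u64,
) -> bool {
    kind == PipelineKind::Concurrent && !has_task && default_next_checkpoint > 0
}
```

The consolidated alternative suggested above would drop the kind check and keep only the has_task and default_next_checkpoint conditions, which would also seed sequential pipelines.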

wlmyng

Approving to unblock

Wanted to circle back to the discussion on add_pipeline, but I don't feel too strongly about it

And had some comments on the tests that would be nice to address

wlmyng · 2025-12-10T20:25:24Z

crates/sui-indexer-alt-framework/src/lib.rs

-            .with_context(|| format!("Failed to get watermark for {pipeline_task}"))?;
+        let watermark = if init_watermark {
+            let init_watermark = InitWatermark {
+                checkpoint_hi_inclusive: self.default_next_checkpoint as i64 - 1,


Maybe we can add a // SAFETY: checked > 0 earlier comment here
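As an alternative to relying on a comment, the precondition can be encoded in the types. The sketch below is one possible shape, not what the PR does: it uses the standard checked_sub and i64::try_from so that a zero input (or a value too large for i64) returns None instead of depending on an earlier check. The function name is hypothetical.

```rust
/// Hypothetical helper: the inclusive high watermark that precedes
/// `default_next_checkpoint`, or None if there is no previous checkpoint
/// (input is 0) or the value does not fit in an i64.
fn prev_checkpoint_hi_inclusive(default_next_checkpoint: u64) -> Option<i64> {
    // checked_sub returns None on underflow rather than wrapping.
    let prev = default_next_checkpoint.checked_sub(1)?;
    // try_from rejects values above i64::MAX instead of silently casting.
    i64::try_from(prev).ok()
}
```

Callers that have already checked default_next_checkpoint > 0 could still unwrap or expect the result, with the message carrying the same information as the suggested comment.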

wlmyng · 2025-12-10T20:29:21Z

crates/sui-indexer-alt-framework/src/lib.rs

+            ..Default::default()
+        })
+        .await;
+        assert_eq!(committer_watermark, None);


I think a comment like, "indexer will not seed the watermark, pipeline tasks will write commit watermarks as normal" would be nice

wlmyng · 2025-12-10T20:30:05Z

crates/sui-indexer-alt-framework/src/lib.rs

+    #[tokio::test]
+    async fn test_init_watermark_concurrent_pipeline_first_checkpoint_1() {
+        let (committer_watermark, pruner_watermark) =
+            test_init_watermark(InitCheckpointArgs::default()).await;


To me, default reads as 0.

I think for tests we should be explicit, like "InitCheckpointArgs::init(checkpoint 1)"

evan-wall-mysten force-pushed the indexer_set_reader_lo_pruner_hi branch from 26aee1c to 6f10779 Compare December 7, 2025 19:05

evan-wall-mysten force-pushed the indexer_set_reader_lo_pruner_hi branch from 6f10779 to e69ea47 Compare December 7, 2025 21:33

evan-wall-mysten force-pushed the indexer_set_reader_lo_pruner_hi branch from e69ea47 to a44ecd9 Compare December 9, 2025 19:15

evan-wall-mysten force-pushed the indexer_set_reader_lo_pruner_hi branch from a44ecd9 to 5e8baab Compare December 10, 2025 17:58

evan-wall-mysten force-pushed the indexer_set_reader_lo_pruner_hi branch from 5e8baab to b42f096 Compare December 10, 2025 17:58

evan-wall-mysten force-pushed the indexer_set_reader_lo_pruner_hi branch from b42f096 to 18274e0 Compare December 10, 2025 17:59

evan-wall-mysten marked this pull request as ready for review December 10, 2025 18:01

evan-wall-mysten requested a review from a team as a code owner December 10, 2025 18:01

evan-wall-mysten requested review from amnn, emmazzz, henryachen, nickvikeras, tpham-mysten and wlmyng December 10, 2025 18:01

evan-wall-mysten changed the title ~~[indexer-alt-framework] Set reader_lo, pruner_hi if watermark does not exist~~ [indexer-alt-framework] Set checkpoint_hi_inclusive, reader_lo, pruner_hi if watermark does not exist Dec 10, 2025

evan-wall-mysten force-pushed the indexer_set_reader_lo_pruner_hi branch from 18274e0 to 7fc131b Compare December 10, 2025 19:27

[sui-indexer-alt] Support pruning when not starting from genesis

eb9dd76

evan-wall-mysten force-pushed the indexer_set_reader_lo_pruner_hi branch from 7fc131b to eb9dd76 Compare December 10, 2025 19:42

evan-wall-mysten requested a review from wlmyng December 10, 2025 20:05

wlmyng approved these changes Dec 10, 2025
