Improve BFT-shard throughput and proof readiness #151

Merged
b3y0urs3lf merged 14 commits into main from perf/bft-shard-throughput
May 21, 2026

@jait91 (Contributor) commented May 11, 2026

This PR improves BFT-shard throughput and proof-readiness under load.

Main changes:

  • Add finalization/proof-readiness timing metrics.
  • Chunk and parallelize Mongo finalization inserts.
  • Remove unused write-heavy Mongo indexes from the hot path.
  • Add configurable BFT-shard precollection with Redis-backed replay safety.
  • Default async v2 submit behavior by skipping finalized duplicate lookup.
  • Add cheap in-memory proof-not-ready handling to reduce Mongo pressure from early proof polling.
  • Improve performance-test polling/worker behavior.
  • Align BFT-sharding compose defaults with the tested perf configuration.
  • Add measured performance results in docs/aggregator-performance.md.

Notes

The index changes assume a fresh DB for this branch’s tested path. Existing Mongo databases will keep previously created indexes until they are dropped manually.

Unused indexes removed by this PR:

  • aggregator_records.leafIndex
  • aggregator_records.finalizedAt
  • aggregator_records.blockNumber_1_leafIndex_1
  • block_records.stateIds
  • block_records.createdAt
  • smt_nodes.hash
  • smt_nodes.createdAt

Clean DBs need no migration.

jait91 requested a review from MastaP May 11, 2026 14:59

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces significant performance enhancements, including sharding support, optimized MongoDB batch inserts, and a precollection mechanism to improve throughput and latency. It also adds a /health/leader endpoint for HAProxy and updates the performance test suite. Review feedback identified a critical compilation error regarding sync.WaitGroup usage and suggested improving error handling for JSON marshaling.

Comment thread internal/storage/mongodb/batch_insert.go
Comment thread cmd/performance-test/main.go Outdated
jait91 self-assigned this May 11, 2026
jait91 added this to Unicity May 11, 2026
jait91 moved this to In Dev in Unicity May 11, 2026

Copilot AI left a comment

Pull request overview

This PR is a substantial throughput/proof-readiness improvement pass for the BFT-shard configuration. It introduces parallel chunked Mongo finalization writes, removes write-heavy indexes from the hot path, adds an active precollector with Redis-backed replay safety for standalone/bft-shard rounds, defaults v2 submit to skip the finalized duplicate lookup, adds a cheap in-memory "proof not ready" short-circuit to reduce Mongo polling pressure, refactors the perf test polling/scheduler, adds a leader-only health endpoint, and aligns compose defaults and docs with the tested perf configuration.

Changes:

  • New finalization insert chunking + parallel workers, removal of cold indexes, and finalize timing breakdown logging.
  • Active precollector + grace-period handoff, configurable collect window, classified leaf-add (added/duplicate/rejected), and an in-memory proofPending cache for early get_inclusion_proof.v2 requests.
  • Perf test: per-job proof scheduling, startup-probe wait, X-State-ID propagation; compose/Makefile updates aligning bft-sharding stack with tested perf settings; new /health/leader endpoint for HAProxy.

Reviewed changes

Copilot reviewed 45 out of 45 changed files in this pull request and generated 7 comments.

Summary per file:

  • internal/config/config.go(_test.go): Adds new processing/database knobs and validation; defaults SKIP_DUPLICATE_CHECK=true.
  • internal/storage/mongodb/batch_insert.go: New helper for chunked, optionally parallel InsertMany with duplicate-key tolerance.
  • internal/storage/mongodb/{aggregator_record,smt,connection,block_records}.go(_test.go): Wire chunked finalization inserts; in-memory leaf-index sort; trim cold indexes; remove BlockRecordsStorage.GetByStateID.
  • internal/storage/mongodb/index_test.go: Asserts production index set after CreateIndexes.
  • internal/storage/redis/commitment.go(_test.go): Move pending-sweep into stream loop so live ResetPendingSweep is honored without restart; new tests.
  • internal/smt/thread_safe_smt_snapshot.go(_test.go): Adds AddLeavesClassified returning added/duplicate/rejected indexes.
  • internal/round/{leaf_add,batch_processor,precollector,round_manager,parent_round_manager,factory}.go: Active precollector lifecycle, grace-period handoff, classified leaf adds, finalize timing breakdown, proof-pending cache, recovery reconciliation.
  • internal/round/*_test.go: Tests for new precollector lifecycle, recovery reconciliation, classified adds, signature update for processMiniBatch.
  • internal/service/service.go(_test.go): Optional duplicate-check skip; in-memory not-ready short-circuit using GetKnownNotReadyBlock.
  • internal/bft/{client,client_stub,client_stub_test}.go: New StartNextRoundFromPrecollector interface used after UC handling and in stub.
  • internal/gateway/{server,handlers_rest,handlers_rest_test}.go: New /health/leader endpoint and role check.
  • cmd/performance-test/{main,types}.go: Per-job proof scheduling, startup probe with retries, finalize-breakdown log parsing, new metrics.
  • Makefile, compose.yml, scripts/haproxy.cfg, scripts/mongo-init.js: Compose/Makefile alignment, leader healthcheck wiring, separate per-shard Mongo, dropped indexes.
  • docs/aggregator-performance.md: New measured perf results document.

Comment thread internal/config/config.go
	MaxCommitmentsPerRound:     getEnvIntOrDefault("MAX_COMMITMENTS_PER_ROUND", 20000),
	CollectPhaseDuration:       getEnvDurationOrDefault("COLLECT_PHASE_DURATION", "200ms"),
	CommitmentStreamBufferSize: getEnvIntOrDefault("COMMITMENT_STREAM_BUFFER_SIZE", 50000),
	SkipDuplicateCheck:         getEnvBoolOrDefault("SKIP_DUPLICATE_CHECK", true),

@jait91 (Contributor Author), May 15, 2026

Intentional. This matches v2 async behavior: submit success only means the request was accepted for processing; the proof result is authoritative. Duplicate/idempotent submits are allowed, and double-spend detection happens via the eventual proof outcome. SKIP_DUPLICATE_CHECK=false keeps the old submit-time check available.

Comment thread scripts/mongo-init.js
Comment on lines 22 to 36
@@ -33,14 +30,10 @@ db.blocks.createIndex({ chainId: 1 });
// SMT nodes collection
db.createCollection('smt_nodes');
db.smt_nodes.createIndex({ key: 1 }, { unique: true });
-db.smt_nodes.createIndex({ hash: 1 });
-db.smt_nodes.createIndex({ createdAt: -1 });

// Block records collection
db.createCollection('block_records');
db.block_records.createIndex({ blockNumber: 1 }, { unique: true });

@jait91 (Contributor Author)

Added a comment about the dropped indexes to the PR description.

Comment on lines +19 to +45
	result := snapshot.AddLeavesClassified(leaves)

	addedCommitments := make([]*models.CertificationRequest, 0, len(result.AddedIndexes))
	addedLeaves := make([]*smt.Leaf, 0, len(result.AddedIndexes))
	for _, idx := range result.AddedIndexes {
		addedCommitments = append(addedCommitments, commitments[idx])
		addedLeaves = append(addedLeaves, leaves[idx])
	}

	dropped := make([]interfaces.CertificationRequestAck, 0, len(result.DuplicateIndexes)+len(result.Rejected))
	for _, idx := range result.DuplicateIndexes {
		dropped = append(dropped, interfaces.CertificationRequestAck{
			StateID:  commitments[idx].StateID,
			StreamID: commitments[idx].StreamID,
		})
	}
	for _, rejected := range result.Rejected {
		log.WithContext(ctx).Warn("Rejected commitment leaf",
			"path", leaves[rejected.Index].Path.String(),
			"error", rejected.Err.Error())
		dropped = append(dropped, interfaces.CertificationRequestAck{
			StateID:  commitments[rejected.Index].StateID,
			StreamID: commitments[rejected.Index].StreamID,
		})
	}

	return addedCommitments, addedLeaves, dropped
@jait91 (Contributor Author)

Intentional. Duplicate leaves are idempotent re-submissions and should not be treated as newly added commitments. ACKing/dropping them during collection avoids stale proof-pending entries and unnecessary Mongo writes; the original accepted commitment remains authoritative.

Comment thread bft-sharding-compose.yml
networks:
  default:
    name: aggregator-go_default
    external: true
@jait91 (Contributor Author)

Added a comment to the compose file.

Comment thread docker-compose.yml
       - ./data/genesis:/genesis
     healthcheck:
-      test: ["CMD", "nc", "-zv", "bft-root", "8000"]
+      test: ["CMD", "nc", "-zv", "bft-root", "8002"]
@jait91 (Contributor Author)

Verified. All affected compose files start bft-root with --rpc-server-address ...:8002, so the healthcheck is checking the configured RPC/readiness port rather than the libp2p transport port.

Comment on lines +69 to +102
	for range workers {
		wg.Go(func() {
			for job := range jobs {
				if err := ctx.Err(); err != nil {
					setFirstErr(err)
					continue
				}
				err := ignoreDuplicateInsertError(collection.InsertMany(ctx, docs[job.start:job.end], options.InsertMany().SetOrdered(false)))
				setFirstErr(err)
			}
		})
	}

queue:
	for start := 0; start < len(docs); start += opts.chunkSize {
		if err := ctx.Err(); err != nil {
			setFirstErr(err)
			break
		}
		if getFirstErr() != nil {
			break
		}
		select {
		case jobs <- chunk{start: start, end: min(start+opts.chunkSize, len(docs))}:
		case <-ctx.Done():
			setFirstErr(ctx.Err())
			break queue
		}
	}
	close(jobs)
	wg.Wait()

	return getFirstErr()
}
@jait91 (Contributor Author)

Documented. This helper is only used for idempotent finalization writes; partial chunk writes are safe because retry/recovery can replay them and duplicate-key errors are ignored.

Comment on lines +238 to +243
	if block, ok := as.roundManager.GetKnownNotReadyBlock(req.StateID); ok {
		responseBlockNumber, err := proofBundleBlockNumber(as.config.Sharding.Mode, block)
		if err != nil {
			return nil, err
		}
		return emptyInclusionProofResponse(responseBlockNumber, block), nil
@jait91 (Contributor Author)

Intentional, and confirmed against the SDK behavior. Empty proof responses are treated as “proof not ready yet” and retried; the temporary block number in that response is not used for proof verification.
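The idea behind GetKnownNotReadyBlock can be sketched as a mutex-guarded map from state ID to the block it is pending in. Everything here is hypothetical. The real round manager returns a block value rather than a bare number, and this PR does not describe its eviction policy.

```go
package main

import (
	"fmt"
	"sync"
)

// notReadyCache (hypothetical) lets early proof polls be answered "not ready
// yet" from memory instead of querying Mongo.
type notReadyCache struct {
	mu      sync.RWMutex
	pending map[string]uint64 // stateID -> block number the request is pending in
}

func newNotReadyCache() *notReadyCache {
	return &notReadyCache{pending: make(map[string]uint64)}
}

// MarkPending records that stateID was accepted into block and has no proof yet.
func (c *notReadyCache) MarkPending(stateID string, block uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pending[stateID] = block
}

// ClearBlock drops every entry for block once it is finalized and proofs can be read.
func (c *notReadyCache) ClearBlock(block uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for id, b := range c.pending {
		if b == block {
			delete(c.pending, id) // deleting during range is safe in Go
		}
	}
}

// GetKnownNotReadyBlock reports the pending block; ok is false for unknown IDs,
// in which case the caller falls through to the normal Mongo lookup.
func (c *notReadyCache) GetKnownNotReadyBlock(stateID string) (uint64, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	b, ok := c.pending[stateID]
	return b, ok
}

func main() {
	c := newNotReadyCache()
	c.MarkPending("state-1", 42)
	fmt.Println(c.GetKnownNotReadyBlock("state-1"))
	c.ClearBlock(42)
	fmt.Println(c.GetKnownNotReadyBlock("state-1"))
}
```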

@MastaP (Member) left a comment

A few smaller comments on top of the existing reviews — each one is a small follow-up rather than a blocker. Skipping points already covered by Copilot's review (SKIP_DUPLICATE_CHECK default, index migration, GetKnownNotReadyBlock UC source).

Comment thread internal/round/round_manager.go Outdated
Comment thread internal/storage/mongodb/aggregator_record.go
Comment thread internal/gateway/handlers_rest.go Outdated
Comment thread bft-sharding-compose.yml
jait91 requested a review from MastaP May 15, 2026 08:39
MastaP assigned b3y0urs3lf and unassigned jait91 May 18, 2026
MastaP moved this from In Dev to Test in Unicity May 18, 2026

@b3y0urs3lf (Contributor)

Java SDK reproduction (state-transition-sdk-java)

Confirming the same behavioral difference the TS SDK reported, traced to this PR's change:

Default async v2 submit behavior by skipping finalized duplicate lookup.

Re-spend (double-spend) detection moves from the submit layer to the proof layer. Scenarios asserting submit-time STATE_ID_EXISTS pass on main and fail on this branch.

Behavior observed

  • main Java suite ✅ passes
    • finalized-dup lookup at submit: performed
    • re-spend submit status: STATE_ID_EXISTS
    • re-spend caught at: submit
  • this PR (perf/bft-shard-throughput) Java suite ❌ 14 scenarios fail
    • finalized-dup lookup at submit: skipped
    • re-spend submit status: SUCCESS
    • re-spend caught at: inclusion proof (TRANSACTION_HASH_MISMATCH)

Reproduced on both the single-aggregator subscription deployment and the bft-shard (MSB, 2- and 16-shard) deployments built from this branch — it's the build, not the topology.
Double-spend safety is intact; the re-spend never yields a valid token, it's just rejected one layer later.

Status type: org.unicitylabs.sdk.api.CertificationStatus (returned by CertificationResponse.getStatus()). Java uses STATE_ID_EXISTS (not REQUEST_ID_EXISTS). Re-spends are built with a fresh recipient predicate + random 32-byte stateMask (renamed from nonce in SDK PR #61) → distinct stateId → submit returns SUCCESS on this branch.

Affected scenarios (14)

  • token-4level-owner-actions.feature — Scenario Outline “Double-spend detected when <user> reuses pre-transfer token <token>”, all 8 rows (T1a_pre … T4b_pre). Glue: TreeSteps.theAggregatorRespondsWith.
  • token-transfer-edge-cases.feature — “Stale token object cannot be reused after transfer” (1). Glue: TokenLifecycleSteps.userTriesToSubmitATransferOfTheStaleTokenTo.
  • token-split-transfer.feature — “Original token cannot be used after split burn”, “Double-spend of a split token is prevented”, “Double-spend after multi-level split is prevented”,
    “Cannot spend a token after it has been split” (4).
  • token-split-advanced.feature — “Cannot transfer original token after split” (1). Glue for split files: SplitSteps (:355, :763, :970, :989).

Java fails 14 vs the TS suite's 9 because the Java suite adds split-path re-spend scenarios. Same model mismatch, broader surface.

Expected vs actual

Expected (matches main):
Submit a NEW transfer spending an already-finalized state
→ CertificationStatus = STATE_ID_EXISTS (rejected at submit)

Actual (this branch):
Submit a NEW transfer spending an already-finalized state
→ CertificationStatus = SUCCESS (accepted at submit; dup lookup skipped)
→ request inclusion proof for that transfer
→ proof for the state carries the FIRST committed tx's hash
→ verification fails with TRANSACTION_HASH_MISMATCH (rejected at proof)

Assertion failure (TreeSteps.theAggregatorRespondsWith):
Then the aggregator responds with "STATE_ID_EXISTS"
org.opentest4j.AssertionFailedError: expected: <STATE_ID_EXISTS> but was:

For reference, double-spend-prevention.feature (both submits SUCCESS, second proof rejects with TRANSACTION_HASH_MISMATCH) passes on both builds — its glue already encodes the proof-time model.
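Following the proof-time model that double-spend-prevention.feature already encodes, a suite can assert "the re-spend was rejected" without depending on which layer rejects it. The Go helper below is hypothetical; the SDK suites themselves are Java and TypeScript. The string constants mirror the status names reported above.

```go
package main

import "fmt"

// Hypothetical constants mirroring the status names in the SDK reports.
const (
	statusSuccess       = "SUCCESS"
	statusStateIDExists = "STATE_ID_EXISTS"
	errTxHashMismatch   = "TRANSACTION_HASH_MISMATCH"
)

// reSpendRejected reports whether a re-spend was rejected under either model:
// submit-time (STATE_ID_EXISTS, as on main) or proof-time (SUCCESS at submit,
// then TRANSACTION_HASH_MISMATCH when verifying the inclusion proof, as on this
// branch). proofResult is "" when no proof was verified.
func reSpendRejected(submitStatus, proofResult string) bool {
	switch submitStatus {
	case statusStateIDExists:
		return true
	case statusSuccess:
		return proofResult == errTxHashMismatch
	default:
		return false
	}
}

func main() {
	fmt.Println(reSpendRejected(statusStateIDExists, ""))          // main: rejected at submit
	fmt.Println(reSpendRejected(statusSuccess, errTxHashMismatch)) // this PR: rejected at proof
}
```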

Run config

./gradlew bddTest --rerun-tasks
env: AGGREGATOR_URL=<aggregator/proxy endpoint>
AGGREGATOR_API_KEY= # subscription deployment only
TRUST_BASE_PATH=

For example, on branch bdd-phase-0 of state-transition-sdk-java:
AGGREGATOR_URL=http://localhost:8080 AGGREGATOR_API_KEY=sk_d04b15de50ad485b925a48500d01aab2 TRUST_BASE_PATH=/home/dmytro/Documents/Unicity/state-transition-sdk/tests/e2e/trust-base.json ./gradlew bddTest --rerun-tasks -Dcucumber.execution.parallel.enabled=true -Dcucumber.execution.parallel.config.strategy=fixed -Dcucumber.execution.parallel.config.fixed.parallelism=4 -Dcucumber.filter.tags="not @slow and not @wip and not @ignore and not @bft-shard-only and not @multi-shard-only and not @pending-src-cleanup and not @stateful and not @fresh-aggregator and not @stress"

Deterministic; independent of -Dcucumber.execution.parallel.enabled (each scenario uses its own random tokenId → no cross-scenario stateId collision). Not a transport error, not a warm-vs-fresh-aggregator boundary case.

Questions (same as TS, both SDKs want one answer)

  1. Is skip-finalized-dup-lookup intended as the default once this lands, or config-gated (e.g. a skipFinalizedDuplicateLookup / async-v2 toggle in internal/config)? If gated, both SDK
    suites can branch on the mode.
  2. Which contract should SDK suites standardize on — submit-time STATE_ID_EXISTS, or SUCCESS + proof-time TRANSACTION_HASH_MISMATCH? Java SDK's vote: the latter — it's the only assertion
    that holds on every build/topology and it strengthens coverage (the strict submit-status assertions never exercised the proof layer).

We'll hold the Java test change until Q1 is confirmed.

TS SDK reproduction (state-transition-sdk)

Confirming the same behavioral difference, traced to this PR's change:

Default async v2 submit behavior by skipping finalized duplicate lookup.

Re-spend (double-spend) detection moves from the submit layer to the proof layer. Scenarios asserting submit-time STATE_ID_EXISTS pass on main and fail on this branch.

Behavior observed

  • main TS suite ✅ passes
    • finalized-dup lookup at submit: performed
    • re-spend submit status: STATE_ID_EXISTS
    • re-spend caught at: submit
  • this PR (perf/bft-shard-throughput) TS suite ❌ 9 scenarios fail
    • finalized-dup lookup at submit: skipped
    • re-spend submit status: SUCCESS
    • re-spend caught at: inclusion proof (TRANSACTION_HASH_MISMATCH)

Reproduced on both a single-aggregator deployment and a 2-shard bft-shard (MSB) deployment built from this branch — it's the build, not the topology. Double-spend safety is intact; the re-spend never yields a valid token, it's just rejected one layer later.

Status type: CertificationStatus (TS enum, src/api/CertificationResponse.ts), read from CertificationResponse.status. TS uses STATE_ID_EXISTS (not REQUEST_ID_EXISTS). Re-spends are built with a fresh recipient predicate + random 32-byte stateMask (renamed from nonce in SDK PR #110/#112) → distinct stateId → submit returns SUCCESS on this branch.

Affected scenarios (9)

  • token-4level-owner-actions.feature — Scenario Outline “Double-spend detected when <user> reuses pre-transfer token <token>”, all 8 rows (T1a_pre … T4b_pre). Glue: tree-owner-actions.steps.ts, step: the aggregator responds with "…".
  • token-transfer-edge-cases.feature — “Stale token object cannot be reused after transfer” (1). Glue: transfer-edge-cases.steps.ts + minting.steps.ts, step: the certification response status is "…".

TS fails 9 vs Java's 14 because the TS split-path double-spend scenarios (token-split-transfer.feature, token-split-advanced.feature: “Original token cannot be used after split burn”, “Double-spend of a split token is prevented”, “Cannot spend a token after it has been split”, etc.) assert an SDK-side failure — TransferTransaction.create/unlock rejects because the source token is burned (transferError !== null) — not the aggregator submit status, so they don't touch the dup-lookup path and pass on both builds. Same model mismatch, narrower submit-status surface.

Expected vs actual

Expected (matches main):
Submit a NEW transfer spending an already-finalized state
→ CertificationStatus = STATE_ID_EXISTS (rejected at submit)

Actual (this branch):
Submit a NEW transfer spending an already-finalized state
→ CertificationStatus = SUCCESS (accepted at submit; dup lookup skipped)
→ request inclusion proof for that transfer
→ proof for the state carries the FIRST committed tx's hash
→ verification fails with TRANSACTION_HASH_MISMATCH (rejected at proof)

Assertion failure (tree-owner-actions.steps.ts / minting.steps.ts):
Then the aggregator responds with "STATE_ID_EXISTS"
AssertionError [ERR_ASSERTION]: expected 'STATE_ID_EXISTS', actual 'SUCCESS'

For reference, double-spend-prevention.feature (both submits SUCCESS, second proof rejects with TRANSACTION_HASH_MISMATCH) passes on both builds — its glue already encodes the proof-time model.

Run config

NODE_OPTIONS='--import tsx/esm'
AGGREGATOR_URL=<aggregator/proxy endpoint>
AGGREGATOR_API_KEY= # subscription deployment only
TRUST_BASE_PATH=
./node_modules/.bin/cucumber-js --config ''
--import 'tests/bdd/functional/support/World.ts'
--import 'tests/bdd/functional/steps/**/.steps.ts'
--format summary --parallel 4
--tags 'not @shard-load and not @stateful and not @stress and not @bft-shard-only and not @fresh-aggregator'
'tests/bdd/functional/features/.feature'
(branch feature/test-infrastructure)

Deterministic; independent of --parallel (each scenario uses its own random tokenId → no cross-scenario stateId collision). Not a transport error, not a warm-vs-fresh-aggregator boundary case.

Questions (same as Java — both SDKs want one answer)

  1. Is skip-finalized-dup-lookup intended as the default once this lands, or config-gated (e.g. a skipFinalizedDuplicateLookup / async-v2 toggle in internal/config)? If gated, both SDK
    suites can branch on the mode.
  2. Which contract should SDK suites standardize on — submit-time STATE_ID_EXISTS, or SUCCESS + proof-time TRANSACTION_HASH_MISMATCH? TS SDK's vote: the latter — it's the only
    assertion that holds on every build/topology, and it strengthens coverage (the strict submit-status assertions never exercised the proof layer).

We'll hold the TS test change until Q1 is confirmed.

b3y0urs3lf moved this from Test to Todo in Unicity May 21, 2026

@jait91 (Contributor Author) commented May 21, 2026

@b3y0urs3lf
Confirmed: this is the intended async/no-finalized-duplicate-lookup behavior for this PR. Double-spend safety is still enforced, but the rejection moves from submit-time STATE_ID_EXISTS to proof-time TRANSACTION_HASH_MISMATCH.

I also checked both SDKs: this appears to affect BDD/e2e test expectations rather than core SDK handling logic. Created follow-up tasks to update the suites:

jait91 moved this from Todo to Test in Unicity May 21, 2026
b3y0urs3lf merged commit cb5cf20 into main May 21, 2026
2 checks passed
b3y0urs3lf deleted the perf/bft-shard-throughput branch May 21, 2026 12:05
github-project-automation (bot) moved this from Test to Done in Unicity May 21, 2026