test(op-acceptance-tests): interop startup-rework resync tests#20824
Merged
ajsutton merged 6 commits intoMay 18, 2026
Merged
Conversation
Wiz Scan Summary
To detect these findings earlier in the dev lifecycle, try using Wiz Code VS Code Extension. |
837283f to
5fdb4e8
Compare
karlfloersch
approved these changes
May 18, 2026
Contributor
|
This LGTM! |
418d4f7 to
fcbefa3
Compare
Adds two acceptance tests covering the cold-start resync path introduced by the interop startup rework: a post-activation case and a pre-activation case. Both stop the supernode, delete its entire on-disk data dir, and start a fresh instance against the same chains and virtual nodes, then assert the verifier resumes at a sane verificationStartTimestamp and drives forward. Replaces the prior `RestartInteropActivity(wipeLogsDBs bool)` test primitive — wiping only the logs DBs is no longer meaningful under the rework — with `RestartWithFreshDataDir`, which performs a full supernode Stop + data-dir wipe + Start. The sysgo SuperNode gains a long-lived tcpproxy so its externally visible RPC URL survives the restart. The old `TestSupernodeLogBackfill_HappyPath` (and its backfill/ shared helpers) is removed; its scenario is unreachable under the rework and the new tests cover the legitimate cold-start path.
- Delete startup_resync/testutil.go; constants inlined into each test file; cross-safe advance check now composes existing L2CLNode.AdvancedFn(types.CrossSafe, ...) via dsl.CheckAll. - Replace pre-restart history wait with stronger signals: the post-activation test waits for the supernode to validate a post-activation timestamp; the pre-activation test waits for two local-safe block advances so the op-node SafeDB has entries the cold-start init can consult. - Drop redundant pre-restart VerificationStartTimestamp >= activation checks and the temporary AwaitCrossSafeAdvancesPast wrapper. - Trim docstrings on the new DSL helpers and sysgo SuperNode methods.
Setup is materially identical with activation at genesis: the post-restart cold-start init path is the same, since the data-dir wipe clears the VN safe_db too and post-restart firstSafeDB sits at the current chain tip in either case. Saves ~20s of pre-restart wall time.
- Post-activation: replace the timestamp-based wait with AdvancedFn on CrossSafe — drops the dependency on RollupConfig().BlockTime and lets the test express its intent (the verifier is committing) directly. - Pre-activation: drop AdvanceTime and the post-activation validation step. The pre-activation cold-start behaviour is the whole point of the test; the verifier transition at activation is covered elsewhere. Bump activationDelay to 300s so the chain tip stays comfortably pre-activation under CI variance.
We no longer share devnets across packages, so there's no need to keep the post-activation and pre-activation cases in separate sub-packages. Both tests now live in a single startup_resync_test.go file.
- Add presets.WithUniformL2BlockTimes (and the sysgo deployer option it wraps) so a single literal sets the block time on every configured L2 chain. - Switch the post-activation test to a 1s block time with a 3s backfill depth, and wait for L2 finalized to advance several blocks on each chain before the restart. On full data-dir wipe op-node drops the derivation pipeline back to the finalized head, not L2 genesis, so SafeDB's first entry lands well past activation and the cold-start backfill window is non-trivial. Without this the backfill range always collapses to empty against a re-recorded genesis SafeDB entry. - AssertBackfillCovers now verifies the backfill repopulated the logs DB. - Drop WithTimeTravelEnabled from both tests (no longer needed) and inline preActivationDelay as 60*60 seconds so the chain stays pre-activation through the whole pre-activation test. - Rename TestSupernodeResyncResumesAtActivation_PreActivation to TestSupernodeResyncSchedulesAtActivation_PreActivation; this test only covers the scheduled-but-not-active regime.
fee2d4b to
74d5599
Compare
ajsutton
added a commit
that referenced
this pull request
May 18, 2026
Adds two acceptance tests covering the cold-start resync path introduced by the interop startup rework: a post-activation case and a pre-activation case. Both stop the supernode, delete its entire on-disk data dir, and start a fresh instance against the same chains and virtual nodes, then assert the verifier resumes at a sane verificationStartTimestamp and drives forward. Replaces the prior `RestartInteropActivity(wipeLogsDBs bool)` test primitive — wiping only the logs DBs is no longer meaningful under the rework — with `RestartWithFreshDataDir`, which performs a full supernode Stop + data-dir wipe + Start. The sysgo SuperNode gains a long-lived tcpproxy so its externally visible RPC URL survives the restart. The old `TestSupernodeLogBackfill_HappyPath` (and its backfill/ shared helpers) is removed; its scenario is unreachable under the rework and the new tests cover the legitimate cold-start path.
pull Bot
pushed a commit
to Dustin4444/optimism
that referenced
this pull request
May 19, 2026
…m#20823) * feat(op-supernode): rework interop activity startup Replace the EL-finalized-head cold-start heuristic with a deterministic verifiedDB-resume / SafeDB-first-entry model. - Resume always wins. Any committed verifiedDB entry resumes at LastTimestamp+1 with no SafeDB or chain RPC consultation. - Cold start (no verifiedDB) waits for every chain to record a first SafeDB entry, then sets verificationStartTimestamp = max(activationTimestamp, max_c first-safe-head timestamp). Wall-clock time is never consulted; chain derivation progress is the only authoritative signal relative to activation. - Backfill lower bound is max(activation, per-chain genesis time, verificationStart - depth). Hard fails if any chain cannot serve the range. reconcileLogsDBTail runs only during cold-start backfill; warm-restart paths rely on DecisionRewind for drift handling. - Start splits into a fast init plus a stateful main loop. The loop drives both cold-start init and progressAndRecord, so Start never blocks on multi-day EL sync waits and per-iteration backoff / cancellation / observability come for free. - firstVerifiableTimestamp is now a synchronous accessor backed by verifiedDB.FirstTimestamp and verificationStartTimestamp; RPC handlers return ErrNotStarted while initialization is in progress. API surface: - SafeDBReader.FirstEntry on op-node, exposed through VirtualNode and ChainContainer (FirstSafeHeadTimestamp). ChainContainer reports ErrSafeDBEmpty during cold-start wait. Tests: - New startup_test.go covers fastInit resume / cold-start, the advanceColdStartInit state machine, the cold-start backfill no-op paths, the per-chain genesis clamp, and ErrNotStarted RPC semantics. - Deletes log_backfill_test.go and the obsolete EL-finalized-head startup tests in interop_test.go — replaced by the above. * fix(op-devstack/dsl): drop removed BackfillEndTimestamp accessor FirstVerifiableTimestamp() is now the authoritative handoff marker; the BackfillEndTimestamp fallback was tied to the pre-rework startup path and the accessor no longer exists on Interop. * refactor(op-supernode/interop): clarify startup loop and tolerate transient cold-start errors - Rename fastInit to tryInitFromVerifiedDB. - Split runLoop so each iteration runs exactly one of waitForColdStartInit or progress and the loop is the only place that sleeps; each step returns the duration it wants to wait. - Inject clock.Clock so tests can drive time deterministically; replace the remaining time.Sleep/time.Now/time.Since calls. - Treat all advanceColdStartInit errors as retryable (logged + errorBackoff). Cold start races chain-container startup, so transient signals like virtual-node-not-running must not kill the activity; cold start has no ErrHistoryUnavailable path to handle. * docs(op-supernode/interop): refresh stale comments after startup rework backfillAttempts/backfillCompleted now reference advanceColdStartInit instead of the removed runLogBackfill, and InteropTestControl's accessor list swaps the removed BackfillEndTimestamp for VerificationStartTimestamp. * test(op-acceptance-tests): interop startup-rework resync tests (ethereum-optimism#20824) Adds two acceptance tests covering the cold-start resync path introduced by the interop startup rework: a post-activation case and a pre-activation case. Both stop the supernode, delete its entire on-disk data dir, and start a fresh instance against the same chains and virtual nodes, then assert the verifier resumes at a sane verificationStartTimestamp and drives forward. Replaces the prior `RestartInteropActivity(wipeLogsDBs bool)` test primitive — wiping only the logs DBs is no longer meaningful under the rework — with `RestartWithFreshDataDir`, which performs a full supernode Stop + data-dir wipe + Start. The sysgo SuperNode gains a long-lived tcpproxy so its externally visible RPC URL survives the restart. The old `TestSupernodeLogBackfill_HappyPath` (and its backfill/ shared helpers) is removed; its scenario is unreachable under the rework and the new tests cover the legitimate cold-start path. * fix(op-supernode/interop): drop unsynchronized waitingForSync read Set backfillCompleted in the resume path so the flag reflects "no more backfill planned" in every branch, and BackfillCompleted() can be a single atomic read. * test(op-supernode/interop): restore cold-start backfill edge-case coverage Adds tests for the four cases removed with log_backfill_test.go that are still load-bearing under the new model — cold-start backfill is the only path that calls reconcileLogsDBTail, and these exercise its branches: - MisalignedActivation: floor(activation / blockTime) anchor invariant. - RecoversFromOfflineReorg: stale tail, reconcile clears, backfill seals. - LeavesAheadLogsDBUnchanged: tip past endTime + canonical, no-op. - TrimsNonCanonicalAheadLogsDBAndCatchesUp: tip past endTime + divergent, reconcile rewinds then backfill catches up.
ajsutton
added a commit
that referenced
this pull request
May 19, 2026
Adds two acceptance tests covering the cold-start resync path introduced by the interop startup rework: a post-activation case and a pre-activation case. Both stop the supernode, delete its entire on-disk data dir, and start a fresh instance against the same chains and virtual nodes, then assert the verifier resumes at a sane verificationStartTimestamp and drives forward. Replaces the prior `RestartInteropActivity(wipeLogsDBs bool)` test primitive — wiping only the logs DBs is no longer meaningful under the rework — with `RestartWithFreshDataDir`, which performs a full supernode Stop + data-dir wipe + Start. The sysgo SuperNode gains a long-lived tcpproxy so its externally visible RPC URL survives the restart. The old `TestSupernodeLogBackfill_HappyPath` (and its backfill/ shared helpers) is removed; its scenario is unreachable under the rework and the new tests cover the legitimate cold-start path.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds two acceptance tests for the cold-start resync path introduced by #20823 — post-activation and pre-activation variants. Each stops the supernode, deletes its full on-disk data dir, starts a fresh instance against the same chains/virtual nodes, and asserts the verifier resumes and cross-safe advances on both chains.
Replaces the old
RestartInteropActivity(wipeLogsDBs bool)test primitive (wiping only logs DBs is no longer meaningful under the rework) withRestartWithFreshDataDir— a full supernode Stop + data-dir wipe + Start. The sysgoSuperNodepicks up a long-livedtcpproxyso its RPC URL survives the restart. The oldTestSupernodeLogBackfill_HappyPathis deleted; its scenario is unreachable under the rework and is superseded by these tests.Tests 2 & 4 from the spec at
plans/interop-startup-rework-acceptance-tests.md(EL-data-wipe variants) need a dual-EL preset and will land in a follow-up PR.