Skip to content

test(op-acceptance-tests): interop startup-rework resync tests#20824

Merged
ajsutton merged 6 commits into
aj/feat/interop-startup-reworkfrom
aj/test-interop-startup-rework-acceptance
May 18, 2026
Merged

test(op-acceptance-tests): interop startup-rework resync tests#20824
ajsutton merged 6 commits into
aj/feat/interop-startup-reworkfrom
aj/test-interop-startup-rework-acceptance

Conversation

@ajsutton
Copy link
Copy Markdown
Contributor

@ajsutton ajsutton commented May 18, 2026

Adds two acceptance tests for the cold-start resync path introduced by #20823 — post-activation and pre-activation variants. Each stops the supernode, deletes its full on-disk data dir, starts a fresh instance against the same chains/virtual nodes, and asserts the verifier resumes and cross-safe advances on both chains.

Replaces the old RestartInteropActivity(wipeLogsDBs bool) test primitive (wiping only logs DBs is no longer meaningful under the rework) with RestartWithFreshDataDir — a full supernode Stop + data-dir wipe + Start. The sysgo SuperNode picks up a long-lived tcpproxy so its RPC URL survives the restart. The old TestSupernodeLogBackfill_HappyPath is deleted; its scenario is unreachable under the rework and is superseded by these tests.

Tests 2 & 4 from the spec at plans/interop-startup-rework-acceptance-tests.md (EL-data-wipe variants) need a dual-EL preset and will land in a follow-up PR.

@ajsutton ajsutton requested a review from a team as a code owner May 18, 2026 00:56
@wiz-0f98cca50a
Copy link
Copy Markdown

wiz-0f98cca50a Bot commented May 18, 2026

Wiz Scan Summary

Scanner Findings
Vulnerability Finding Vulnerabilities -
Data Finding Sensitive Data -
Secret Finding Secrets -
IaC Misconfiguration IaC Misconfigurations -
SAST Finding SAST Findings 1 Medium
Software Management Finding Software Management Findings -
Total 1 Medium

View scan details in Wiz

To detect these findings earlier in the dev lifecycle, try using Wiz Code VS Code Extension.

Comment thread op-devstack/sysgo/l2_cl_supernode.go
@karlfloersch
Copy link
Copy Markdown
Contributor

This LGTM!

@ajsutton ajsutton force-pushed the aj/feat/interop-startup-rework branch from 418d4f7 to fcbefa3 Compare May 18, 2026 20:12
ajsutton added 6 commits May 19, 2026 06:12
Adds two acceptance tests covering the cold-start resync path introduced
by the interop startup rework: a post-activation case and a
pre-activation case. Both stop the supernode, delete its entire on-disk
data dir, and start a fresh instance against the same chains and
virtual nodes, then assert the verifier resumes at a sane
verificationStartTimestamp and drives forward.

Replaces the prior `RestartInteropActivity(wipeLogsDBs bool)` test
primitive — wiping only the logs DBs is no longer meaningful under the
rework — with `RestartWithFreshDataDir`, which performs a full
supernode Stop + data-dir wipe + Start. The sysgo SuperNode gains a
long-lived tcpproxy so its externally visible RPC URL survives the
restart. The old `TestSupernodeLogBackfill_HappyPath` (and its
backfill/ shared helpers) is removed; its scenario is unreachable under
the rework and the new tests cover the legitimate cold-start path.
- Delete startup_resync/testutil.go; constants inlined into each test
  file; cross-safe advance check now composes existing
  L2CLNode.AdvancedFn(types.CrossSafe, ...) via dsl.CheckAll.
- Replace pre-restart history wait with stronger signals: the
  post-activation test waits for the supernode to validate a
  post-activation timestamp; the pre-activation test waits for two
  local-safe block advances so the op-node SafeDB has entries the
  cold-start init can consult.
- Drop redundant pre-restart VerificationStartTimestamp >= activation
  checks and the temporary AwaitCrossSafeAdvancesPast wrapper.
- Trim docstrings on the new DSL helpers and sysgo SuperNode methods.
Setup is materially identical with activation at genesis: the
post-restart cold-start init path is the same, since the data-dir wipe
clears the VN safe_db too and post-restart firstSafeDB sits at the
current chain tip in either case. Saves ~20s of pre-restart wall time.
- Post-activation: replace the timestamp-based wait with AdvancedFn on
  CrossSafe — drops the dependency on RollupConfig().BlockTime and lets
  the test express its intent (the verifier is committing) directly.
- Pre-activation: drop AdvanceTime and the post-activation validation
  step. The pre-activation cold-start behaviour is the whole point of
  the test; the verifier transition at activation is covered elsewhere.
  Bump activationDelay to 300s so the chain tip stays comfortably
  pre-activation under CI variance.
We no longer share devnets across packages, so there's no need to keep
the post-activation and pre-activation cases in separate sub-packages.
Both tests now live in a single startup_resync_test.go file.
- Add presets.WithUniformL2BlockTimes (and the sysgo deployer option it
  wraps) so a single literal sets the block time on every configured
  L2 chain.
- Switch the post-activation test to a 1s block time with a 3s backfill
  depth, and wait for L2 finalized to advance several blocks on each
  chain before the restart. On full data-dir wipe op-node drops the
  derivation pipeline back to the finalized head, not L2 genesis, so
  SafeDB's first entry lands well past activation and the cold-start
  backfill window is non-trivial. Without this the backfill range
  always collapses to empty against a re-recorded genesis SafeDB entry.
- AssertBackfillCovers now verifies the backfill repopulated the logs DB.
- Drop WithTimeTravelEnabled from both tests (no longer needed) and
  inline preActivationDelay as 60*60 seconds so the chain stays
  pre-activation through the whole pre-activation test.
- Rename TestSupernodeResyncResumesAtActivation_PreActivation to
  TestSupernodeResyncSchedulesAtActivation_PreActivation; this test
  only covers the scheduled-but-not-active regime.
@ajsutton ajsutton force-pushed the aj/test-interop-startup-rework-acceptance branch from fee2d4b to 74d5599 Compare May 18, 2026 20:15
@ajsutton ajsutton merged commit ecdb6f9 into aj/feat/interop-startup-rework May 18, 2026
61 checks passed
@ajsutton ajsutton deleted the aj/test-interop-startup-rework-acceptance branch May 18, 2026 21:37
ajsutton added a commit that referenced this pull request May 18, 2026
Adds two acceptance tests covering the cold-start resync path introduced
by the interop startup rework: a post-activation case and a
pre-activation case. Both stop the supernode, delete its entire on-disk
data dir, and start a fresh instance against the same chains and
virtual nodes, then assert the verifier resumes at a sane
verificationStartTimestamp and drives forward.

Replaces the prior `RestartInteropActivity(wipeLogsDBs bool)` test
primitive — wiping only the logs DBs is no longer meaningful under the
rework — with `RestartWithFreshDataDir`, which performs a full
supernode Stop + data-dir wipe + Start. The sysgo SuperNode gains a
long-lived tcpproxy so its externally visible RPC URL survives the
restart. The old `TestSupernodeLogBackfill_HappyPath` (and its
backfill/ shared helpers) is removed; its scenario is unreachable under
the rework and the new tests cover the legitimate cold-start path.
pull Bot pushed a commit to Dustin4444/optimism that referenced this pull request May 19, 2026
…m#20823)

* feat(op-supernode): rework interop activity startup

Replace the EL-finalized-head cold-start heuristic with a deterministic
verifiedDB-resume / SafeDB-first-entry model.

- Resume always wins. Any committed verifiedDB entry resumes at
  LastTimestamp+1 with no SafeDB or chain RPC consultation.
- Cold start (no verifiedDB) waits for every chain to record a first
  SafeDB entry, then sets verificationStartTimestamp =
  max(activationTimestamp, max_c first-safe-head timestamp). Wall-clock
  time is never consulted; chain derivation progress is the only
  authoritative signal relative to activation.
- Backfill lower bound is max(activation, per-chain genesis time,
  verificationStart - depth). Hard fails if any chain cannot serve the
  range. reconcileLogsDBTail runs only during cold-start backfill;
  warm-restart paths rely on DecisionRewind for drift handling.
- Start splits into a fast init plus a stateful main loop. The loop
  drives both cold-start init and progressAndRecord, so Start never
  blocks on multi-day EL sync waits and per-iteration backoff /
  cancellation / observability come for free.
- firstVerifiableTimestamp is now a synchronous accessor backed by
  verifiedDB.FirstTimestamp and verificationStartTimestamp; RPC
  handlers return ErrNotStarted while initialization is in progress.

API surface:
- SafeDBReader.FirstEntry on op-node, exposed through VirtualNode and
  ChainContainer (FirstSafeHeadTimestamp). ChainContainer reports
  ErrSafeDBEmpty during cold-start wait.

Tests:
- New startup_test.go covers fastInit resume / cold-start, the
  advanceColdStartInit state machine, the cold-start backfill no-op
  paths, the per-chain genesis clamp, and ErrNotStarted RPC semantics.
- Deletes log_backfill_test.go and the obsolete EL-finalized-head
  startup tests in interop_test.go — replaced by the above.

* fix(op-devstack/dsl): drop removed BackfillEndTimestamp accessor

FirstVerifiableTimestamp() is now the authoritative handoff marker;
the BackfillEndTimestamp fallback was tied to the pre-rework startup
path and the accessor no longer exists on Interop.

* refactor(op-supernode/interop): clarify startup loop and tolerate transient cold-start errors

- Rename fastInit to tryInitFromVerifiedDB.
- Split runLoop so each iteration runs exactly one of waitForColdStartInit
  or progress and the loop is the only place that sleeps; each step returns
  the duration it wants to wait.
- Inject clock.Clock so tests can drive time deterministically; replace
  the remaining time.Sleep/time.Now/time.Since calls.
- Treat all advanceColdStartInit errors as retryable (logged + errorBackoff).
  Cold start races chain-container startup, so transient signals like
  virtual-node-not-running must not kill the activity; cold start has no
  ErrHistoryUnavailable path to handle.

* docs(op-supernode/interop): refresh stale comments after startup rework

backfillAttempts/backfillCompleted now reference advanceColdStartInit
instead of the removed runLogBackfill, and InteropTestControl's accessor
list swaps the removed BackfillEndTimestamp for VerificationStartTimestamp.

* test(op-acceptance-tests): interop startup-rework resync tests (ethereum-optimism#20824)

Adds two acceptance tests covering the cold-start resync path introduced
by the interop startup rework: a post-activation case and a
pre-activation case. Both stop the supernode, delete its entire on-disk
data dir, and start a fresh instance against the same chains and
virtual nodes, then assert the verifier resumes at a sane
verificationStartTimestamp and drives forward.

Replaces the prior `RestartInteropActivity(wipeLogsDBs bool)` test
primitive — wiping only the logs DBs is no longer meaningful under the
rework — with `RestartWithFreshDataDir`, which performs a full
supernode Stop + data-dir wipe + Start. The sysgo SuperNode gains a
long-lived tcpproxy so its externally visible RPC URL survives the
restart. The old `TestSupernodeLogBackfill_HappyPath` (and its
backfill/ shared helpers) is removed; its scenario is unreachable under
the rework and the new tests cover the legitimate cold-start path.

* fix(op-supernode/interop): drop unsynchronized waitingForSync read

Set backfillCompleted in the resume path so the flag reflects "no more
backfill planned" in every branch, and BackfillCompleted() can be a
single atomic read.

* test(op-supernode/interop): restore cold-start backfill edge-case coverage

Adds tests for the four cases removed with log_backfill_test.go that are
still load-bearing under the new model — cold-start backfill is the only
path that calls reconcileLogsDBTail, and these exercise its branches:

- MisalignedActivation: floor(activation / blockTime) anchor invariant.
- RecoversFromOfflineReorg: stale tail, reconcile clears, backfill seals.
- LeavesAheadLogsDBUnchanged: tip past endTime + canonical, no-op.
- TrimsNonCanonicalAheadLogsDBAndCatchesUp: tip past endTime + divergent,
  reconcile rewinds then backfill catches up.
ajsutton added a commit that referenced this pull request May 19, 2026
Adds two acceptance tests covering the cold-start resync path introduced
by the interop startup rework: a post-activation case and a
pre-activation case. Both stop the supernode, delete its entire on-disk
data dir, and start a fresh instance against the same chains and
virtual nodes, then assert the verifier resumes at a sane
verificationStartTimestamp and drives forward.

Replaces the prior `RestartInteropActivity(wipeLogsDBs bool)` test
primitive — wiping only the logs DBs is no longer meaningful under the
rework — with `RestartWithFreshDataDir`, which performs a full
supernode Stop + data-dir wipe + Start. The sysgo SuperNode gains a
long-lived tcpproxy so its externally visible RPC URL survives the
restart. The old `TestSupernodeLogBackfill_HappyPath` (and its
backfill/ shared helpers) is removed; its scenario is unreachable under
the rework and the new tests cover the legitimate cold-start path.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants