-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feat: verify and repair evodb diffs automatically at node startup #6999
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
✅ No Merge Conflicts DetectedThis PR currently has no conflicts with other open PRs. |
Automatically verify and repair deterministic masternode list diffs in evodb during node startup. Helps detect and fix database corruption without manual intervention. - Add DB_LIST_REPAIRED marker to track when repair is complete - Skip repair during reindex (fresh rebuild) - Add -forceevodbrepair flag to force re-verification - Shutdown gracefully on critical errors with user instructions - Run in background thread with minimal cs_main locking Co-Authored-By: Claude Code (Anthropic) <[email protected]>
WalkthroughThree files are modified to introduce a repair-tracking mechanism for the Evo masternode list database. A new persistent repair flag ( Sequence DiagramsequenceDiagram
participant Startup as Startup Flow
participant MNMgr as CDeterministicMNManager
participant EvoDB as EvoDB (Database)
participant ChainHelper as ChainHelper
participant SpecialTx as SpecialTxMan
Startup->>MNMgr: IsRepaired()?
MNMgr->>EvoDB: Check DB_LIST_REPAIRED flag
EvoDB-->>MNMgr: Flag status
MNMgr-->>Startup: Repaired status
alt Already Repaired & !forceevodbrepair
Startup->>Startup: Skip repair, log message
else Reindexing in Progress
Startup->>MNMgr: CompleteRepair()
MNMgr->>EvoDB: Write DB_LIST_REPAIRED
MNMgr->>EvoDB: Commit & flush
else Repair Needed
Startup->>ChainHelper: Build list via callback
ChainHelper->>SpecialTx: RebuildListFromBlock
SpecialTx-->>ChainHelper: Rebuilt list
ChainHelper-->>Startup: List built
Startup->>MNMgr: RecalculateAndRepairDiffs(callback)
MNMgr->>MNMgr: Verify & repair diffs
alt Verification Errors
MNMgr-->>Startup: verification_errors
Startup->>Startup: Log errors
end
alt Repair Errors
MNMgr-->>Startup: repair_errors
Startup->>Startup: Log critical, suggest reindex
Startup->>Startup: Initiate shutdown
else Success
Startup->>MNMgr: CompleteRepair()
MNMgr->>EvoDB: Write DB_LIST_REPAIRED
MNMgr->>EvoDB: Commit & flush
Startup->>Startup: Log summary
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/evo/deterministicmns.h (1)
743-749: New repair-status API on CDeterministicMNManager looks reasonableThe addition of
IsRepaired()andCompleteRepair()is consistent with the cpp implementation and the startup repair workflow; public exposure here is appropriate.You may want to document (brief comment) that these are intended to be called only from startup/maintenance paths and that
IsRepaired()is purely a DB flag check, not a live consistency check, but that's optional.src/evo/deterministicmns.cpp (1)
1694-1706: IsRepaired/CompleteRepair semantics and error handling
IsRepaired()as a thinm_evoDb.Exists(DB_LIST_REPAIRED)wrapper is fine for a durable “have we ever completed repair” marker and matches how it’s used at startup.In
CompleteRepair():
- The transactional write +
CommitRootTransaction()pattern is consistent with other Evo DB upgrade flows.- Using
assert(false)on commit failure is stricter than the migration path (which returnsfalse). Given this is only called from startup/reindex paths, this is probably acceptable, but you might consider returning abooland letting the caller log and initiate shutdown instead of hard-asserting, for consistency withMigrateLegacyDiffs()and to keep behavior uniform across DB-maintenance routines.src/init.cpp (1)
2414-2460: Startup EvoDB repair flow is well-placed; consider a couple of small robustness tweaksThe overall flow in the
loadblkthread looks solid:
- Reindex/-reindex-chainstate: you explicitly skip repair and mark the DB as repaired via
CompleteRepair(), which avoids doing heavy work while the entire history is being rebuilt.- Normal startup:
- If
IsRepaired()and-forceevodbrepair=0, you skip repair and just log once.- Otherwise you:
- Snapshot
start_indexatDIP0003Heightandstop_indexat tip undercs_main.- Run
RecalculateAndRepairDiffswith abuild_list_functhat delegates tochain_helper->special_tx->RebuildListFromBlock, reusing the already-validated special-tx logic.- Log verification errors as warnings, treat any
repair_errorsas fatal (log +StartShutdown()), and on success callCompleteRepair()and log a concise summary with timing viaTicks().This keeps cs_main locking minimal and runs before network connections are started, so there is no concurrent chain mutation during the repair.
Two optional improvements you might consider:
- Handle the “no work to do” case explicitly
If
start_indexis null orstart_index->nHeight >= stop_index->nHeight(e.g. chain height is below DIP0003 activation on a fresh dev/regtest setup), the code silently does nothing and never sets the repaired flag. That’s not incorrect, but it means the check will rerun on every startup.You could treat this as a trivially-successful repair, e.g.:
- if (start_index && stop_index && start_index->nHeight < stop_index->nHeight) { + if (start_index && stop_index && start_index->nHeight < stop_index->nHeight) { // existing verification/repair logic... - } + } else { + LogPrintf("No masternode list diffs to verify in current height range; marking EvoDB as repaired\n"); + node.dmnman->CompleteRepair(); + }
- Defensive check for chain_helper/special_tx
The lambda assumes
node.chain_helperandnode.chain_helper->special_txare always initialized. That’s true in the normal node path afterLoadChainstate, but if any future refactoring ever makes this optional, an assert or explicit check with a clear log message here would make failures easier to diagnose.Overall, the repair logic, error handling, and placement in the startup sequence look correct and low-risk.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/evo/deterministicmns.cpp(2 hunks)src/evo/deterministicmns.h(1 hunks)src/init.cpp(3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-01-07T18:50:44.838Z
Learnt from: knst
Repo: dashpay/dash PR: 6511
File: src/evo/deterministicmns.cpp:1369-1373
Timestamp: 2025-01-07T18:50:44.838Z
Learning: The functions `MigrateDBIfNeeded` and `MigrateDBIfNeeded2` in `src/evo/deterministicmns.cpp` are temporary and will be removed in a future version. Refactoring suggestions for these functions should be avoided.
Applied to files:
src/evo/deterministicmns.hsrc/evo/deterministicmns.cpp
📚 Learning: 2025-11-13T20:02:55.480Z
Learnt from: UdjinM6
Repo: dashpay/dash PR: 6969
File: src/evo/deterministicmns.h:441-479
Timestamp: 2025-11-13T20:02:55.480Z
Learning: In `src/evo/deterministicmns.h`, the `internalId` field in `CDeterministicMN` and the `mnInternalIdMap` in `CDeterministicMNList` are non-deterministic and used only for internal bookkeeping and efficient lookups. Different nodes can assign different internalIds to the same masternode depending on their sync history. Methods like `IsEqual()` intentionally ignore internalId mappings and only compare consensus-critical deterministic fields (proTxHash, collateral, state, etc.).
Applied to files:
src/evo/deterministicmns.hsrc/evo/deterministicmns.cpp
📚 Learning: 2025-10-28T18:36:40.263Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6923
File: src/test/util/setup_common.cpp:235-251
Timestamp: 2025-10-28T18:36:40.263Z
Learning: In `src/test/util/setup_common.cpp`, the `CEvoDB` instance in `BasicTestingSetup` is constructed with `.memory = true` flag (memory-only mode), so it does not create file handles on disk. This makes the destructor teardown order safe even if `fs::remove_all(m_path_root)` is called before `m_node.evodb.reset()`.
Applied to files:
src/evo/deterministicmns.cpp
🧬 Code graph analysis (2)
src/evo/deterministicmns.h (1)
src/evo/deterministicmns.cpp (4)
IsRepaired(1694-1694)IsRepaired(1694-1694)CompleteRepair(1696-1706)CompleteRepair(1696-1696)
src/init.cpp (2)
src/node/interfaces.cpp (1)
fReindex(623-623)src/util/time.h (1)
Ticks(81-84)
🔇 Additional comments (3)
src/evo/deterministicmns.cpp (1)
32-36: DB_LIST_REPAIRED key is consistent with existing Evo DB keyingThe new
"dmn_R1"key forDB_LIST_REPAIREDfollows the existing naming/versioning pattern (dmn_S*,dmn_D*) and is scoped to this module; no issues here.src/init.cpp (2)
79-89: New Evo includes are scoped and appropriateAdding
evo/chainhelper.handevo/specialtxman.hhere matches the new startup repair usage ofnode.chain_helper->special_txand doesn’t broaden dependencies beyond init-time wiring.
756-756: -forceevodbrepair argument wiring is correctThe new
-forceevodbrepairflag is registered consistently with other debug/test options and its help text clearly indicates behavior (“Force evodb masternode list diff verification and repair on startup, even if already repaired”).No issues with the definition; it lines up with the logic later in
AppInitMain.
PastaPastaPasta
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 2e5fdd8
|
Guix Automation has began to build this PR tagged as v23.0.0-devpr6999.2e5fdd83. A new comment will be made when the image is pushed. |
|
Guix Automation has completed; a release should be present here: https://github.com/dashpay/dash-dev-branches/releases/tag/v23.0.0-devpr6999.2e5fdd83. The image should be on dockerhub soon. |
Issue being fixed or feature implemented
Automatically verify and repair deterministic masternode list diffs in evodb during node startup. Helps detect and fix database corruption without manual intervention.
What was done?
DB_LIST_REPAIREDmarker to track when repair is complete-forceevodbrepairflag to force re-verificationcs_mainlockingHow Has This Been Tested?
Run a node, check logs
Breaking Changes
n/a
Checklist: