test: add invalidate fork testcases #399
Conversation
🤖 Claude Code Review Status: Complete. This PR adds comprehensive reorg test coverage and fixes critical bugs in fork/reorg handling. The implementation quality is solid, with proper context handling, buffered channels, and correct transaction tracking logic.
Existing Review Comments: No new issues found.
func (b *BlockAssembler) handleReorg(ctx context.Context, header *model.BlockHeader, height uint32) error {
	if err := ctx.Err(); err != nil {
		return errors.NewProcessingError("context", err)
	}
Context cancellation check added but reorg work continues
The early context cancellation check at line 1425 is good defensive programming. However, if the context is cancelled during reorg processing (e.g., in the middle of fetching block headers or updating subtrees), the cleanup operations may be left incomplete.
Consider: Does the reorg logic handle partial state correctly if cancelled midway? The error handling returns early, but are there any state mutations that should be rolled back?
Why was this added? Shouldn't this be taken care of by the for/select where handleReorg is called?
return nil, nil, errors.NewServiceError("error getting common ancestor header", err)
}
for !currentHash.IsEqual(newHash) {
	if currentHeight == 0 || newHeight == 0 {
Common ancestor algorithm improvement
This is a significant correctness fix. The old block locator approach was inefficient and could produce incorrect results when the locator step sizes caused it to skip over the actual common ancestor.
The new approach correctly walks both chains backwards in lockstep, which guarantees finding the true common ancestor. The performance is O(depth_difference + reorg_depth) which is optimal for this problem.
Minor suggestion: the height-0 guard with a break is a bit defensive. If both chains are valid and share genesis, it should never trigger; consider logging if it does, as that may indicate a data inconsistency.
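To make the lockstep walk-back concrete, here is a minimal, self-contained sketch in Go; the header struct, fetch callback, and package name are hypothetical stand-ins for the real BlockAssembler types, not the actual implementation.

package reorgexample

import "errors"

// header is a hypothetical stand-in for model.BlockHeader, kept minimal so the
// walk-back logic is self-contained.
type header struct {
	hash     string
	prevHash string
	height   uint32
}

// findCommonAncestor walks both chains backwards in lockstep: the higher chain
// steps first, and both step together once heights are equal, until the hashes
// match. fetch resolves a header by hash (e.g. from the blockchain store).
func findCommonAncestor(current, next *header, fetch func(hash string) (*header, error)) (*header, error) {
	for current.hash != next.hash {
		if current.height == 0 || next.height == 0 {
			// Valid chains share genesis, so reaching height 0 without a match
			// points at a data inconsistency and is worth logging.
			return nil, errors.New("no common ancestor found")
		}

		var err error
		switch {
		case current.height > next.height:
			current, err = fetch(current.prevHash)
		case next.height > current.height:
			next, err = fetch(next.prevHash)
		default:
			// Equal heights but different hashes: step both chains back.
			if current, err = fetch(current.prevHash); err == nil {
				next, err = fetch(next.prevHash)
			}
		}
		if err != nil {
			return nil, err
		}
	}

	return current, nil
}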
losingTxSet[hash] = true
allLosingTxHashes = append(allLosingTxHashes, hash)
}
Critical fix: proper tracking of losing conflicting transactions
This is the core fix for the reorg double-spend handling bug. The previous code only marked transactions as on-longest-chain if they were not in the old fork, but it failed to identify and unmark transactions that LOST conflicts.
The new logic correctly:
- Builds winningTxSet from all transactions in winning chain
- Collects losingTxHashesMap from moveForwardBlock (new return value)
- Filters out any losing tx that also appears in winning set (same tx mined in both forks)
- Marks all truly-losing transactions as NOT on longest chain
This ensures that when tx2 loses to tx3 (both spending same UTXO), tx2 gets properly unmarked even though it was in the old fork.
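A minimal sketch of the filtering step described above, using a hypothetical txHash type and function name rather than the real helpers: any hash that lost in the old fork but was also mined in the winning chain is skipped, and everything else is returned for unmarking.

package reorgexample

// txHash is a hypothetical stand-in for the hash type used in the diff.
type txHash [32]byte

// filterLosingTxs keeps only the truly-losing transactions: a hash that lost in
// the old fork but also appears in the winning chain stays on the longest chain
// and is dropped here; everything else is returned so it can be unmarked.
func filterLosingTxs(winningTxSet map[txHash]bool, losingTxHashes []txHash) []txHash {
	trulyLosing := make([]txHash, 0, len(losingTxHashes))

	for _, hash := range losingTxHashes {
		if winningTxSet[hash] {
			// Same tx was mined in both forks, so it is still on the longest chain.
			continue
		}
		trulyLosing = append(trulyLosing, hash)
	}

	return trulyLosing
}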
Question: Should filteredMarkOnLongestChain also filter against transactions already marked? Or is the UTXO store idempotent for repeated mark operations?
Good question from the bot. It could be more efficient if we avoid unnecessary calls.
Not needed as far as I can tell. I think the store is idempotent.
initPrometheusMetrics()

ctx := context.Background()
ctx, cancel := context.WithCancel(context.Background())
Test properly uses context cancellation
Good practice: The test now creates a cancellable context and properly defers the cancel to avoid goroutine leaks. The done channel ensures the background goroutine completes before the test exits.
This pattern should be followed in all tests that spawn goroutines.
We should be using t.Context() everywhere. It was added in Go 1.24.
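A minimal sketch of the suggested pattern, assuming Go 1.24+; the worker goroutine and done channel are hypothetical stand-ins for the test's background goroutine, and t.Context() replaces the manual WithCancel/defer cancel boilerplate because it is cancelled automatically when the test finishes.

package reorgexample

import "testing"

func TestWorkerStopsWithTest(t *testing.T) {
	ctx := t.Context() // cancelled automatically, just before Cleanup functions run

	done := make(chan struct{})
	go func() {
		defer close(done)
		<-ctx.Done() // the worker exits once the test's context is cancelled
	}()

	// Cleanup runs after the context is cancelled, so waiting here cannot
	// deadlock and guarantees the goroutine does not outlive the test.
	t.Cleanup(func() { <-done })

	// ... exercise the code under test ...
}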
// Build two competing chains from height 1:
// Main chain: 1 -> 2A -> 3A
// Fork chain: 1 -> 2B -> 3B (invalidated)
Buffered channel prevents blocking
Excellent fix: The channel is now buffered with size 100, which prevents the test from blocking when multiple subtree notifications are sent rapidly. The previous unbuffered channel could cause deadlocks if notifications arrived faster than the test goroutine could process them.
The buffer size of 100 is reasonable for test scenarios where you may have multiple incomplete subtrees being generated.
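A runnable sketch of the buffered-notification pattern; the channel name and payload are hypothetical, but the buffer size mirrors the test's choice of 100.

package main

import "fmt"

func main() {
	// With a buffer of 100, senders never block while there is room, so rapid
	// subtree notifications cannot stall (or deadlock against) the sender.
	notifications := make(chan string, 100)

	for i := 0; i < 5; i++ {
		notifications <- fmt.Sprintf("incomplete subtree %d", i)
	}

	// The test goroutine drains at its own pace without holding up the sender.
	for i := 0; i < 5; i++ {
		fmt.Println(<-notifications)
	}
}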
require.Len(t, moveForwardBlockHeaders, 2)
assert.Equal(t, h2a.Hash(), moveForwardBlockHeaders[0].header.Hash())
assert.Equal(t, h3a.Hash(), moveForwardBlockHeaders[1].header.Hash())
New test validates invalidated fork handling
This test case covers an important edge case: when BlockAssembler is tracking an invalidated fork tip and needs to reorg back to the main chain. This scenario validates that the common ancestor algorithm works correctly even when one of the chains has been explicitly invalidated.
The test correctly:
- Simulates BlockAssembler being on the wrong fork
- Uses InvalidateBlock to mark the fork invalid
- Verifies getReorgBlockHeaders returns correct move-back and move-forward blocks
This complements the other reorg tests by focusing on the administrative invalidation case rather than natural chain reorganization.
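As a hedged follow-on to the findCommonAncestor sketch above, the outline below shows how the move-back and move-forward lists for the 1 -> 2A -> 3A versus 1 -> 2B -> 3B scenario could be derived once the common ancestor (block 1) is known; reorgPlan and its names are hypothetical and not the getReorgBlockHeaders implementation.

package reorgexample

// reorgPlan walks each tip back to the common ancestor and returns the blocks
// to undo (old fork, tip first) and the blocks to apply (new chain, ancestor
// first). It reuses the header type and fetch callback from the
// findCommonAncestor sketch above.
func reorgPlan(oldTip, newTip, ancestor *header, fetch func(hash string) (*header, error)) (moveBack, moveForward []*header, err error) {
	for h := oldTip; h.hash != ancestor.hash; {
		moveBack = append(moveBack, h) // e.g. 3B, then 2B
		if h, err = fetch(h.prevHash); err != nil {
			return nil, nil, err
		}
	}

	for h := newTip; h.hash != ancestor.hash; {
		moveForward = append(moveForward, h) // collected tip-first: 3A, then 2A
		if h, err = fetch(h.prevHash); err != nil {
			return nil, nil, err
		}
	}

	// Reverse so the new-chain blocks are applied ancestor-first: 2A, then 3A.
	for i, j := 0, len(moveForward)-1; i < j; i, j = i+1, j-1 {
		moveForward[i], moveForward[j] = moveForward[j], moveForward[i]
	}

	return moveBack, moveForward, nil
}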
- name: Start Teranode nodes with 3 block generators (docker compose up)
  run: docker compose -f compose/docker-compose-3blasters.yml up -d
Comprehensive nightly test workflow
The new workflow adds important long-running integration testing with 3 concurrent block generators. Key features:
- Error monitoring: Checks container logs for ERROR messages at 5-second intervals with timestamp tracking
- Height consensus: Waits for all 3 nodes to reach height 120+ before declaring success
- Chain integrity validation: Uses chainintegrity tool to verify log consistency across nodes
- Proper cleanup: Always runs docker compose down and uploads artifacts on failure
Minor suggestions:
- Consider making REQUIRED_HEIGHT configurable via workflow input for flexibility
- The 10-minute timeout (MAX_ATTEMPTS=120 * SLEEP=5s) may be tight for slower runners - consider documenting expected runtime
- Line 66: Empty since_param on first check is clever but could use a comment explaining why
// Ensure reorg move-back does not silently skip nodes as "duplicates".
// A tx can have an entry in currentTxMap even if it was removed from the subtrees,
// and addNode() uses SetIfNotExists which would otherwise ignore it.
stp.currentTxMap.Delete(subtreeNodes[i].Hash)
This would bypass the duplication check, which is the reason currentTxMap exists. Is this really correct?
Is that the reason currentTxMap exists? It would bypass the duplication check, but does it need to be deleted? Is there another way to do that?
Why would it need to be deleted, @liam? It should be reset at the beginning of a reorg and would therefore still check for duplications.
Where is it reset? Just in the reset() function? Is that called at the beginning of a reorg?
This is in moveBackBlock, which adds transactions from previous blocks back into the subtreeprocessor. It should never add a transaction that is already there. If you delete here, you circumvent the duplication check in addNode (services/blockassembly/subtreeprocessor/SubtreeProcessor.go:1503):
// SetIfNotExists returns (value, wasSet) where wasSet is true if the key was newly inserted
if _, wasSet := stp.currentTxMap.SetIfNotExists(node.Hash, parents); !wasSet {
	// Key already existed, this is a duplicate
	stp.logger.Debugf("[addNode] duplicate transaction ignored %s", node.Hash.String())
	return nil
}
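To illustrate the concern, here is a runnable toy with a map-based stand-in for currentTxMap: once the hash is deleted immediately before re-adding, SetIfNotExists always reports a fresh insert, so the duplicate check quoted above never fires.

package main

import "fmt"

// txMap is a toy stand-in for stp.currentTxMap with SetIfNotExists semantics
// like the snippet quoted from addNode.
type txMap map[string]bool

func (m txMap) SetIfNotExists(hash string) (wasSet bool) {
	if m[hash] {
		return false // key already existed: duplicate
	}
	m[hash] = true
	return true
}

func main() {
	m := txMap{"tx1": true} // tx1 is already tracked

	// Without the Delete, the duplicate is caught and addNode would ignore it:
	fmt.Println(m.SetIfNotExists("tx1")) // false

	// Deleting just before re-adding means the duplicate check never fires:
	delete(m, "tx1")
	fmt.Println(m.SetIfNotExists("tx1")) // true: the node is re-added
}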
} else {
	for i, node := range subtreeNodes {
		// Ensure reorg move-back does not silently skip nodes as "duplicates".
		stp.currentTxMap.Delete(node.Hash)
This would bypass the duplication check, which is the reason currentTxMap exists. Is this really correct?
completeSubtrees = append(completeSubtrees, stp.chainedSubtrees...)

// incomplete subtrees ?
if chainedCount == 0 && stp.currentSubtree.Load().Length() > 1 {
Taking out chainedCount == 0 doesn't feel right - doesn't this mean it always creates an incomplete subtree copy?
I thought we only use an incomplete subtree copy when we have no full subtrees available.


No description provided.