
test: add invalidate fork testcases #399

Open

sugh01 wants to merge 9 commits into bsv-blockchain:main from sugh01:longtests-1

Conversation

@sugh01 (Collaborator) commented Jan 15, 2026

No description provided.

github-actions bot (Contributor) commented Jan 15, 2026

🤖 Claude Code Review

Status: Complete


This PR adds comprehensive reorg test coverage and fixes critical bugs in fork/reorg handling. The implementation quality is solid with proper context handling, buffered channels, and correct transaction tracking logic.

Key Improvements:

  • Reorg correctness fix (SubtreeProcessor.go:2268-2339): Properly tracks and unmarks losing conflicting transactions during reorgs
  • Common ancestor algorithm (BlockAssembler.go:1593-1608): Fixed to use hash set lookup instead of nested loops
  • Test stability (BlockAssembler_test.go): Added context cancellation, buffered channels, and proper goroutine cleanup
  • Chain integrity CI (nightly_tests.yaml): Comprehensive 3-node testing with error monitoring and artifact collection

Existing Review Comments:
All 8 inline comments from the previous review remain valid and provide useful context about defensive programming patterns, edge cases, and the core correctness fix.

No new issues found.

@liam requested a review from freemans13 on January 21, 2026 at 09:44
func (b *BlockAssembler) handleReorg(ctx context.Context, header *model.BlockHeader, height uint32) error {
if err := ctx.Err(); err != nil {
return errors.NewProcessingError("context", err)
}
Contributor:

Context cancellation check added but reorg work continues

The early context cancellation check at line 1425 is good defensive programming. However, if the context is cancelled during reorg processing (e.g., in the middle of fetching block headers or updating subtrees), the cleanup operations may be left incomplete.

Consider: Does the reorg logic handle partial state correctly if cancelled midway? The error handling returns early, but are there any state mutations that should be rolled back?

Contributor:

Why was this added? Should this not be taken care of by the for/select loop where handleReorg is called?
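
For context, a hedged sketch of the kind of caller-side for/select being referred to (hypothetical names, not the actual BlockAssembler code):

```go
// reorgEvent is a hypothetical notification carrying what handleReorg needs.
type reorgEvent struct {
	header *model.BlockHeader
	height uint32
}

// Hypothetical caller loop: cancellation is already handled by the select, so
// the ctx.Err() check inside handleReorg only guards against a context that is
// cancelled after the event has been dequeued.
func (b *BlockAssembler) reorgLoop(ctx context.Context, events <-chan reorgEvent) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case ev := <-events:
			if err := b.handleReorg(ctx, ev.header, ev.height); err != nil {
				return err
			}
		}
	}
}
```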

return nil, nil, errors.NewServiceError("error getting common ancestor header", err)
}
for !currentHash.IsEqual(newHash) {
if currentHeight == 0 || newHeight == 0 {
Contributor:

Common ancestor algorithm improvement

This is a significant correctness fix. The old block locator approach was inefficient and could produce incorrect results when the locator step sizes caused it to skip over the actual common ancestor.

The new approach correctly walks both chains backwards in lockstep, which guarantees finding the true common ancestor. The performance is O(depth_difference + reorg_depth) which is optimal for this problem.

Minor suggestion: The loop condition for height 0 with a break is a bit defensive. If both chains are valid and share genesis, this should never trigger. Consider logging if this happens as it may indicate a data inconsistency.
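
For reference, a minimal, self-contained sketch of the lockstep walk described above (simplified types and string hashes; the real code operates on block headers and heights from the blockchain store):

```go
package main

import "fmt"

// header is a simplified stand-in for a block header in this sketch.
type header struct {
	hash   string
	height uint32
	parent *header
}

// commonAncestor rewinds the deeper chain to the height of the shallower one,
// then walks both chains back in lockstep until the hashes match.
// Cost is O(height difference + reorg depth).
func commonAncestor(a, b *header) *header {
	for a.height > b.height {
		a = a.parent
	}
	for b.height > a.height {
		b = b.parent
	}
	for a != nil && b != nil && a.hash != b.hash {
		if a.height == 0 || b.height == 0 {
			// Defensive: valid chains share genesis, so reaching this point
			// indicates inconsistent data and is worth logging.
			return nil
		}
		a = a.parent
		b = b.parent
	}
	return a
}

func main() {
	g := &header{hash: "genesis", height: 0}
	h1 := &header{hash: "1", height: 1, parent: g}
	h2a := &header{hash: "2A", height: 2, parent: h1}
	h3a := &header{hash: "3A", height: 3, parent: h2a}
	h2b := &header{hash: "2B", height: 2, parent: h1}

	fmt.Println(commonAncestor(h3a, h2b).hash) // "1"
}
```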

losingTxSet[hash] = true
allLosingTxHashes = append(allLosingTxHashes, hash)
}

Contributor:

Critical fix: proper tracking of losing conflicting transactions

This is the core fix for the reorg double-spend handling bug. The previous code only marked transactions as on-longest-chain if they were not in the old fork, but it failed to identify and unmark transactions that LOST conflicts.

The new logic correctly:

  1. Builds winningTxSet from all transactions in winning chain
  2. Collects losingTxHashesMap from moveForwardBlock (new return value)
  3. Filters out any losing tx that also appears in winning set (same tx mined in both forks)
  4. Marks all truly-losing transactions as NOT on longest chain

This ensures that when tx2 loses to tx3 (both spending same UTXO), tx2 gets properly unmarked even though it was in the old fork.
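
As an illustration of steps 1–4, a hedged sketch of the filtering (hypothetical helper using string hashes; the real code works with the transaction hash values collected from the winning chain and returned by moveForwardBlock):

```go
// filterLosingTxs returns only the transactions that were in the losing fork
// and NOT also mined in the winning chain; these are the ones to unmark as
// on-longest-chain. Hypothetical helper for illustration only.
func filterLosingTxs(winningTxHashes, losingTxHashes []string) []string {
	winningTxSet := make(map[string]bool, len(winningTxHashes))
	for _, h := range winningTxHashes {
		winningTxSet[h] = true
	}

	trulyLosing := make([]string, 0, len(losingTxHashes))
	for _, h := range losingTxHashes {
		// A tx mined in both forks stays on the longest chain, so skip it.
		if !winningTxSet[h] {
			trulyLosing = append(trulyLosing, h)
		}
	}
	return trulyLosing
}
```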

Question: Should filteredMarkOnLongestChain also filter against transactions already marked? Or is the UTXO store idempotent for repeated mark operations?

Contributor:

Good question from the bot. This could be more efficient if we avoid unnecessary calls.

Collaborator:

Not needed as far as I can tell. I think the store is idempotent.

initPrometheusMetrics()

ctx := context.Background()
ctx, cancel := context.WithCancel(context.Background())
Contributor:

Test properly uses context cancellation

Good practice: The test now creates a cancellable context and properly defers the cancel to avoid goroutine leaks. The done channel ensures the background goroutine completes before the test exits.

This pattern should be followed in all tests that spawn goroutines.

Contributor:

We should be using t.Context() everywhere. It was added in Go 1.24.
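
For example, a hedged sketch of what that would look like (hypothetical test name and channel; t.Context() is cancelled automatically just before Cleanup-registered functions run):

```go
func TestReorgNotifications(t *testing.T) {
	// No context.WithCancel / defer cancel() needed: t.Context() is cancelled
	// automatically just before Cleanup functions run.
	ctx := t.Context()

	notifications := make(chan string, 100)
	done := make(chan struct{})
	go func() {
		defer close(done)
		for {
			select {
			case <-ctx.Done():
				return // exits once the test body has finished
			case <-notifications:
				// drain so senders never block
			}
		}
	}()
	t.Cleanup(func() { <-done }) // ctx is already cancelled when this runs

	notifications <- "subtree-1"
	// ... rest of the test body ...
}
```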


// Build two competing chains from height 1:
// Main chain: 1 -> 2A -> 3A
// Fork chain: 1 -> 2B -> 3B (invalidated)
Contributor:

Buffered channel prevents blocking

Excellent fix: The channel is now buffered with size 100, which prevents the test from blocking when multiple subtree notifications are sent rapidly. The previous unbuffered channel could cause deadlocks if notifications arrived faster than the test goroutine could process them.

The buffer size of 100 is reasonable for test scenarios where you may have multiple incomplete subtrees being generated.
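
A small illustration of the difference (subtreeNotification is a hypothetical stand-in for the actual notification type):

```go
type subtreeNotification struct{ hash string } // hypothetical stand-in

// Unbuffered: every send blocks until a receiver is ready, so a burst of
// notifications can stall or deadlock the producer in a test.
var blocking = make(chan subtreeNotification)

// Buffered: up to 100 notifications queue without a ready receiver, absorbing
// bursts of incomplete-subtree notifications during the test.
var buffered = make(chan subtreeNotification, 100)
```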


require.Len(t, moveForwardBlockHeaders, 2)
assert.Equal(t, h2a.Hash(), moveForwardBlockHeaders[0].header.Hash())
assert.Equal(t, h3a.Hash(), moveForwardBlockHeaders[1].header.Hash())
Contributor:

New test validates invalidated fork handling

This test case covers an important edge case: when BlockAssembler is tracking an invalidated fork tip and needs to reorg back to the main chain. This scenario validates that the common ancestor algorithm works correctly even when one of the chains has been explicitly invalidated.

The test correctly:

  • Simulates BlockAssembler being on the wrong fork
  • Uses InvalidateBlock to mark the fork invalid
  • Verifies getReorgBlockHeaders returns correct move-back and move-forward blocks

This complements the other reorg tests by focusing on the administrative invalidation case rather than natural chain reorganization.


- name: Start Teranode nodes with 3 block generators (docker compose up)
run: docker compose -f compose/docker-compose-3blasters.yml up -d

Contributor:

Comprehensive nightly test workflow

The new workflow adds important long-running integration testing with 3 concurrent block generators. Key features:

  1. Error monitoring: Checks container logs for ERROR messages at 5-second intervals with timestamp tracking
  2. Height consensus: Waits for all 3 nodes to reach height 120+ before declaring success
  3. Chain integrity validation: Uses chainintegrity tool to verify log consistency across nodes
  4. Proper cleanup: Always runs docker compose down and uploads artifacts on failure

Minor suggestions:

  • Consider making REQUIRED_HEIGHT configurable via workflow input for flexibility
  • The 10-minute timeout (MAX_ATTEMPTS=120 * SLEEP=5s) may be tight for slower runners - consider documenting expected runtime
  • Line 66: Empty since_param on first check is clever but could use a comment explaining why

@sonarqubecloud

Quality Gate failed

Failed conditions
60.3% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

// Ensure reorg move-back does not silently skip nodes as "duplicates".
// A tx can have an entry in currentTxMap even if it was removed from the subtrees,
// and addNode() uses SetIfNotExists which would otherwise ignore it.
stp.currentTxMap.Delete(subtreeNodes[i].Hash)
Contributor:

This would bypass the duplication check, which is the reason the currentTxMap exists. Is this really correct?

Collaborator:

Is that the reason currentTxMap exists? It would bypass the duplication check, but doesn't the entry need to be deleted? Is there another way to do that?

Contributor:

Why would it need to be deleted, @liam? It should be reset at the beginning of a reorg, so the duplication check still works.

Collaborator:

Where is it reset? Just in the reset() function? Is that called at the beginning of a reorg?

Contributor:

This is in moveBackBlock, which adds transactions from previous blocks back into the SubtreeProcessor. It should never add a transaction that is already there. If you delete here, you circumvent the duplication check in addNode (services/blockassembly/subtreeprocessor/SubtreeProcessor.go:1503):

		// SetIfNotExists returns (value, wasSet) where wasSet is true if the key was newly inserted
		if _, wasSet := stp.currentTxMap.SetIfNotExists(node.Hash, parents); !wasSet {
			// Key already existed, this is a duplicate
			stp.logger.Debugf("[addNode] duplicate transaction ignored %s", node.Hash.String())

			return nil
		}
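
To make the alternative being discussed concrete, a hedged sketch (hypothetical names; the real SubtreeProcessor fields and txmap API may differ): reset the map once when the reorg starts, and let SetIfNotExists in addNode remain the single duplication check, rather than deleting individual hashes inside moveBackBlock:

```go
// Hypothetical illustration only, not the actual Teranode code.
func (stp *SubtreeProcessor) beginReorg() {
	// Replacing the map wholesale means addNode's SetIfNotExists stays
	// authoritative: every tx re-added by moveBackBlock is accepted once,
	// and genuine duplicates within the reorg are still rejected.
	stp.currentTxMap = newTxMap() // hypothetical constructor for an empty map
}
```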

} else {
for i, node := range subtreeNodes {
// Ensure reorg move-back does not silently skip nodes as "duplicates".
stp.currentTxMap.Delete(node.Hash)
Contributor:

This would bypass the duplication check, which is the reason the currentTxMap exists. Is this really correct?

completeSubtrees = append(completeSubtrees, stp.chainedSubtrees...)

// incomplete subtrees ?
if chainedCount == 0 && stp.currentSubtree.Load().Length() > 1 {
Collaborator:

Taking out chainedCount == 0 doesn't feel right. Doesn't this mean it always creates an incomplete subtree copy?

I thought we only use an incomplete subtree copy when we have no full subtrees available.
