Skip to content

Conversation

fdefelici
Copy link
Contributor

Description

This PR add block marf computation to the AAC test harness, so that we can avoid to pass it as an input.

I checked different ways to do it (even using ephemeral implementation), but in the end the one that worked best was to write on the test chainstate, retrieve the root hash and then rollback the marf transaction.

Applicable issues

  • fixes #

Additional info (benefits, drawbacks, caveats)

Checklist

  • Test coverage for new or modified code paths
  • Changelog is updated
  • Required documentation changes (e.g., docs/rpc/openapi.yaml and rpc-endpoints.md for v2 endpoints, event-dispatcher.md for new events)
  • New clarity functions have corresponding PR in clarity-benchmarking repo
  • New integration test(s) added to bitcoin-tests.yml

@fdefelici fdefelici requested review from jferrant, Jiloc and kantai October 3, 2025 14:23
@fdefelici fdefelici self-assigned this Oct 3, 2025
@fdefelici fdefelici added the aac Avoiding Accidental Consensus label Oct 3, 2025
@fdefelici fdefelici added this to the 3.2.0.0.2 milestone Oct 3, 2025
@fdefelici fdefelici marked this pull request as ready for review October 6, 2025 07:35
@fdefelici fdefelici requested review from a team as code owners October 6, 2025 07:35
@fdefelici fdefelici added the aac-testing Avoiding Accidental Consensus Testing Specific Task label Oct 6, 2025
@fdefelici fdefelici moved this to Status: In Review in Stacks Core Eng Oct 6, 2025
Copy link

codecov bot commented Oct 6, 2025

Codecov Report

❌ Patch coverage is 90.47619% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.58%. Comparing base (b5e3baf) to head (32f3533).

Files with missing lines Patch % Lines
stackslib/src/chainstate/tests/consensus.rs 90.47% 6 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6560      +/-   ##
===========================================
+ Coverage    79.88%   80.58%   +0.69%     
===========================================
  Files          568      568              
  Lines       347203   347239      +36     
===========================================
+ Hits        277377   279814    +2437     
+ Misses       69826    67425    -2401     
Files with missing lines Coverage Δ
stackslib/src/chainstate/tests/consensus.rs 90.90% <90.47%> (-0.46%) ⬇️

... and 72 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b5e3baf...32f3533. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@Jiloc Jiloc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, automatically computing the marf will definitely make these tests easier to write! Regarding the approach used, LGTM, but can't gaurantee if there is an easier way that also works for other types of txs. Maybe @kantai can share his thoughs on that.

One thought (that we already discussed offline): since we've removed the MARF from the input parameters (which is great for making the first test execution faster and cleaner), we might now want to include it in the expected output. It's still an important part of consensus, and we'll want to guarantee that a newer version of stacks-node produces the same MARF root for a block as previous versions.

parent_block_id: chain_tip.index_block_hash(),
tx_merkle_root: Sha512Trunc256Sum::from_data(&[]),
state_index_root,
state_index_root: TrieHash::from_empty_data(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe just leave a comment about why we can just pass an empty hash?

/// Computes the MARF root hash for a block.
///
/// This function is intended for use in success test cases only, where all
/// transactions are valid. In other scenarios, the computation may fail.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this function can fail, I can imagine two scenarios where a except in here could make it a bit hard to debug what is going on:

  • When writing a new test, we might incorrectly expect a transaction to succeed, but a Clarity error causes this function to panic without context.
  • When running tests on a newer epoch/Clarity version, previously valid transactions may now fail, again leading to a panic here without a clear cause.

In both cases we are right to fail! The developer will need to update the expected output, but the panic message from this helper doesn't make it obvious what went wrong.

Could we consider returning a Result (or even a simple Result<TrieHash, String>) instead of panicking here, and then handle the panic in construct_nakamoto_block with a clearer message - e.g. "Failed to compute MARF root hash: Failure on block metadata setup - This may indicate the transaction should not be marked as successful in the test configuration."...Or something similar

@jferrant
Copy link
Contributor

jferrant commented Oct 6, 2025

I agree that the MARF should still be listed in expected outputs :D Great to see this though.

Copy link

@aaronb-stacks aaronb-stacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this PR should be updated so that marf_hash is an Option type.

My rationale is that the marf_hash is actually part of the consensus protocol. So, for creating test vectors that prevent consensus breaking changes, it's better if the marf_hash is included in the input vector. Otherwise, a change which altered that hash could pass the test vector (even though it would be a consensus breaking change).

So, for most kinds of tests that we would write here, we actually want the marf hash included explicitly in the test vector. However, there are plenty of cases where we'd want to be able to run this test harness without the marf hash. In particular, it would help during test writing and generation: when someone writes a test vector, they would create all the test blocks, execute the test with the marf hashes set to None, and then use the output to fill in the expected hashes. Then, subsequent changes to the codebase would need to continue to pass tests with those hashes. A similar pattern would be used when setting up fuzzing targets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aac Avoiding Accidental Consensus aac-testing Avoiding Accidental Consensus Testing Specific Task
Projects
Status: Status: In Review
Development

Successfully merging this pull request may close these issues.

AAC Testing: Develop Integration Test Harness for append_block in stackslib
4 participants