
test: MPC + stream component test #739

Draft: jakmeier wants to merge 7 commits into sig-net:develop from jakmeier:stream_component_test

Conversation

@jakmeier commented Apr 2, 2026

Create a MockStream that connects to the MPC fixture setup. This lets us test the indexer stream <-> MPC glue.

@volovyks left a comment:

Nice! test_channel_contention should be very useful in our work on backlog and stability.

Comment thread: chain-signatures/node/src/cli.rs (Outdated)
    crate::metrics::nodes::CONFIGURATION_DIGEST.set(digest);

-   let (sign_tx, sign_rx) = mpsc::channel(16384);
+   let (sign_tx, sign_rx) = mpsc::channel(if cfg!(test) { 1 } else { 16384 });
Contributor:

I guess this is more of an experiment, but we can make it configurable in tests.
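One way to make the capacity configurable, as suggested, is to thread it through as a parameter instead of branching on cfg!(test). A minimal sketch, assuming a hypothetical sign_channel helper and SIGN_CHANNEL_CAPACITY constant (illustrative names, not the PR's actual API), using a std bounded channel in place of tokio's:

```rust
use std::sync::mpsc;

/// Production default for the sign request channel (illustrative constant).
const SIGN_CHANNEL_CAPACITY: usize = 16384;

/// Tests can pass Some(1) to force contention; production passes None.
fn sign_channel(capacity: Option<usize>) -> (mpsc::SyncSender<u64>, mpsc::Receiver<u64>) {
    mpsc::sync_channel(capacity.unwrap_or(SIGN_CHANNEL_CAPACITY))
}

fn main() {
    // A test exercising the tiny-buffer configuration.
    let (tx, rx) = sign_channel(Some(1));
    tx.send(7).unwrap();
    assert_eq!(rx.recv().unwrap(), 7);
}
```

This keeps the production path free of cfg!(test) branches and lets each test pick the buffer size it needs.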

Contributor:

not related to this PR, but it definitely deserves a constant

jakmeier (Author):

yeah it's just an experiment that I will clean up before removing the draft label

guard.sign_requests(requests)
}

pub async fn rpc_actions(&self, actions: &[RpcAction]) {
Contributor:

It is not clear what we are doing with these actions. Are we queuing them? Processing them? Adding them to a specific block? I'm OK with somewhat longer names; that makes the code more readable IMO.

network[1].mock_streams[0].progress_block_height(1).await;
network[2].mock_streams[0].progress_block_height(1).await;

let timeout = Duration::from_secs(10);
Contributor:

nit: Such a pattern significantly slows down our tests; I would add a helper that actually waits for N events or specific events with a timeout.

jakmeier (Author):

Wait, doesn't this already do exactly what you ask?

let timeout = Duration::from_secs(10);
let actions = network.assert_actions(1, timeout).await;
    pub async fn assert_actions(
        &self,
        threshold_per_node: usize,
        timeout: Duration,
    ) -> HashSet<String> {
        let result = tokio::time::timeout(timeout, self.wait_for_actions(threshold_per_node)).await;
        if result.is_err() {
            self.print_actions().await;
        }
        result.expect("should produce enough signatures")
    }

Contributor:

Yes, but why do we have the timeout? What if it happens faster?

jakmeier (Author):

tokio::time::timeout returns early in that case.

wait_for_actions keeps polling at a 100 ms interval to check whether there are enough actions.
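The pattern being discussed can be sketched with std primitives in place of tokio (function names here are illustrative, not the PR's API): the timeout only bounds the wait, and the function returns as soon as the threshold is met.

```rust
use std::time::{Duration, Instant};

/// Poll `count` every 100 ms until it reaches `threshold`, giving up
/// after `timeout`. Returns early, well before the timeout, on success.
fn wait_for_count(count: impl Fn() -> usize, threshold: usize, timeout: Duration) -> Option<usize> {
    let start = Instant::now();
    loop {
        let n = count();
        if n >= threshold {
            return Some(n); // threshold met: no need to wait out the timeout
        }
        if start.elapsed() >= timeout {
            return None; // timed out without reaching the threshold
        }
        std::thread::sleep(Duration::from_millis(100)); // 100 ms poll interval
    }
}

fn main() {
    // A counter that reaches the threshold on the third poll.
    let calls = std::cell::Cell::new(0usize);
    let result = wait_for_count(|| { calls.set(calls.get() + 1); calls.get() }, 3, Duration::from_secs(5));
    assert_eq!(result, Some(3)); // returns after ~200 ms, not 5 s
}
```

This mirrors the assert_actions/wait_for_actions split: the outer timeout is an upper bound for slow CI runs, not a fixed sleep.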

@jakmeier commented Apr 9, 2026

I'm still experimenting here. Getting some interesting results.

  1. There is some luck involved in whether nodes can handle 50 signature requests arriving at once from the Solana stream. However, channel buffer sizes have no influence on that, so that is probably not the problem.
  2. Meanwhile, 50 requests seems to be a hard threshold for successful signatures. To get more (e.g. 51), I have to send far more requests. I think they then have better chances, with more requests running in parallel.
  3. I often observed 25, 50, or 75 successful signatures. I don't know why yet; it could be an artifact of the test setup.

I will take a closer look again next week.

@jakmeier:

Still investigating, but this is interesting.

[image: signature progress graph]

I see that 25 signatures get through in the first round. Then they wait for a timeout (3 s here, for faster iteration in the graph, but it is the same with longer timeouts). One more is accepted in the third round. This is exactly the same for all participants.

After the third round, no more signatures are produced. They always fail in the posit phase with "proposer timeout waiting for presignature, reorganizing". It could be they are simply out of presignatures. Still, it is interesting that they need timeouts to make progress.

Retry round alignment seems to work fine, too. There is only a short time window where different nodes are on different rounds.

@jakmeier force-pushed the stream_component_test branch from b24050f to 5fd34dc on April 16, 2026 13:25
Commits:
- Create a MockStream that connects to the MPC fixture setup. This lets us test the indexer stream <-> MPC glue.
- but not with channel capacity?
- otherwise it is always the same participant in round 0
- adding useful cases and checking if they run in ci
@jakmeier force-pushed the stream_component_test branch from 8cd5c90 to b883f27 on April 16, 2026 15:32
@jakmeier:

The issue from my previous comment was that I didn't use good entropy in the requests. Hence, at round 0, all requests (the first 25) went to participant 0, and after one timeout, in round 2, all requests went to participant 2.
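A toy illustration of the effect described above (an assumed scheme, not the node's actual proposer-selection algorithm): if every request carries the same entropy value, every request maps to the same participant in a given round, and they all shift together when the round advances.

```rust
/// Hypothetical proposer selection: mix request entropy with the retry
/// round, modulo the participant count.
fn proposer_for(entropy: u64, round: u64, participants: u64) -> u64 {
    entropy.wrapping_add(round) % participants
}

fn main() {
    // All 25 requests share entropy 0 -> all go to participant 0 in
    // round 0, and all shift together to participant 2 in round 2,
    // matching the behavior observed in the test.
    let round0: Vec<u64> = (0..25).map(|_| proposer_for(0, 0, 3)).collect();
    let round2: Vec<u64> = (0..25).map(|_| proposer_for(0, 2, 3)).collect();
    assert!(round0.iter().all(|&p| p == 0));
    assert!(round2.iter().all(|&p| p == 2));
}
```

With proper per-request entropy, the modulo spreads requests roughly evenly across participants in every round.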

That's fixed now, and it actually works rather stably, even when adding 1M requests all at once.

@jakmeier commented Apr 16, 2026

The new problem I was able to identify is with delayed requests. With a delay of 1.5 times the posit timeout between when nodes see incoming blocks, they don't get a single request signed. It looks like they fail to achieve round alignment.

[image: full delay]

Change that to only 0.5 times the posit timeout, and all signatures are generated as expected in round 0.

[image: half delay]

This is reproducible with the pushed test mpc_with_stream::test_channel_contention_multiple_blocks_at_once_delayed.

#[test(tokio::test(flavor = "multi_thread"))]
async fn test_channel_contention_multiple_blocks_at_once_delayed() {
// delay should be > ORGANIZE_POSIT_TIMEOUT
let delay = mpc_node::protocol::signature::organize_posit_timeout() * 3 / 2;
jakmeier (Author):

Changing this to less than a full posit delay makes the test pass. Hence we know we only run into the issue when observations are more than a posit timeout (20 s) apart.

Suggested change:
-let delay = mpc_node::protocol::signature::organize_posit_timeout() * 3 / 2;
+let delay = mpc_node::protocol::signature::organize_posit_timeout() / 2;
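The arithmetic behind the two delay variants can be checked directly with std Duration math (the 20 s value is hard-coded here for illustration; the test itself reads the timeout from mpc_node):

```rust
use std::time::Duration;

fn main() {
    let posit_timeout = Duration::from_secs(20);
    let failing_delay = posit_timeout * 3 / 2; // 30 s: exceeds the timeout, no signatures
    let passing_delay = posit_timeout / 2;     // 10 s: within the timeout, all signatures succeed
    assert_eq!(failing_delay, Duration::from_secs(30));
    assert_eq!(passing_delay, Duration::from_secs(10));
    assert!(failing_delay > posit_timeout && passing_delay < posit_timeout);
}
```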

@jakmeier:

Somehow test_sign_no_presignature_waste now fails, as only 72 out of 75 signatures are produced.

I only changed the entropy of the signature requests. This suggests that "unlucky" proposer assignment can cause long delays, too. I tried increasing the timeout and using different seeds.

But nothing works except the entropy currently on develop. That one assigned all requests to the same participant in any given round (round 0: proposer = Participant(0); round 1: proposer = Participant(1); round 2: proposer = Participant(2)). Apparently that's the only way we get no presignature waste 😕

@jakmeier:

test_sign_no_presignature_waste fails because we eagerly start the signature generation when the threshold is reached. A node may then become proposer for signatures that are already taken care of, and we waste presignatures.

Fix in #765
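The race described above can be modeled as a toy (assumptions only, not the node's code): signing starts eagerly once enough acks arrive, so a node that acts late can propose a request that is already in flight, consuming an extra presignature for it.

```rust
use std::collections::HashSet;

fn main() {
    let mut in_flight: HashSet<u64> = HashSet::new();
    let mut presignatures_used = 0u32;

    let request_id = 42u64;

    // Threshold reached -> first proposer starts eagerly.
    in_flight.insert(request_id);
    presignatures_used += 1;

    // A late node, unaware the request is in flight, proposes it again
    // and burns a second presignature on the same request.
    let duplicate = !in_flight.insert(request_id); // insert returns false if already present
    if duplicate {
        presignatures_used += 1;
    }

    // Two presignatures spent for one request: the waste the test catches.
    assert_eq!(presignatures_used, 2);
}
```

The fix direction would be coordinating proposer assignment so a request is claimed at most once per round, rather than starting on a local threshold alone.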
