Skip to content

Conversation

@dknopik
Copy link
Member

@dknopik dknopik commented Nov 11, 2025

Issue Addressed

We recently introduced a regression where we fail all handshakes. The root cause for this is that we started storing the NodeInfo in a new place but did not adjust all places to use this.

Proposed Changes

  • Remove the field Network::node_info
  • Add accessors for the node metadata
  • Adjust old references of Network::node_info to use new accessors instead

@dknopik dknopik added ready-for-review This PR is ready to be reviewed network v1.2.0 labels Nov 11, 2025
their_metadata: &NodeMetadata,
) {
if let Some(our_metadata) = &self.node_info.metadata {
if let Some(our_metadata) = self.node_metadata() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this function be moved to the handshake module?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a node_metadata function on the handshake module, I just added another one in network.rs as a shortcut, calling the one in the handshake module.

@diegomrsantos
Copy link
Member

@claude review this PR

@claude-code-actions-sigp
Copy link

claude-code-actions-sigp bot commented Nov 12, 2025

Claude finished @diegomrsantos's task —— View job


PR Review: Fix handshake regression

Todo List

  • Read changed files to understand the modifications
  • View git diff to see exact changes from base branch
  • Analyze code for correctness and potential issues
  • Review architectural decisions
  • Check for Rust idioms and best practices
  • Address existing review comment
  • Post comprehensive review feedback

Review Summary

This PR successfully fixes the handshake regression by consolidating NodeInfo storage in the handshake module. The changes correctly remove duplicate state and provide proper accessor methods. The implementation is sound and follows good architectural principles.

Detailed Analysis

Architectural Improvements ✅

Single Source of Truth: The PR correctly consolidates NodeInfo storage in the handshake::Behaviour struct, eliminating the duplicate Network::node_info field. This follows the DRY (Don't Repeat Yourself) principle and prevents synchronization issues.

Before (problematic):

// In Network struct
node_info: NodeInfo,  // Duplicate storage

// In AnchorBehaviour (via handshake::Behaviour)
handshake.node_info   // Original storage

After (clean):

// Only in handshake::Behaviour
node_info: NodeInfo,  // Single source of truth

Code Quality ✅

  1. Proper Encapsulation: The new accessor methods (node_metadata() and node_metadata_mut()) at both the handshake::Behaviour level (lines 140-146 in handshake/mod.rs) and Network level (lines 536-542 in network.rs) provide clean abstraction boundaries.

  2. Correct Usage: All references to the old self.node_info.metadata have been properly updated to use the new accessors:

    • Line 511 in network.rs: self.node_metadata_mut() for updating subnet subscriptions
    • Line 566 in network.rs: self.node_metadata() for reading metadata in metrics recording
  3. Initialization Flow: The NodeInfo is now correctly initialized once in AnchorBehaviour::new() (lines 165-177 in behaviour.rs) and passed to handshake::Behaviour::new(), eliminating the duplicate initialization that was in Network::try_new().

Correctness Verification ✅

  • Network ID Usage: Properly uses config.domain_type converted to string for network identification
  • Metadata Initialization: Initial subnet bitfield correctly set to all zeros ("00000000000000000000000000000000")
  • Mutability: Proper use of immutable references for reads and mutable references for subnet updates
  • Option Handling: Correct handling of Option<NodeMetadata> with proper pattern matching

Response to Review Comment

@diegomrsantos asked:

Should this function be moved to the handshake module?

Referring to record_handshake_subnet_match_metrics at line 561.

My assessment: No, the function is correctly placed in the Network module. Here's why:

  1. Metrics Responsibility: The function records network-level metrics (HANDSHAKE_SUBNET_MATCHES) that are part of the broader network monitoring infrastructure, not handshake protocol logic.

  2. Clear Separation: The handshake module should focus on the protocol implementation (request/response, verification), while the network layer handles operational concerns like metrics and peer management.

  3. Access Patterns: The function needs access to both our local metadata and the peer's metadata after a successful handshake. The network layer is the appropriate place to coordinate this comparison and record operational metrics.

  4. Current Pattern: Other metric recording functions like check_block_and_prune_peers_by_score() at line 654 follow this same pattern of being in the Network module.

Recommendation: Keep the function where it is. The new accessor methods provide clean boundaries without needing to move metric recording logic into the protocol layer.

Potential Concerns

None identified. The changes are minimal, focused, and correct.

Testing Recommendations

While the existing handshake tests in handshake/mod.rs (lines 347-440) provide good coverage of the protocol, consider adding an integration test that:

  1. Subscribes to a subnet
  2. Verifies the metadata is correctly updated via the accessor methods
  3. Confirms the updated metadata is sent during subsequent handshakes

This would verify the end-to-end flow of the regression fix.

Conclusion

Verdict: Approve

This is a well-executed fix that:

  • Eliminates duplicate state
  • Provides clean abstractions via accessor methods
  • Correctly updates all references
  • Follows Rust idioms and project conventions

The PR successfully addresses the regression where handshakes were failing due to stale NodeInfo data.


|

@diegomrsantos
Copy link
Member

diegomrsantos commented Nov 12, 2025

@claude suggest how a test like the one described below could be implemented:

While the existing handshake tests in handshake/mod.rs (lines 347-440) provide good coverage of the protocol, consider adding an integration test that:
Subscribes to a subnet
Verifies the metadata is correctly updated via the accessor methods
Confirms the updated metadata is sent during subsequent handshakes
This would verify the end-to-end flow of the regression fix.

@claude-code-actions-sigp
Copy link

Claude Code is working…

I'll analyze this and get back to you.

View job run

@claude-code-actions-sigp
Copy link

claude-code-actions-sigp bot commented Nov 13, 2025

Claude finished @diegomrsantos's task —— View job


Test Implementation Suggestions

Based on my analysis of the codebase, here's a comprehensive guide for implementing the integration test you described.


Overview

The test should verify the end-to-end flow of:

  1. Subscribing to a subnet (updating metadata)
  2. Verifying the metadata is correctly updated via accessor methods
  3. Confirming the updated metadata is sent during subsequent handshakes

Test Location

Add the test to anchor/network/src/handshake/mod.rs in the existing #[cfg(test)] module (after line 440).


Implementation Approach

Option 1: Unit Test (Simpler, Recommended)

This approach tests the handshake behavior directly without involving the full network stack.

#[tokio::test]
async fn metadata_update_propagates_to_handshake() {
    *DEBUG;

    // Create initial node info with empty subnets
    let node_info = node_info("test", "v1.0.0");
    
    let keypair1 = Keypair::generate_ed25519();
    let keypair2 = Keypair::generate_ed25519();
    
    let mut local_swarm = create_test_swarm(keypair1, node_info.clone());
    let mut remote_swarm = create_test_swarm(keypair2, node_info);

    tokio::spawn(async move {
        // Step 1: Subscribe to a subnet by updating metadata
        let subnet_id = SubnetId::new(5);
        {
            let behaviour = local_swarm.behaviour_mut();
            let metadata = behaviour.node_metadata_mut()
                .as_mut()
                .expect("metadata should exist");
            
            // Subscribe to subnet 5
            metadata.set_subscribed(subnet_id, true)
                .expect("should set subscribed");
            
            // Verify the metadata was updated correctly
            assert!(metadata.subnets.contains('1'), "subnet bitfield should be updated");
            // More specifically, check that bit 5 is set
            let subnet_bytes = hex::decode(&metadata.subnets).expect("valid hex");
            let byte_index = *subnet_id as usize / 8;
            let bit_index = *subnet_id % 8;
            assert_eq!(subnet_bytes[byte_index] & (1 << bit_index), 1 << bit_index);
        }
        
        // Step 2: Verify accessor methods return the updated metadata
        {
            let behaviour = local_swarm.behaviour();
            let metadata = behaviour.node_metadata()
                .as_ref()
                .expect("metadata should exist");
            
            let subnet_bytes = hex::decode(&metadata.subnets).expect("valid hex");
            let byte_index = *subnet_id as usize / 8;
            let bit_index = *subnet_id % 8;
            assert_eq!(subnet_bytes[byte_index] & (1 << bit_index), 1 << bit_index,
                "accessor should return updated metadata");
        }

        // Step 3: Perform handshake and verify the updated metadata is sent
        local_swarm.listen().with_memory_addr_external().await;
        remote_swarm.connect(&mut local_swarm).await;

        let ([local_event], [remote_event]): ([Event; 1], [Event; 1]) =
            drive(&mut local_swarm, &mut remote_swarm).await;

        // Verify remote peer received the updated subnet information
        match remote_event {
            Event::Completed { their_info, .. } => {
                let their_metadata = their_info.metadata
                    .expect("should have metadata");
                
                // Check that the remote peer received our updated subnet subscription
                let subnet_bytes = hex::decode(&their_metadata.subnets).expect("valid hex");
                let byte_index = *subnet_id as usize / 8;
                let bit_index = *subnet_id % 8;
                assert_eq!(
                    subnet_bytes[byte_index] & (1 << bit_index), 
                    1 << bit_index,
                    "remote peer should receive updated subnet subscription in handshake"
                );
            }
            Event::Failed { error, .. } => {
                panic!("Handshake should succeed, but got error: {:?}", error);
            }
        }

        // Verify local handshake also completed
        assert_completed(local_event, *remote_swarm.local_peer_id(), "v1.0.0");
    })
    .await
    .expect("test completed");
}

Key dependencies needed:

use subnet_service::SubnetId;  // Add to imports at top of test module

Option 2: Integration Test (More Complex)

If you want to test the full network stack integration including how Network::handle_subnet_event works:

Location: Create anchor/network/tests/subnet_metadata_integration.rs

use std::time::Duration;
use subnet_service::{SubnetEvent, SubnetId};
use tokio::time::timeout;

#[tokio::test]
async fn subnet_subscription_updates_handshake_metadata() {
    // This would require:
    // 1. Creating two full Network instances
    // 2. Sending a SubnetEvent::Join to one network
    // 3. Establishing a connection between them
    // 4. Verifying the handshake includes the updated metadata
    
    // Note: This is more complex as it requires:
    // - Mock or real subnet event channels
    // - Full network setup with all dependencies
    // - Proper async coordination
}

Why Option 2 is more complex:

  • Requires creating full Network instances (not just handshake::Behaviour)
  • Needs proper channel setup for SubnetEvent
  • Requires all network dependencies (discovery, peer manager, etc.)
  • More difficult to isolate the specific behavior being tested

Recommendation

Use Option 1 for the following reasons:

  1. Simpler and More Focused: Tests exactly what the regression fix addresses (metadata updates propagating to handshakes)
  2. Faster Execution: No full network stack overhead
  3. Easier Maintenance: Fewer dependencies and moving parts
  4. Follows Existing Patterns: Matches the style of tests already in handshake/mod.rs

The existing tests in handshake/mod.rs (lines 347-440) already demonstrate the pattern of:

  • Creating test swarms with create_test_swarm()
  • Using libp2p_swarm_test::drive() to coordinate handshakes
  • Verifying events with helper functions

Additional Test Variations

You might also want to add these related tests:

Test Multiple Subnet Updates

#[tokio::test]
async fn multiple_subnet_updates_all_propagate() {
    // Subscribe to subnets 3, 7, and 12
    // Verify all are present in handshake metadata
}

Test Subnet Unsubscribe

#[tokio::test]
async fn subnet_unsubscribe_propagates() {
    // Subscribe to subnet 5, then unsubscribe
    // Verify the bit is cleared in handshake metadata
}

Testing the Regression Fix Specifically

To directly test the regression that was fixed (where NodeInfo was stored in two places), you could add:

#[tokio::test]
async fn metadata_accessors_reference_same_storage() {
    let node_info = node_info("test", "v1.0.0");
    let keypair = Keypair::generate_ed25519();
    let mut swarm = create_test_swarm(keypair, node_info);
    
    let subnet_id = SubnetId::new(10);
    
    // Update via mutable accessor
    {
        let behaviour = swarm.behaviour_mut();
        behaviour.node_metadata_mut()
            .as_mut()
            .expect("metadata exists")
            .set_subscribed(subnet_id, true)
            .expect("should subscribe");
    }
    
    // Read via immutable accessor - should see the same update
    {
        let behaviour = swarm.behaviour();
        let metadata = behaviour.node_metadata()
            .as_ref()
            .expect("metadata exists");
        
        let subnet_bytes = hex::decode(&metadata.subnets).expect("valid hex");
        let byte_index = *subnet_id as usize / 8;
        let bit_index = *subnet_id % 8;
        
        assert_eq!(
            subnet_bytes[byte_index] & (1 << bit_index),
            1 << bit_index,
            "accessor methods should reference the same underlying storage"
        );
    }
}

Summary

The recommended approach is to add the unit test to anchor/network/src/handshake/mod.rs following the existing test patterns. This provides clear verification that:

  1. ✅ Metadata can be updated via node_metadata_mut()
  2. ✅ Updates are visible via node_metadata() accessor
  3. ✅ Updated metadata is correctly transmitted during handshakes

This directly validates the regression fix while remaining simple, fast, and maintainable.


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

network ready-for-review This PR is ready to be reviewed v1.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants