@thlorenz
Contributor

@thlorenz thlorenz commented Nov 6, 2025

Summary by CodeRabbit

  • New Features

    • Account-fetch metrics, undelegation counters, optional auto-airdrop for empty fee-payers, pub/sub reconnect & resubscribe, subscription introspection (counts/list), airdrop helper, test helper to query monitored-account metrics, new large-scale subscription integration test.
  • Bug Fixes

    • Safer eviction/subscription flow with rollback on failure, unsubscribe timeout to avoid hangs, reduced noisy logs for missing accounts, fixed potential underflow in retry bump calculation.
  • Improvements

    • LRU-backed subscription tracking, toggleable subscription metrics, expanded instrumentation, reconnection/backoff behavior and additional tests.
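The unsubscribe timeout listed under Bug Fixes can be illustrated with a minimal sketch. It assumes only that the unsubscribe step signals completion and returns no result. A std thread plus `mpsc::Receiver::recv_timeout` stands in for whatever async timeout the PR actually uses, and `unsubscribe_with_timeout` / `UnsubscribeOutcome` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Result of a bounded unsubscribe. The unsubscribe step itself yields `()`,
/// so the only useful signal is whether it finished before the deadline.
#[derive(Debug, PartialEq)]
pub enum UnsubscribeOutcome {
    Completed,
    GaveUp,
}

/// Runs `unsubscribe` on a helper thread and waits at most `timeout` for it.
/// On timeout the caller moves on instead of hanging; the helper thread is
/// left detached and finishes (or not) on its own.
pub fn unsubscribe_with_timeout<F>(unsubscribe: F, timeout: Duration) -> UnsubscribeOutcome
where
    F: FnOnce() + Send + 'static,
{
    let (done_tx, done_rx) = mpsc::channel::<()>();
    thread::spawn(move || {
        unsubscribe();
        // After a timeout the receiver is gone; ignoring the send error is intended.
        let _ = done_tx.send(());
    });
    match done_rx.recv_timeout(timeout) {
        Ok(()) => UnsubscribeOutcome::Completed,
        // Either the deadline passed or the helper panicked before signalling.
        Err(_) => UnsubscribeOutcome::GaveUp,
    }
}
```

The caller never blocks longer than `timeout`, which is the property that prevents a stuck unsubscribe from hanging eviction.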

Enhances subscription management and reliability by adding metrics and robust reconnection logic with automatic resubscription.

Details

Subscription Management Improvements

  • Atomic Subscribe/Unsubscribe: Improved subscribe logic to handle evictions and rollbacks atomically, preventing race conditions between the LRU cache and the pubsub client
  • Enhanced Error Handling: Better error propagation and logging for subscription failures
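A minimal sketch of one way to keep an LRU of subscriptions and the pubsub client consistent when a subscribe fails. It is not the PR's code: keys are plain `u64` instead of `Pubkey`, `FakeClient` and `SubscriptionLru` are hypothetical, and a real implementation would run this under a lock. The idea shown is to defer unsubscribing the evicted key until the new subscription succeeds, so rollback only has to restore the cache.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical stand-in for a pubsub client; `fail_on` simulates an error.
pub struct FakeClient {
    pub subscribed: HashSet<u64>,
    pub fail_on: Option<u64>,
}

impl FakeClient {
    pub fn subscribe(&mut self, key: u64) -> Result<(), String> {
        if self.fail_on == Some(key) {
            return Err(format!("subscribe failed for {key}"));
        }
        self.subscribed.insert(key);
        Ok(())
    }

    pub fn unsubscribe(&mut self, key: u64) {
        self.subscribed.remove(&key);
    }
}

/// Minimal LRU of subscribed keys: front = least recently used.
pub struct SubscriptionLru {
    capacity: usize,
    order: VecDeque<u64>,
}

impl SubscriptionLru {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, order: VecDeque::new() }
    }

    pub fn keys(&self) -> Vec<u64> {
        self.order.iter().copied().collect()
    }

    /// Subscribes `key`, evicting the LRU entry when full. The evicted key is
    /// unsubscribed only after the new subscription succeeds; on failure the
    /// cache is restored so it stays in sync with the client.
    pub fn subscribe(&mut self, key: u64, client: &mut FakeClient) -> Result<(), String> {
        if let Some(pos) = self.order.iter().position(|k| *k == key) {
            // Already tracked: just mark it as most recently used.
            self.order.remove(pos);
            self.order.push_back(key);
            return Ok(());
        }
        let evicted = if self.order.len() == self.capacity {
            self.order.pop_front()
        } else {
            None
        };
        self.order.push_back(key);

        if let Err(err) = client.subscribe(key) {
            // Roll back: drop the new key and put the evicted one back in front.
            self.order.pop_back();
            if let Some(old) = evicted {
                self.order.push_front(old);
            }
            return Err(err);
        }
        if let Some(old) = evicted {
            client.unsubscribe(old);
        }
        Ok(())
    }
}
```

Because the evicted key stays subscribed until the new one is confirmed, a failure never leaves an account tracked by the cache but missing on the client.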

Metrics and Monitoring

  • Subscription Metrics: Added subscription_count method to ChainPubsubClient trait for tracking active subscriptions across all clients
  • Monitored Accounts Gauge: Updated metrics to accurately report monitored accounts count
  • Configurable Metrics: Added metrics enable/disable flag to RemoteAccountProviderConfig

Reconnection and Reliability

  • Reconnection Logic: Replaced periodic connection recycling with event-driven reconnection using the new ReconnectableClient trait
  • Fibonacci Backoff: Implemented backoff with Fibonacci-sequence delays (which grow exponentially) for reconnection attempts, capped at 10 minutes
  • Automatic Resubscription: Clients automatically resubscribe to existing subscriptions after successful reconnection
  • Improved SubMuxClient: Simplified constructor and removed deprecated recycling logic
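The Fibonacci backoff with a 10-minute cap and the reconnect-then-resubscribe flow can be sketched as follows. The 1-second base unit, the `max_attempts` limit, and the names `fibonacci_backoff` / `reconnect_and_resubscribe` are assumptions; `sleep` is injected so the logic can be exercised without waiting. A failed resubscription is treated like a failed connect and triggers another attempt, in the spirit of the failed-resubscription recovery tests.

```rust
use std::time::Duration;

/// Upper bound taken from the PR description (10 minutes).
pub const MAX_BACKOFF: Duration = Duration::from_secs(10 * 60);

/// Delay before reconnect attempt `attempt` (0-based): Fibonacci numbers
/// 1, 1, 2, 3, 5, 8, ... seconds, clamped to `MAX_BACKOFF`.
pub fn fibonacci_backoff(attempt: u32) -> Duration {
    let (mut a, mut b) = (1u64, 1u64);
    for _ in 0..attempt {
        // Stop advancing once past the cap, so the additions cannot overflow.
        if a >= MAX_BACKOFF.as_secs() {
            break;
        }
        let next = a + b;
        a = b;
        b = next;
    }
    Duration::from_secs(a).min(MAX_BACKOFF)
}

/// Retries `connect` with Fibonacci backoff; after each successful connect it
/// replays every existing subscription. A failed resubscribe counts as a
/// failed attempt. Returns the 0-based attempt that succeeded.
pub fn reconnect_and_resubscribe<C, R, S>(
    mut connect: C,
    mut resubscribe: R,
    subscriptions: &[u64],
    mut sleep: S,
    max_attempts: u32,
) -> Result<u32, String>
where
    C: FnMut() -> Result<(), String>,
    R: FnMut(u64) -> Result<(), String>,
    S: FnMut(Duration),
{
    for attempt in 0..max_attempts {
        if connect().is_ok() && subscriptions.iter().all(|&key| resubscribe(key).is_ok()) {
            return Ok(attempt);
        }
        sleep(fibonacci_backoff(attempt));
    }
    Err(format!("gave up after {max_attempts} attempts"))
}
```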

Testing and Integration

  • Integration Test: Added 07_subscription_limits.rs to test large-scale subscription scenarios (400 accounts)
  • Reconnection Tests: Comprehensive tests for reconnection logic with failed resubscription recovery
  • Updated Test Configurations: All tests now use metrics-enabled configurations

Code Quality

  • Trait Refactoring: Introduced ReconnectableClient trait for better abstraction
  • Logging Improvements: Enhanced debug/trace logging throughout subscription and reconnection flows
  • Dependency Updates: Added ureq and url for metrics fetching in integration tests

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37d6839 and c6f75fd.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • Cargo.toml (3 hunks)
  • magicblock-processor/src/executor/processing.rs (4 hunks)
  • test-integration/Cargo.toml (2 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.

Applied to files:

  • magicblock-processor/src/executor/processing.rs
📚 Learning: 2025-11-13T09:38:43.804Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/locks.rs:64-102
Timestamp: 2025-11-13T09:38:43.804Z
Learning: In magicblock-processor's TransactionScheduler (scheduler/mod.rs line 59), the executor count is clamped to MAX_SVM_EXECUTORS (63) at initialization time, and executor IDs are assigned sequentially from 0 to count-1. This architectural guarantee ensures that executor IDs used in the bitmask-based AccountLock (scheduler/locks.rs) will always be within valid bounds for bit shifting operations, making runtime bounds checks unnecessary.

Applied to files:

  • magicblock-processor/src/executor/processing.rs
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579

Applied to files:

  • magicblock-processor/src/executor/processing.rs
📚 Learning: 2025-10-26T16:54:39.084Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 587
File: test-manual/Cargo.toml:0-0
Timestamp: 2025-10-26T16:54:39.084Z
Learning: In the magicblock-validator repository, use git branch references (not commit hashes or tags) for the helius-laserstream dependency to allow automatic updates when the branch is pushed to.

Applied to files:

  • Cargo.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run_make_ci_lint
  • GitHub Check: run_make_ci_test
  • GitHub Check: Build Project
🔇 Additional comments (6)
test-integration/Cargo.toml (2)

100-101: Approve: ureq and url dependencies for test metrics collection.

The addition of ureq 2.9.6 and url 2.5.0 as workspace dependencies is appropriate for the new metrics collection capability in test integration (as referenced in the PR summary for fetching monitored account counts). ureq 2.9.6 upgrades rustls to 0.22 and ring to 0.17, without introducing breaking changes. While ureq 2.9.6 is not the latest version (3.x or 2.12.x exist), it is suitable for test-only usage and maintains backward compatibility.


77-77: Use a stable released version or the main branch of solana-account, not a feature branch.

The revision 8f7050a is on the feature branch bmuddha/fix/pre-cow-check rather than on a stable release or the main branch. The commit was made on the day of this review, leaving no time for testing or validation. Feature-branch dependencies introduce significant risk and should be reserved for development, not for production builds or CI test suites.

Replace this with either:

  • A tagged release from the solana-account repository, or
  • The main branch revision if the fix is already merged

Lines affected: test-integration/Cargo.toml 77, 109

Cargo.toml (3)

94-94: Confirm intent: log feature removal enables runtime logging in release builds.

Removing the release_max_level_info feature from the log dependency (line 94) changes compile-time behavior. Libraries should avoid using the max level features because they're global and can't be changed once they're set. This change allows all log levels (debug, trace) to be available at runtime in release builds, rather than being stripped at compile time.

Confirm:

  • Is this intentional to support enhanced observability for the new reconnection and metrics features described in the PR?
  • Are there any performance or binary-size implications in production that should be documented?

218-218: Confirm url dependency is shared across workspaces.

The url = "2.5.0" dependency is already declared in the main workspace dependencies (line 218), so test-integration will correctly reuse it. This is consistent with the dependency management pattern and avoids duplication.


154-154: Solana-account revision 8f7050a is consistently referenced across all manifests and exists in the repository.

Verification confirms that revision 8f7050a is uniformly specified in:

  • Main Cargo.toml workspace dependencies (line 154)
  • Main Cargo.toml patch.crates-io (line 230)
  • test-integration/Cargo.toml (lines 77 and 109)

The revision exists in the solana-account repository on branch bmuddha/fix/pre-cow-check and is associated with PR #20. All references are consistent across manifests.

magicblock-processor/src/executor/processing.rs (1)

10-21: New imports align with downstream usage

Importing ReadableAccount for lamports() and TransactionError for the new gasless guard is consistent with later code; no issues here.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e66fa2 and a7a1eaa.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • test-integration/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • magicblock-chainlink/src/chainlink/mod.rs (6 hunks)
  • magicblock-processor/Cargo.toml (1 hunks)
  • magicblock-processor/src/executor/processing.rs (5 hunks)
  • magicblock-processor/tests/fees.rs (2 hunks)
  • test-kit/src/lib.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.

Applied to files:

  • magicblock-chainlink/src/chainlink/mod.rs
  • magicblock-processor/src/executor/processing.rs
📚 Learning: 2025-10-14T09:56:14.047Z
Learnt from: taco-paco
Repo: magicblock-labs/magicblock-validator PR: 564
File: test-integration/programs/flexi-counter/src/processor/call_handler.rs:122-125
Timestamp: 2025-10-14T09:56:14.047Z
Learning: The file test-integration/programs/flexi-counter/src/processor/call_handler.rs contains a test smart contract used for integration testing, not production code.

Applied to files:

  • magicblock-chainlink/src/chainlink/mod.rs
  • magicblock-processor/tests/fees.rs
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579

Applied to files:

  • magicblock-chainlink/src/chainlink/mod.rs
  • magicblock-processor/src/executor/processing.rs
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.

Applied to files:

  • magicblock-chainlink/src/chainlink/mod.rs
📚 Learning: 2025-11-13T09:38:43.804Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/locks.rs:64-102
Timestamp: 2025-11-13T09:38:43.804Z
Learning: In magicblock-processor's TransactionScheduler (scheduler/mod.rs line 59), the executor count is clamped to MAX_SVM_EXECUTORS (63) at initialization time, and executor IDs are assigned sequentially from 0 to count-1. This architectural guarantee ensures that executor IDs used in the bitmask-based AccountLock (scheduler/locks.rs) will always be within valid bounds for bit shifting operations, making runtime bounds checks unnecessary.

Applied to files:

  • magicblock-processor/src/executor/processing.rs
📚 Learning: 2025-10-28T13:15:42.706Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 596
File: magicblock-processor/src/scheduler.rs:1-1
Timestamp: 2025-10-28T13:15:42.706Z
Learning: In magicblock-processor, transaction indexes were always set to 0 even before the changes in PR #596. The proper transaction indexing within slots will be addressed during the planned ledger rewrite.

Applied to files:

  • magicblock-processor/src/executor/processing.rs
📚 Learning: 2025-10-21T10:34:59.140Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-accounts-db/src/lib.rs:63-72
Timestamp: 2025-10-21T10:34:59.140Z
Learning: In magicblock-validator, the AccountsDb "stop-the-world" synchronizer is managed at the processor/executor level, not at the AccountsDb API level. Transaction executors in magicblock-processor hold a read lock (sync.read()) for the duration of each slot and release it only at slot boundaries, ensuring all account writes happen under the read lock. Snapshot operations acquire a write lock, blocking until all executors release their read locks. This pattern ensures mutual exclusion between writes and snapshots without requiring read guards in AccountsDb write APIs.

Applied to files:

  • magicblock-processor/src/executor/processing.rs
📚 Learning: 2025-11-07T13:09:52.253Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: test-kit/src/lib.rs:275-0
Timestamp: 2025-11-07T13:09:52.253Z
Learning: In test-kit, the transaction scheduler in ExecutionTestEnv is not expected to shut down during tests. Therefore, using `.unwrap()` in test helper methods like `schedule_transaction` is acceptable and will not cause issues in the test environment.

Applied to files:

  • test-kit/src/lib.rs
  • magicblock-processor/tests/fees.rs
🧬 Code graph analysis (2)
magicblock-chainlink/src/chainlink/mod.rs (5)
magicblock-chainlink/src/chainlink/fetch_cloner.rs (1)
  • new (134-158)
magicblock-chainlink/src/chainlink/blacklisted_accounts.rs (1)
  • blacklisted_accounts (6-30)
programs/magicblock/src/mutate_accounts/account_mod_data.rs (1)
  • id (128-135)
magicblock-processor/tests/fees.rs (1)
  • ephemeral_balance_pda_from_payer (20-26)
magicblock-metrics/src/metrics/mod.rs (1)
  • inc_undelegation_requested (454-456)
magicblock-processor/tests/fees.rs (1)
test-kit/src/lib.rs (3)
  • new_with_fee (98-146)
  • new (80-82)
  • new_with_payer_and_fees (84-88)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run_make_ci_lint
  • GitHub Check: Build Project
  • GitHub Check: run_make_ci_test
🔇 Additional comments (10)
magicblock-processor/Cargo.toml (1)

44-44: Dependency is justified and actively used in tests.

The verification confirms solana-keypair is imported and actively used in magicblock-processor/tests/fees.rs. The Keypair::new() calls at lines 329 and 368 demonstrate legitimate usage for creating test keypairs, confirming this is not a leftover or unnecessary dependency.

magicblock-processor/tests/fees.rs (3)

4-4: LGTM: Imports are necessary for the new tests.

Both imports are used in the new test functions below.

Also applies to: 6-6


316-361: LGTM: Test logic is correct.

This test properly verifies that gasless mode can handle transactions with non-existent accounts in the instruction's account metas. The test structure follows the established pattern and correctly verifies both success and zero fee charging.


366-408: LGTM: Test correctly verifies gasless mode with non-existent fee payer.

This test provides valuable edge case coverage by verifying that gasless mode can handle transactions where the fee payer account doesn't exist in the accounts database. The use of unwrap_or_default() correctly handles the missing account case, and the test structure follows established patterns.

magicblock-chainlink/src/chainlink/mod.rs (6)

1-4: LGTM: Import consolidation.

The consolidated import of atomic primitives and Arc is clean and appropriate for the new counter functionality.


143-147: LGTM: Atomic counters for account classification.

The AtomicU64 counters appropriately track different account categories during the removal operation, providing detailed instrumentation for the reset operation.


149-173: LGTM: Account classification logic is correct.

The removal logic properly categorizes accounts into mutually exclusive groups with appropriate atomic counter increments. The special handling for feature-owned accounts (excluding them from the empty count) appears intentional for system account considerations.
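The mutually exclusive classification with atomic counters can be sketched like this. The categories (delegated accounts kept, empty non-feature-owned accounts removed, everything else removed), the `Owner` enum, and `remove_where` are hypothetical; the point is that atomics let the counters be bumped from an `Fn` callback that only has shared access.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical owner categories; real code would compare program ids.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Owner {
    System,
    Feature,
    Delegation,
    Other,
}

pub struct Account {
    pub lamports: u64,
    pub owner: Owner,
    pub delegated: bool,
}

#[derive(Default)]
pub struct RemovalCounters {
    pub kept_delegated: AtomicU64,
    pub removed_empty: AtomicU64,
    pub removed_other: AtomicU64,
}

/// Stand-in for a store API that only accepts an `Fn` predicate, so the
/// closure cannot capture `&mut` counters and needs atomics instead.
pub fn remove_where<F: Fn(&Account) -> bool>(accounts: &mut Vec<Account>, should_remove: F) {
    accounts.retain(|acc| !should_remove(acc));
}

/// Puts each account into exactly one bucket while deciding whether to remove it.
/// Returns (kept_delegated, removed_empty, removed_other).
pub fn reset_with_counts(accounts: &mut Vec<Account>) -> (u64, u64, u64) {
    let counters = RemovalCounters::default();
    remove_where(accounts, |acc| {
        if acc.delegated {
            counters.kept_delegated.fetch_add(1, Ordering::Relaxed);
            false
        } else if acc.lamports == 0 && acc.owner != Owner::Feature {
            // Feature-owned accounts are deliberately excluded from the empty count.
            counters.removed_empty.fetch_add(1, Ordering::Relaxed);
            true
        } else {
            counters.removed_other.fetch_add(1, Ordering::Relaxed);
            true
        }
    });
    (
        counters.kept_delegated.load(Ordering::Relaxed),
        counters.removed_empty.load(Ordering::Relaxed),
        counters.removed_other.load(Ordering::Relaxed),
    )
}
```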


325-340: LGTM: Enhanced trace logging with mark_empty_if_not_found details.

The trace logging correctly includes the mark_empty_if_not_found parameter details, improving observability. The log level guard and macro are properly aligned.


367-380: LGTM: Undelegation tracking with metrics and proper logging.

The undelegation flow correctly increments metrics at the request point and uses appropriate debug-level logging for operational visibility. The subscription setup and success confirmation are well-structured.


250-259: No issues found; function signature verification confirms correct usage.

The verification confirms that dlp::pda::ephemeral_balance_pda_from_payer takes two parameters (payer and an index), and line 255 correctly calls it with (feepayer, 0). Multiple usages across the codebase confirm this pattern. The fee payer handling logic for gasless transactions is correct.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a7a1eaa and 9118600.

📒 Files selected for processing (1)
  • magicblock-chainlink/src/remote_account_provider/lru_cache.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.

Applied to files:

  • magicblock-chainlink/src/remote_account_provider/lru_cache.rs
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.

Applied to files:

  • magicblock-chainlink/src/remote_account_provider/lru_cache.rs
🧬 Code graph analysis (1)
magicblock-chainlink/src/remote_account_provider/lru_cache.rs (2)
magicblock-metrics/src/metrics/mod.rs (1)
  • inc_evicted_accounts_count (403-405)
magicblock-chainlink/src/remote_account_provider/mod.rs (1)
  • new (275-330)
🔇 Additional comments (2)
magicblock-chainlink/src/remote_account_provider/lru_cache.rs (2)

5-5: LGTM: Metric tracking on eviction.

The eviction counter provides useful observability for subscription management. The metric increment is placed after the lock is released, so the lock is not held while the metric is updated.

Also applies to: 83-83


119-137: LGTM: Cache state accessors for metrics.

These three methods (len(), never_evicted_accounts(), pubkeys()) correctly expose cache state for subscription metrics. The pubkeys() method holds the lock while iterating, which is acceptable since it's only called periodically by the metrics updater (when enable_subscription_metrics is true), not in hot paths.
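A std-only sketch of an LRU with an eviction counter that is bumped only after the mutex guard is dropped, plus `len()` and `pubkeys()` style accessors. `LruSet`, the `u64` keys, and the local `AtomicU64` (standing in for `inc_evicted_accounts_count`) are assumptions, not the crate's types.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

pub struct LruSet {
    capacity: usize,
    inner: Mutex<VecDeque<u64>>, // front = least recently used
    evicted_count: AtomicU64,    // stand-in for a metrics counter
}

impl LruSet {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, inner: Mutex::new(VecDeque::new()), evicted_count: AtomicU64::new(0) }
    }

    /// Inserts or touches `key`; returns the evicted key, if any.
    pub fn add(&self, key: u64) -> Option<u64> {
        let evicted = {
            let mut queue = self.inner.lock().expect("lru lock poisoned");
            if let Some(pos) = queue.iter().position(|k| *k == key) {
                queue.remove(pos);
                queue.push_back(key);
                None
            } else {
                let evicted = if queue.len() == self.capacity { queue.pop_front() } else { None };
                queue.push_back(key);
                evicted
            }
        }; // guard dropped here, before the metric update
        if evicted.is_some() {
            self.evicted_count.fetch_add(1, Ordering::Relaxed);
        }
        evicted
    }

    pub fn len(&self) -> usize {
        self.inner.lock().expect("lru lock poisoned").len()
    }

    /// Snapshot of tracked keys. Holds the lock while copying, so it suits
    /// a periodic metrics updater rather than a hot path.
    pub fn pubkeys(&self) -> Vec<u64> {
        self.inner.lock().expect("lru lock poisoned").iter().copied().collect()
    }

    pub fn evicted(&self) -> u64 {
        self.evicted_count.load(Ordering::Relaxed)
    }
}
```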

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
magicblock-chainlink/src/chainlink/fetch_cloner.rs (1)

961-1015: Waiters may return Ok even when the actual fetch failed (error is dropped).

If fetch_new.is_empty(), this call only waits on await_pending, sets result to Ok(empty), and returns it; any error from the in-flight fetch isn’t propagated to waiters.

Minimal, safe fix: after waiting, recompute the result against the full filtered set (bank hits resolve locally; no duplicate network fetches), so waiters see the same outcome.

-        let pubkeys = pubkeys
-            .iter()
-            .filter(|p| !self.blacklisted_accounts.contains(p))
-            .collect::<Vec<_>>();
+        // Work with owned keys to allow recomputation for waiters
+        let filtered_pubkeys: Vec<Pubkey> = pubkeys
+            .iter()
+            .filter(|p| !self.blacklisted_accounts.contains(p))
+            .copied()
+            .collect();

         let mut await_pending = vec![];
         let mut fetch_new = vec![];

-        {
+        {
             let mut pending = self
                 .pending_requests
                 .lock()
                 .expect("pending_requests lock poisoned");

-            for pubkey in pubkeys {
+            for pubkey in &filtered_pubkeys {
                 // Check synchronously if account is in bank and subscribed when it should be
                 if let Some(account_in_bank) =
                     self.accounts_bank.get_account(pubkey)
                 {
@@
-                if let Some(requests) = pending.get_mut(pubkey) {
+                if let Some(requests) = pending.get_mut(pubkey) {
                     let (sender, receiver) = oneshot::channel();
                     requests.push(sender);
-                    await_pending.push((*pubkey, receiver));
+                    await_pending.push((*pubkey, receiver));
                     continue;
                 }
@@
-                fetch_new.push(*pubkey);
+                fetch_new.push(*pubkey);
             }
         }

         // If we have accounts to fetch, delegate to the existing implementation
         // but notify all pending requests when done
-        let result = if !fetch_new.is_empty() {
-            self.fetch_and_clone_accounts(
-                &fetch_new,
-                mark_empty_if_not_found,
-                slot,
-            )
-            .await
-        } else {
-            Ok(FetchAndCloneResult {
-                not_found_on_chain: vec![],
-                missing_delegation_record: vec![],
-            })
-        };
+        let result = if !fetch_new.is_empty() {
+            self.fetch_and_clone_accounts(&fetch_new, mark_empty_if_not_found, slot).await
+        } else {
+            // Another task fetched these accounts; recompute outcome cheaply against the full set
+            self.fetch_and_clone_accounts(&filtered_pubkeys, mark_empty_if_not_found, slot).await
+        };
@@
-        for (pubkey, receiver) in await_pending {
+        for (pubkey, receiver) in await_pending {
             joinset.spawn(async move {
                 if let Err(err) = receiver
                     .await
                     .inspect_err(|err| {
                         warn!("FetchCloner::clone_accounts - RecvError occurred while awaiting account {}: {err:?}. This indicates the account fetch sender was dropped without sending a value.", pubkey);
                     })
                 {
                     // The sender was dropped, likely due to an error in the other request
                     error!(
                         "Failed to receive account from pending request: {err}"
                     );
                 }
             });
         }

Follow-up (optional, future-proof): carry a small status in the oneshot (e.g., Result<(), ()>) so waiters can short‑circuit on error without recomputation.

Also applies to: 1023-1037, 1062-1076
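The optional follow-up (carry a status in the oneshot so waiters can short-circuit on error) could look roughly like this. It is a sketch under stated assumptions: std `mpsc` channels stand in for tokio oneshot senders, keys are `u64` rather than `Pubkey`, the status is `Result<(), String>` rather than `Result<(), ()>` so the error text survives, and `PendingRequests`, `join_or_start`, `complete`, and `await_status` are hypothetical names.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Outcome broadcast to waiters, so a failed fetch is no longer seen as success.
pub type FetchStatus = Result<(), String>;

#[derive(Default)]
pub struct PendingRequests {
    waiters: HashMap<u64, Vec<Sender<FetchStatus>>>,
}

impl PendingRequests {
    /// Returns `None` for the caller that should perform the fetch, or a
    /// receiver for a caller that joins a fetch already in flight.
    pub fn join_or_start(&mut self, key: u64) -> Option<Receiver<FetchStatus>> {
        match self.waiters.entry(key) {
            Entry::Occupied(mut entry) => {
                let (tx, rx) = channel();
                entry.get_mut().push(tx);
                Some(rx)
            }
            Entry::Vacant(entry) => {
                entry.insert(Vec::new());
                None
            }
        }
    }

    /// Called by the fetching task when done; every waiter gets the same status.
    pub fn complete(&mut self, key: u64, status: FetchStatus) {
        if let Some(senders) = self.waiters.remove(&key) {
            for tx in senders {
                // A waiter that gave up has dropped its receiver; that is fine.
                let _ = tx.send(status.clone());
            }
        }
    }
}

/// Waiter side: a sender dropped without reporting is also treated as an error.
pub fn await_status(rx: Receiver<FetchStatus>) -> FetchStatus {
    rx.recv()
        .unwrap_or_else(|_| Err("fetcher dropped without reporting a status".to_string()))
}
```

With the status in the channel, waiters can return the fetcher's error directly instead of recomputing the result.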

♻️ Duplicate comments (2)
magicblock-chainlink/src/chainlink/fetch_cloner.rs (2)

984-1003: Redundant else-if: condition is tautological.

If the preceding OR chain did not hit `continue`, !self.is_watching(pubkey) already holds; simplify to a plain else with the debug log.

-                    } else if !self.is_watching(pubkey) {
+                    } else {
                         debug!("Account {pubkey} should be watched but wasn't");
                     }

219-237: Tighten undelegation-completed detection to avoid false positives.

Also require the update to be non-delegated before incrementing. This guards against edge cases where the owner has already flipped but the delegation flag has not yet been cleared.

Apply:

-                    if let Some(in_bank) =
-                        self.accounts_bank.get_account(&pubkey)
-                    {
-                        if in_bank.delegated()
-                            && in_bank.owner().eq(&dlp::id())
-                            && !account.owner().eq(&dlp::id())
-                        {
+                    if let Some(in_bank) = self.accounts_bank.get_account(&pubkey) {
+                        if in_bank.delegated()
+                            && in_bank.owner().eq(&dlp::id())
+                            && !account.delegated()
+                            && !account.owner().eq(&dlp::id())
+                        {
                             debug!(
                                 "Undelegation completed for account: {pubkey}"
                             );
                             magicblock_metrics::metrics::inc_undelegation_completed();
                         }
                     }
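The tightened check can be expressed as a small predicate. `AccountView`, `undelegation_completed`, and the all-7s `DLP_ID` placeholder are hypothetical stand-ins for the bank account type and `dlp::id()`.

```rust
/// Minimal view of an account; `owner` is a 32-byte id like a Solana Pubkey.
#[derive(Clone, Copy)]
pub struct AccountView {
    pub owner: [u8; 32],
    pub delegated: bool,
}

/// Placeholder for dlp::id(); the real program id is not reproduced here.
pub const DLP_ID: [u8; 32] = [7u8; 32];

/// True only when the bank copy was delegated and owned by the delegation
/// program, and the incoming update is neither delegated nor owned by it.
pub fn undelegation_completed(in_bank: &AccountView, update: &AccountView) -> bool {
    in_bank.delegated
        && in_bank.owner == DLP_ID
        && !update.delegated
        && update.owner != DLP_ID
}
```

An update whose owner has flipped but which is still flagged as delegated no longer counts as a completed undelegation.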
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9118600 and 308c1ff.

📒 Files selected for processing (9)
  • magicblock-aperture/src/tests.rs (1 hunks)
  • magicblock-aperture/tests/setup.rs (1 hunks)
  • magicblock-api/src/magic_validator.rs (1 hunks)
  • magicblock-chainlink/src/chainlink/fetch_cloner.rs (7 hunks)
  • magicblock-chainlink/src/chainlink/mod.rs (11 hunks)
  • magicblock-chainlink/tests/utils/test_context.rs (1 hunks)
  • test-integration/test-chainlink/src/ixtest_context.rs (1 hunks)
  • test-integration/test-chainlink/src/test_context.rs (2 hunks)
  • test-integration/test-config/tests/auto_airdrop_feepayer.rs (0 hunks)
💤 Files with no reviewable changes (1)
  • test-integration/test-config/tests/auto_airdrop_feepayer.rs
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
📚 Learning: 2025-10-14T09:56:14.047Z
Learnt from: taco-paco
Repo: magicblock-labs/magicblock-validator PR: 564
File: test-integration/programs/flexi-counter/src/processor/call_handler.rs:122-125
Timestamp: 2025-10-14T09:56:14.047Z
Learning: The file test-integration/programs/flexi-counter/src/processor/call_handler.rs contains a test smart contract used for integration testing, not production code.

Applied to files:

  • test-integration/test-chainlink/src/ixtest_context.rs
  • magicblock-aperture/src/tests.rs
  • magicblock-aperture/tests/setup.rs
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.

Applied to files:

  • test-integration/test-chainlink/src/test_context.rs
  • magicblock-chainlink/src/chainlink/mod.rs
  • magicblock-chainlink/src/chainlink/fetch_cloner.rs
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579

Applied to files:

  • test-integration/test-chainlink/src/test_context.rs
  • magicblock-chainlink/src/chainlink/mod.rs
  • magicblock-chainlink/src/chainlink/fetch_cloner.rs
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.

Applied to files:

  • magicblock-chainlink/src/chainlink/mod.rs
📚 Learning: 2025-10-26T16:53:29.820Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 587
File: magicblock-chainlink/src/remote_account_provider/mod.rs:134-0
Timestamp: 2025-10-26T16:53:29.820Z
Learning: In magicblock-chainlink/src/remote_account_provider/mod.rs, the `Endpoint::separate_pubsub_url_and_api_key()` method uses `split_once("?api-key=")` because the api-key parameter is always the only query parameter right after `?`. No additional query parameter parsing is needed for this use case.

Applied to files:

  • magicblock-chainlink/src/chainlink/fetch_cloner.rs
🧬 Code graph analysis (3)
test-integration/test-chainlink/src/test_context.rs (2)
magicblock-chainlink/src/remote_account_provider/config.rs (2)
  • try_new_with_metrics (27-42)
  • lifecycle_mode (51-53)
magicblock-chainlink/src/remote_account_provider/mod.rs (3)
  • try_from_clients_and_mode (179-198)
  • rpc_client (1060-1062)
  • rpc_client (1072-1074)
magicblock-chainlink/src/chainlink/mod.rs (4)
magicblock-chainlink/src/chainlink/fetch_cloner.rs (1)
  • new (135-159)
magicblock-chainlink/src/remote_account_provider/mod.rs (1)
  • new (275-330)
magicblock-chainlink/src/chainlink/blacklisted_accounts.rs (1)
  • blacklisted_accounts (6-30)
magicblock-metrics/src/metrics/mod.rs (1)
  • inc_undelegation_requested (454-456)
magicblock-chainlink/src/chainlink/fetch_cloner.rs (2)
magicblock-metrics/src/metrics/mod.rs (1)
  • inc_undelegation_completed (458-460)
magicblock-chainlink/src/remote_account_provider/config.rs (1)
  • try_new_with_metrics (27-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run_make_ci_lint
  • GitHub Check: run_make_ci_test
  • GitHub Check: Build Project
🔇 Additional comments (14)
magicblock-aperture/tests/setup.rs (1)

65-66: Constructor arity update looks correct.

Passing 0 for the new auto_airdrop_lamports keeps tests deterministic. No issues.

magicblock-chainlink/tests/utils/test_context.rs (2)

70-76: Good: explicit RemoteAccountProviderConfig with metrics flag.

Capacity > 0 and metrics disabled are appropriate for tests. LGTM.


104-110: Chainlink::try_new arity change applied correctly.

Zero airdrop in tests is appropriate. LGTM.

magicblock-chainlink/src/chainlink/fetch_cloner.rs (1)

1566-1573: Tests: config via try_new_with_metrics is good.

Explicit capacity and metrics flag improve clarity. LGTM.

magicblock-chainlink/src/chainlink/mod.rs (5)

64-70: Constructor arity and field wiring: LGTM.

New auto_airdrop_lamports is cleanly plumbed and stored.

Also applies to: 87-88


91-101: Endpoints constructor update: LGTM.

Forwarding auto_airdrop_lamports keeps both constructors consistent.

Also applies to: 134-141


367-375: Trace includes mark_empty set: LGTM.

The level guard skips building the log string when trace logging is disabled. Good hygiene.


403-406: Undelegation instrumentation/log levels: LGTM.

Switch to debug and metric increment is appropriate.

Also applies to: 415-416


258-267: Gate auto-airdrop on lifecycle mode to prevent unintended funding in non-ephemeral clusters.

The concern is valid: guarding auto-airdrop only with auto_airdrop_lamports > 0 risks funding fee payers in Offline or Replica modes. However, two issues need to be resolved first:

  1. Location claim is incorrect: The suggested fix applies to ONE location (lines 275–295), not "268–273, 274–299". Lines 268–273 handle empty-if-not-found marking for the fee payer itself, which is separate.

  2. Lifecycle accessibility is unclear: The current code doesn't expose LifecycleMode from the fetch_cloner. You'll need to verify:

    • Can RemoteAccountProvider expose its lifecycle_mode through a public getter?
    • Or should LifecycleMode be stored directly in the Chainlink struct?
    • Or embedded in ChainlinkConfig and passed during construction?

Confirm the implementation path, then apply the guard at the single airdrop location (lines 275–295): add && matches!(lifecycle_from_fetch_cloner(), LifecycleMode::Ephemeral) (or equivalent per your architecture decision).
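A minimal sketch of the combined guard, assuming the lifecycle mode can be read at the airdrop call site. LifecycleMode here is a hypothetical local enum covering only the modes named above, and should_auto_airdrop is an illustrative helper, not an existing function in the crate:

```rust
/// Hypothetical local enum; the crate's real LifecycleMode may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LifecycleMode {
    Ephemeral,
    Offline,
    Replica,
}

/// Returns true only if every guard allows an auto-airdrop:
/// a non-zero configured amount, an ephemeral validator, and a
/// fee payer whose local balance is still zero.
pub fn should_auto_airdrop(
    auto_airdrop_lamports: u64,
    mode: LifecycleMode,
    fee_payer_lamports: u64,
) -> bool {
    auto_airdrop_lamports > 0
        && matches!(mode, LifecycleMode::Ephemeral)
        && fee_payer_lamports == 0
}
```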

magicblock-aperture/src/tests.rs (1)

45-46: Constructor arity update looks correct.

Tests keep airdrop disabled. LGTM.

test-integration/test-chainlink/src/ixtest_context.rs (1)

139-145: Constructor update: LGTM.

Passing 0 keeps behavior unchanged in these tests.

test-integration/test-chainlink/src/test_context.rs (2)

70-76: Explicit RAP config with metrics flag: LGTM.

Clearer than default path; capacity is sane; metrics disabled for tests.


108-113: Chainlink::try_new arity aligned.

Zero airdrop keeps test behavior stable. LGTM.

magicblock-api/src/magic_validator.rs (1)

418-426: Propagating auto_airdrop_lamports: verified safe.

Verification confirms defaults are correctly 0 for non-Ephemeral deployments. The field has #[serde(default)] defaulting to 0, production configurations explicitly set it to 0, and the chainlink code guards activation with if self.auto_airdrop_lamports > 0. No risk of unintended airdrop in persistent deployments. Wiring at line 425 is correct.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (2)
magicblock-metrics/src/metrics/mod.rs (1)

178-209: Past review comment remains unaddressed.

A previous review comment suggested adding clarifying documentation for the ACCOUNT_FETCHES_* metrics to explain the relationship between SUCCESS/FAILED and FOUND/NOT_FOUND dimensions. This would help metric consumers understand that a successful RPC call can increment both SUCCESS and NOT_FOUND.
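For illustration, a sketch of how the two dimensions relate, using hypothetical in-process counters in place of the real metrics and assuming one account per RPC call (a batched get_multiple_accounts would bump found/not_found once per account):

```rust
/// Hypothetical counters mirroring the ACCOUNT_FETCHES_* metrics.
#[derive(Debug, Default)]
pub struct FetchCounters {
    pub success: u64,
    pub failed: u64,
    pub found: u64,
    pub not_found: u64,
}

impl FetchCounters {
    /// Records one fetch outcome:
    /// - Ok(Some(_)): the RPC call succeeded and the account exists
    /// - Ok(None):    the RPC call succeeded but the account does not exist
    /// - Err(_):      the RPC call failed; found/not_found are untouched
    pub fn record<T, E>(&mut self, outcome: &Result<Option<T>, E>) {
        match outcome {
            Ok(Some(_)) => {
                self.success += 1;
                self.found += 1;
            }
            Ok(None) => {
                self.success += 1;
                self.not_found += 1;
            }
            Err(_) => self.failed += 1,
        }
    }
}
```

Under this model SUCCESS always equals FOUND + NOT_FOUND, while FAILED is an independent dimension.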

magicblock-chainlink/src/remote_account_provider/mod.rs (1)

737-747: Consider restoring the evicted entry when unsubscribe fails.

The rollback correctly removes the newly added pubkey from the LRU cache when unsubscribe(evicted) fails. However, the evicted entry is lost from the cache even though we didn't successfully unsubscribe from it.

As suggested in a past review comment, consider re-adding the evicted entry to restore the previous LRU state:

 if let Err(err) = self.pubsub_client.unsubscribe(evicted).await {
     warn!(
         "Failed to unsubscribe from pubsub for evicted account {evicted}: {err:?}");
     // Rollback the LRU add since eviction failed
     self.lrucache_subscribed_accounts.remove(pubkey);
+    self.lrucache_subscribed_accounts.add(evicted);
     return Err(err);
 }

Because we just removed pubkey, capacity is available, so re-adding evicted won't cause another eviction and restores the cache contents.
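The rollback can be sketched with a toy LRU. Lru and register_subscription are hypothetical stand-ins for lrucache_subscribed_accounts and the real registration flow; keys are u64 instead of Pubkey, and subscribing the new key itself is omitted:

```rust
use std::collections::VecDeque;

/// Minimal LRU of subscribed keys (front = least recently used).
pub struct Lru {
    cap: usize,
    keys: VecDeque<u64>,
}

impl Lru {
    pub fn new(cap: usize) -> Self {
        assert!(cap > 0, "capacity must be non-zero");
        Self { cap, keys: VecDeque::new() }
    }

    /// Marks `key` most recently used; returns the evicted key, if any.
    pub fn add(&mut self, key: u64) -> Option<u64> {
        if let Some(pos) = self.keys.iter().position(|k| *k == key) {
            self.keys.remove(pos);
            self.keys.push_back(key);
            return None;
        }
        self.keys.push_back(key);
        if self.keys.len() > self.cap {
            self.keys.pop_front()
        } else {
            None
        }
    }

    pub fn remove(&mut self, key: u64) {
        self.keys.retain(|k| *k != key);
    }

    pub fn contains(&self, key: u64) -> bool {
        self.keys.contains(&key)
    }
}

/// Adds `key`; if unsubscribing the evicted key fails, undo the add AND
/// restore the evicted key so the LRU still matches the pubsub client.
pub fn register_subscription(
    lru: &mut Lru,
    key: u64,
    mut unsubscribe: impl FnMut(u64) -> Result<(), String>,
) -> Result<(), String> {
    if let Some(evicted) = lru.add(key) {
        if let Err(err) = unsubscribe(evicted) {
            lru.remove(key);
            // Removing `key` freed a slot, so this add cannot evict.
            lru.add(evicted);
            return Err(err);
        }
    }
    Ok(())
}
```

In this sketch, re-adding evicted restores the cache's membership but places the key at the most-recently-used end rather than back at the front; if exact recency order matters, the LRU would need a dedicated restore method.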

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 308c1ff and 91683be.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • test-integration/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • magicblock-chainlink/src/remote_account_provider/mod.rs (28 hunks)
  • magicblock-committor-service/src/intent_executor/task_info_fetcher.rs (2 hunks)
  • magicblock-metrics/src/metrics/mod.rs (9 hunks)
  • magicblock-table-mania/Cargo.toml (1 hunks)
  • magicblock-table-mania/src/lookup_table_rc.rs (2 hunks)
  • magicblock-table-mania/src/manager.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.

Applied to files:

  • magicblock-table-mania/src/lookup_table_rc.rs
  • magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579

Applied to files:

  • magicblock-chainlink/src/remote_account_provider/mod.rs
  • magicblock-metrics/src/metrics/mod.rs
📚 Learning: 2025-10-26T16:53:29.820Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 587
File: magicblock-chainlink/src/remote_account_provider/mod.rs:134-0
Timestamp: 2025-10-26T16:53:29.820Z
Learning: In magicblock-chainlink/src/remote_account_provider/mod.rs, the `Endpoint::separate_pubsub_url_and_api_key()` method uses `split_once("?api-key=")` because the api-key parameter is always the only query parameter right after `?`. No additional query parameter parsing is needed for this use case.

Applied to files:

  • magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.

Applied to files:

  • magicblock-chainlink/src/remote_account_provider/mod.rs
🧬 Code graph analysis (5)
magicblock-table-mania/src/manager.rs (1)
magicblock-metrics/src/metrics/mod.rs (1)
  • inc_table_mania_a_count (491-493)
magicblock-committor-service/src/intent_executor/task_info_fetcher.rs (1)
magicblock-metrics/src/metrics/mod.rs (1)
  • inc_task_info_fetcher_a_count (487-489)
magicblock-table-mania/src/lookup_table_rc.rs (1)
magicblock-metrics/src/metrics/mod.rs (1)
  • inc_table_mania_cloase_a_count (495-497)
magicblock-chainlink/src/remote_account_provider/mod.rs (6)
magicblock-metrics/src/metrics/mod.rs (6)
  • inc_account_fetches_failed (463-465)
  • inc_account_fetches_found (467-469)
  • inc_account_fetches_not_found (471-473)
  • inc_account_fetches_success (459-461)
  • set_monitored_accounts_count (421-423)
  • inc_remote_account_provider_a_count (483-485)
magicblock-chainlink/src/remote_account_provider/lru_cache.rs (4)
  • new (26-35)
  • pubkeys (39-42)
  • pubkeys (131-137)
  • len (119-125)
magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs (3)
  • new (46-54)
  • new (311-322)
  • try_new_from_url (165-180)
magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs (1)
  • new (126-158)
magicblock-chainlink/src/submux/mod.rs (1)
  • new (147-158)
magicblock-chainlink/src/remote_account_provider/config.rs (2)
  • default (65-72)
  • try_new_with_metrics (27-42)
magicblock-metrics/src/metrics/mod.rs (4)
magicblock-chainlink/src/remote_account_provider/mod.rs (1)
  • new (278-333)
magicblock-committor-service/src/intent_executor/task_info_fetcher.rs (1)
  • new (52-60)
magicblock-table-mania/src/lookup_table_rc.rs (1)
  • new (48-55)
magicblock-table-mania/src/manager.rs (1)
  • new (64-88)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build Project
  • GitHub Check: run_make_ci_lint
🔇 Additional comments (10)
magicblock-table-mania/Cargo.toml (1)

17-17: LGTM!

The addition of the magicblock-metrics workspace dependency enables metrics collection for table-mania components, aligning with the PR's observability objectives.

magicblock-committor-service/src/intent_executor/task_info_fetcher.rs (1)

11-11: LGTM!

The metrics instrumentation is correctly placed before the get_multiple_accounts RPC call, providing visibility into fetch operations.

Also applies to: 120-120

magicblock-table-mania/src/manager.rs (1)

11-11: LGTM!

The metrics instrumentation correctly tracks get_multiple_accounts_with_commitment calls in the remote table fetch loop, providing visibility into table lookup operations.

Also applies to: 530-530

magicblock-chainlink/src/remote_account_provider/mod.rs (7)

1-77: LGTM: Imports and type definitions are well-structured.

Adding hash_map::Entry for the Entry API, metrics imports for observability, and the FetchResult type alias for error propagation aligns well with the PR's objectives.


203-273: LGTM: Background metrics updater provides valuable reconciliation.

The periodic task correctly tracks subscription count discrepancies between the LRU cache and pubsub client, with detailed debug logging for troubleshooting.

A past review comment suggested adding a CancellationToken for graceful shutdown (currently the task is aborted on drop). This remains an optional future enhancement if clean shutdown becomes a requirement.


637-644: Excellent fix: Entry API prevents dropping concurrent fetch waiters.

This correctly addresses the previous critical issue where insert was replacing in-flight entries and dropping earlier waiters. Using the Entry API ensures all concurrent try_get_multi callers waiting for the same account will receive the fetch result.
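A std-only sketch of the pattern, with std::sync::mpsc channels standing in for the tokio oneshot senders and a hypothetical FetchResult carrying lamports; register_waiter and resolve are illustrative names, not the crate's functions:

```rust
use std::collections::{hash_map::Entry, HashMap};
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical result type: lamports on success, message on failure.
type FetchResult = Result<u64, String>;

/// Registers a waiter for `key`. Returns the receiver plus `true` if this
/// caller is the first waiter and therefore must start the fetch.
pub fn register_waiter(
    fetching: &mut HashMap<&'static str, Vec<Sender<FetchResult>>>,
    key: &'static str,
) -> (Receiver<FetchResult>, bool) {
    let (tx, rx) = channel();
    match fetching.entry(key) {
        // A fetch is already in flight: append instead of replacing,
        // so earlier waiters keep their senders.
        Entry::Occupied(mut e) => {
            e.get_mut().push(tx);
            (rx, false)
        }
        Entry::Vacant(e) => {
            e.insert(vec![tx]);
            (rx, true)
        }
    }
}

/// Resolves every waiter registered for `key` with the same result.
pub fn resolve(
    fetching: &mut HashMap<&'static str, Vec<Sender<FetchResult>>>,
    key: &'static str,
    result: FetchResult,
) {
    if let Some(waiters) = fetching.remove(key) {
        for tx in waiters {
            // A waiter that gave up has dropped its receiver; ignore that.
            let _ = tx.send(result.clone());
        }
    }
}
```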


664-682: Nested Result handling is correct; consider formatting the long warning.

The logic correctly handles both the receiver error (RecvError) and the inner fetch error (RemoteAccountProviderError).

As noted in a past review comment, the warning at lines 674-675 is very long. Consider splitting it across multiple lines for readability (optional improvement).


852-1020: LGTM: Metrics integration and error handling are well-implemented.

The notify_error helper and retry macro provide clean error handling. Metrics are correctly updated for both success and failure paths:

  • inc_account_fetches_success, inc_account_fetches_found, inc_account_fetches_not_found on success
  • inc_account_fetches_failed on error

The handling of both JSON_RPC_SERVER_ERROR_MIN_CONTEXT_SLOT_NOT_REACHED and HELIUS_CONTEXT_SLOT_NOT_REACHED error codes is appropriate.


804-827: LGTM: Unsubscribe now maintains LRU cache and pubsub client consistency.

The function correctly removes from the LRU cache only after successful pubsub unsubscribe. If the pubsub unsubscribe fails, the LRU entry is preserved, ensuring the two stay in sync.
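A minimal sketch of that ordering, with a HashSet standing in for the LRU and a closure standing in for the pubsub client's unsubscribe (both hypothetical):

```rust
use std::collections::HashSet;

/// Asks the pubsub side first and only drops `key` from the local view
/// on success, so a failed unsubscribe leaves both views agreeing that
/// `key` is still subscribed.
pub fn unsubscribe(
    tracked: &mut HashSet<u64>,
    key: u64,
    pubsub_unsubscribe: impl FnOnce(u64) -> Result<(), String>,
) -> Result<(), String> {
    pubsub_unsubscribe(key)?;
    tracked.remove(&key);
    Ok(())
}
```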


1157-1497: LGTM: Test configurations correctly disable metrics.

All tests properly use try_new_with_metrics with enable_subscription_metrics: false to avoid spawning background metric updater tasks during testing.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
magicblock-chainlink/src/remote_account_provider/mod.rs (1)

839-872: Fetch retry/error handling and metrics updates are coherent

The refactored fetch helper now:

  • centralizes failure fan-out via notify_error,
  • uses a bounded retry loop with MAX_RETRIES,
  • explicitly handles both the standard and the HELIUS “min context slot not reached” error codes, and
  • updates inc_account_fetches_success/failed/found/not_found consistently.

Because fetching_accounts.remove runs on both the success and the failure path, every waiter is either resolved or cleaned up, so no caller hangs. One minor consideration: notify_error logs while holding the fetching_accounts lock. If logging here ever becomes heavy, consider logging before acquiring or after releasing the lock; since this is an error path, the current behavior is acceptable.

Also applies to: 873-891, 892-977, 979-1022, 1034-1058
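A sketch of the bounded retry shape described above. The retryable codes are passed in rather than hard-coded, since their exact values (and MAX_RETRIES) are not shown in this review; RpcError and fetch_with_retry are hypothetical, and the notify_error fan-out and metrics updates are left out:

```rust
/// Hypothetical error shape: an RPC error code or a transport failure.
#[derive(Debug, Clone, PartialEq)]
pub enum RpcError {
    Code(i64),
    Transport(String),
}

/// Calls `fetch` at most `max_retries + 1` times. Only errors whose code
/// is in `retryable_codes` (e.g. the "min context slot not reached"
/// codes) are retried; any other error is returned immediately.
pub fn fetch_with_retry<T>(
    max_retries: u32,
    retryable_codes: &[i64],
    mut fetch: impl FnMut() -> Result<T, RpcError>,
) -> Result<T, RpcError> {
    let mut attempt = 0;
    loop {
        match fetch() {
            Ok(value) => return Ok(value),
            Err(RpcError::Code(code))
                if retryable_codes.contains(&code) && attempt < max_retries =>
            {
                attempt += 1;
                // A real implementation would back off before retrying.
            }
            Err(err) => return Err(err),
        }
    }
}
```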

♻️ Duplicate comments (2)
magicblock-chainlink/src/remote_account_provider/mod.rs (1)

730-761: LRU + pubsub subscription/unsubscription sequencing is mostly solid; consider restoring evicted key on unsubscribe failure

The new register_subscription flow (LRU add → unsubscribe evicted → send removal → subscribe new key), together with the updated subscribe/unsubscribe, now keeps the LRU and the pubsub client in sync on the happy path and rolls back the new key when unsubscribe or subscribe fails. This is a big improvement over the previous race.

One edge case remains: when unsubscribe(evicted) fails, you remove the newly added pubkey from the LRU but leave the evicted key out of it, even though its pubsub subscription is likely still active. That can leave the LRU's view out of sync with the pubsub client (and with the metrics). Consider re-adding evicted to the LRU after removing pubkey, so the prior state is fully restored before returning the error.

Also applies to: 773-800, 801-829

magicblock-chainlink/src/chainlink/mod.rs (1)

148-199: Account-removal categorization and logging look correct; log label could better reflect feature-owned empties

The new reset_accounts_bank logic correctly: keeps blacklisted and delegated accounts, removes DLP-owned accounts, and counts remaining non-delegated/non-blacklisted accounts into “empty” vs “non-empty” buckets using saturating_sub to avoid underflow. Note that non_empty includes accounts with lamports == 0 when owned by feature::id() (since those are excluded from remaining_empty). If you want the log message to match the actual set precisely, consider tweaking the label for the non_empty bucket to mention that it also includes feature-owned zero‑lamport accounts.
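To make the bucket boundaries concrete, a sketch of the categorization with hypothetical types (Owner, Acc, ResetSummary); it shows that a zero-lamport, feature-owned account lands in non_empty:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Owner {
    Dlp,
    Feature,
    Other,
}

/// Hypothetical, simplified view of a bank account.
pub struct Acc {
    pub lamports: u64,
    pub delegated: bool,
    pub blacklisted: bool,
    pub owner: Owner,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct ResetSummary {
    pub kept: usize,
    pub removed_dlp: usize,
    pub remaining_empty: usize,
    pub non_empty: usize,
}

pub fn categorize(accounts: &[Acc]) -> ResetSummary {
    let mut s = ResetSummary::default();
    let mut remaining = 0usize;
    for a in accounts {
        if a.blacklisted || a.delegated {
            s.kept += 1;
        } else if a.owner == Owner::Dlp {
            s.removed_dlp += 1;
        } else {
            remaining += 1;
            // Feature-owned zero-lamport accounts are NOT counted as empty.
            if a.lamports == 0 && a.owner != Owner::Feature {
                s.remaining_empty += 1;
            }
        }
    }
    // Defensive saturating_sub, as in the reviewed code; here the two
    // counts come from one pass, so it cannot actually underflow.
    s.non_empty = remaining.saturating_sub(s.remaining_empty);
    s
}
```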

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 91683be and 1f11b83.

📒 Files selected for processing (2)
  • magicblock-chainlink/src/chainlink/mod.rs (11 hunks)
  • magicblock-chainlink/src/remote_account_provider/mod.rs (28 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579

Applied to files:

  • magicblock-chainlink/src/chainlink/mod.rs
  • magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.

Applied to files:

  • magicblock-chainlink/src/chainlink/mod.rs
  • magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.

Applied to files:

  • magicblock-chainlink/src/chainlink/mod.rs
  • magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-10-26T16:53:29.820Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 587
File: magicblock-chainlink/src/remote_account_provider/mod.rs:134-0
Timestamp: 2025-10-26T16:53:29.820Z
Learning: In magicblock-chainlink/src/remote_account_provider/mod.rs, the `Endpoint::separate_pubsub_url_and_api_key()` method uses `split_once("?api-key=")` because the api-key parameter is always the only query parameter right after `?`. No additional query parameter parsing is needed for this use case.

Applied to files:

  • magicblock-chainlink/src/remote_account_provider/mod.rs
🧬 Code graph analysis (2)
magicblock-chainlink/src/chainlink/mod.rs (3)
magicblock-chainlink/src/chainlink/fetch_cloner.rs (1)
  • new (135-159)
magicblock-chainlink/src/chainlink/blacklisted_accounts.rs (1)
  • blacklisted_accounts (6-30)
magicblock-metrics/src/metrics/mod.rs (1)
  • inc_undelegation_requested (475-477)
magicblock-chainlink/src/remote_account_provider/mod.rs (5)
magicblock-metrics/src/metrics/mod.rs (6)
  • inc_account_fetches_failed (463-465)
  • inc_account_fetches_found (467-469)
  • inc_account_fetches_not_found (471-473)
  • inc_account_fetches_success (459-461)
  • set_monitored_accounts_count (421-423)
  • inc_remote_account_provider_a_count (483-485)
magicblock-chainlink/src/remote_account_provider/lru_cache.rs (4)
  • new (26-35)
  • pubkeys (39-42)
  • pubkeys (131-137)
  • len (119-125)
magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs (3)
  • new (46-54)
  • new (311-322)
  • try_new_from_url (165-180)
magicblock-chainlink/src/submux/mod.rs (1)
  • new (147-158)
magicblock-chainlink/src/remote_account_provider/config.rs (2)
  • default (65-72)
  • try_new_with_metrics (27-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run_make_ci_lint
  • GitHub Check: run_make_ci_test
  • GitHub Check: Build Project
🔇 Additional comments (6)
magicblock-chainlink/src/remote_account_provider/mod.rs (2)

410-503: FetchResult channel model and fetch-waiter deduplication look correct

Using FetchResult = Result<RemoteAccount, RemoteAccountProviderError> across both subscription overrides and RPC fetch results is a nice cleanup and ensures errors propagate to callers. The switch to HashMap::entry with Vec<oneshot::Sender<FetchResult>> in try_get_multi correctly appends additional waiters instead of clobbering in-flight entries, and listen_for_account_updates now resolves all pending senders with Ok(remote_account) when a newer subscription update arrives. Error handling on RecvError is also sensible (warn + aggregate into AccountResolutionsFailed). I don’t see correctness issues in this flow.

Also applies to: 625-647, 660-681


1159-1164: Test updates to use try_new_with_metrics(..., false) look appropriate

Switching the tests to construct RemoteAccountProviderConfig via try_new_with_metrics(capacity, LifecycleMode::Ephemeral, false) keeps the LRU capacity validation while explicitly disabling subscription metrics for unit tests. This aligns tests with the new config API without changing their behavior.

Also applies to: 1210-1215, 1287-1293, 1494-1498

magicblock-chainlink/src/chainlink/mod.rs (4)

57-59: Auto-airdrop configuration plumbing is consistent

Adding auto_airdrop_lamports to Chainlink and threading it through try_new and try_new_from_endpoints cleanly exposes the new behavior without impacting existing call sites beyond the extra argument. The field is stored on the struct and only read where needed, so there’s no unused state, and the #[allow(clippy::too_many_arguments)] on try_new_from_endpoints is reasonable given the constructor’s role.

Also applies to: 64-70, 91-101, 134-140


259-301: Fee-payer mark_empty behavior and auto-airdrop flow look reasonable; ensure airdrop helper is idempotent

Treating the fee payer (and its balance PDA when clone_escrow is true) as mark_empty_if_not_found is a good fit for gasless flows and ensures remote fetches don’t fail just because those accounts don’t yet exist. The best-effort auto-airdrop that only triggers when auto_airdrop_lamports > 0, a FetchCloner is available, and the local fee-payer balance is still zero is a sensible safety net. Just make sure FetchCloner::airdrop_account_if_empty itself guards against races (e.g., re-checks remote balance) so repeated calls under load can’t accidentally over-airdrop.
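One way to make the local side of that check race-free is to re-check and credit under a single lock. This sketch uses a hypothetical Bank holding one fee-payer balance; it does not cover the remote RPC airdrop, which would need its own re-check of the on-chain balance:

```rust
use std::sync::Mutex;

/// Hypothetical bank holding a single fee payer's balance.
pub struct Bank {
    lamports: Mutex<u64>,
}

impl Bank {
    pub fn new(lamports: u64) -> Self {
        Self { lamports: Mutex::new(lamports) }
    }

    /// Credits `amount` only if the balance is still zero. The check and
    /// the credit happen under the same lock, so concurrent callers cannot
    /// both observe zero and both airdrop. Returns whether it airdropped.
    pub fn airdrop_if_empty(&self, amount: u64) -> bool {
        let mut balance = self.lamports.lock().unwrap();
        if amount > 0 && *balance == 0 {
            *balance += amount;
            true
        } else {
            false
        }
    }

    pub fn balance(&self) -> u64 {
        *self.lamports.lock().unwrap()
    }
}
```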


363-377: Enhanced fetch logging with mark_empty_if_not_found context is helpful

Extending fetch_accounts_common trace logging to include the set of mark_empty_if_not_found pubkeys makes it much easier to debug why some accounts are treated as “empty but acceptable.” The guard uses log::Level::Trace and the macro is trace!, so the enabled-level check and log macro are aligned.


400-417: Undelegation instrumentation is aligned with metrics and log levels

Upgrading undelegation_requested to debug! logs and incrementing magicblock_metrics::metrics::inc_undelegation_requested() on each call matches the semantics of that metric and keeps trace noise down. The subsequent subscription via fetch_cloner.subscribe_to_account is unchanged, so behavior is preserved while observability improves.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test-integration/test-cloning/tests/04_escrow_transfer.rs (1)

91-106: Critical: Contradictory assertions on escrow_balance.

Lines 102 and 104 assert contradictory conditions on the same variable:

  • Line 102: escrow_balance must be in range 0.4–0.5 SOL
  • Line 104: escrow_balance must be >= 1.0 SOL

These cannot both be true. Based on the comment at line 104 ("Airdropped 2 SOL - escrowed half"), it appears line 104 should check the payer balance (which received 2 SOL and escrowed ~1 SOL), not the escrow balance. However, the payer balance is currently discarded with _ on line 91.

Apply this diff to capture and check the payer balance instead:

-    let (counter_balance, _, escrow_balance) = log_accounts_balances(
+    let (counter_balance, payer_balance, escrow_balance) = log_accounts_balances(
         &ctx,
         "After transfer from escrow to counter",
         &counter_pda,
         &kp_escrowed.pubkey(),
         &ephemeral_balance_pda,
     );
     let escrow_balance = escrow_balance as f64 / LAMPORTS_PER_SOL as f64;
     let counter_balance = counter_balance as f64 / LAMPORTS_PER_SOL as f64;
+    let payer_balance = payer_balance as f64 / LAMPORTS_PER_SOL as f64;
 
     // Received 1 SOL then transferred 0.5 SOL + tx fee
     assert!((0.4..=0.5).contains(&escrow_balance));
     // Airdropped 2 SOL - escrowed half
-    assert!(escrow_balance >= 1.0);
+    assert!(payer_balance >= 1.0);
     // Received 0.5 SOL
     assert!((0.5..0.6).contains(&counter_balance));

Minor: Inconsistent range operators. Line 102 uses an inclusive end (..=) while line 106 uses an exclusive end (..). Consider using consistent operators unless the difference is intentional.
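The two range forms differ only at their upper boundary. The following sketch uses illustrative values (not taken from the test) to show that `..=` accepts its end value and `..` rejects it. It assumes the usual 1 SOL = 1,000,000,000 lamports conversion.

```rust
/// Converts lamports to SOL the way the test does.
pub fn lamports_to_sol(lamports: u64) -> f64 {
    const LAMPORTS_PER_SOL: u64 = 1_000_000_000;
    lamports as f64 / LAMPORTS_PER_SOL as f64
}

/// Inclusive upper bound: exactly 0.5 SOL is accepted.
pub fn escrow_in_range(sol: f64) -> bool {
    (0.4..=0.5).contains(&sol)
}

/// Exclusive upper bound: exactly 0.6 SOL is rejected.
pub fn counter_in_range(sol: f64) -> bool {
    (0.5..0.6).contains(&sol)
}
```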

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1f11b83 and 968e82d.

📒 Files selected for processing (3)
  • test-integration/test-cloning/tests/01_program-deploy.rs (1 hunks)
  • test-integration/test-cloning/tests/04_escrow_transfer.rs (3 hunks)
  • test-integration/test-cloning/tests/05_parallel-cloning.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
📚 Learning: 2025-10-14T09:56:14.047Z
Learnt from: taco-paco
Repo: magicblock-labs/magicblock-validator PR: 564
File: test-integration/programs/flexi-counter/src/processor/call_handler.rs:122-125
Timestamp: 2025-10-14T09:56:14.047Z
Learning: The file test-integration/programs/flexi-counter/src/processor/call_handler.rs contains a test smart contract used for integration testing, not production code.

Applied to files:

  • test-integration/test-cloning/tests/04_escrow_transfer.rs
  • test-integration/test-cloning/tests/05_parallel-cloning.rs
📚 Learning: 2025-11-07T13:09:52.253Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: test-kit/src/lib.rs:275-0
Timestamp: 2025-11-07T13:09:52.253Z
Learning: In test-kit, the transaction scheduler in ExecutionTestEnv is not expected to shut down during tests. Therefore, using `.unwrap()` in test helper methods like `schedule_transaction` is acceptable and will not cause issues in the test environment.

Applied to files:

  • test-integration/test-cloning/tests/05_parallel-cloning.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: run_make_ci_test
🔇 Additional comments (1)
test-integration/test-cloning/tests/05_parallel-cloning.rs (1)

239-239: No actionable changes needed—escrow test disabling is intentional and tracked.

Verification confirms this change is part of a coordinated commit (3b9496a, dated Nov 18) titled "chore: ignore tests that depend on transfers from escrow accounts." It is not a side effect of this PR but the explicit purpose of that commit, which also modified 04_escrow_transfer.rs.

The ignore decision is intentional and properly scoped to escrow-dependent tests across multiple test files. Note that issue #579, cited in the learnings, tracks the ignored RpcAccountInfoConfig fields in account_subscribe rather than escrow transfers. Linking the ignore messages to an issue that tracks re-enabling these tests would therefore improve traceability. The change itself is appropriately coordinated and documented via the commit message.

@thlorenz thlorenz force-pushed the thlorenz/subscription-metrics branch from 2e691d9 to 4d433fb November 18, 2025 13:03
* master:
  chore: dependency update in Cargo.lock file
  fix: set caps for bincode and serde dependencies
  fix: capping solana program version due to transit dependency issue

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

♻️ Duplicate comments (4)
magicblock-processor/src/executor/processing.rs (1)

320-338: Verify fee-payer persistence and address zero-lamport deletion bug.

The persistence filter at line 336 raises three issues:

  1. Zero-lamport deletions (confirmed bug): The condition account.lamports() == 0 && !privileged causes both insert_account and accounts_tx.send to be skipped. However, the test test_zero_lamports_account() in magicblock-accounts-db/src/tests.rs (lines 388–405) explicitly shows that zero-lamport accounts must be persisted as empty/escrow account markers. Non-privileged transactions that close accounts by setting lamports to 0 will lose this marker state, potentially causing the validator to re-fetch from chain on every access. This must be fixed: either remove the zero-lamport check for all transactions, or document why non-privileged closures should be treated differently.

  2. Notification vs. storage separation: The same condition gates both persistence (insert_account) and observer notification (accounts_tx.send). External consumers relying on accounts_tx to detect account closures won't receive updates for non-privileged zero-lamport accounts. Clarify whether this asymmetry is intentional.

  3. Fee-payer dirty flag (requires verification): In the FeesOnly branch (lines 310–316), confirm that RollbackAccounts::FeePayerOnly from the Solana SVM crate guarantees fee_payer_account.is_dirty() is true when fees are charged. If it's not guaranteed, line 336 will incorrectly skip persisting the fee deduction. Check the Solana SVM documentation or implementation for this guarantee.
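The first two points become concrete when the filter is reduced to a pure function. This is a hypothetical reduction: it assumes the condition is roughly `is_dirty && !(lamports == 0 && !privileged)`, and the real code in processing.rs may differ. Isolating the decision makes it possible to test persistence and notification separately.

```rust
/// Hypothetical summary of an account touched by a transaction.
#[derive(Clone, Copy)]
pub struct Touched {
    pub dirty: bool,
    pub lamports: u64,
}

/// The filter as described in the review: zero-lamport accounts are
/// skipped for non-privileged transactions, so closures are dropped.
pub fn persist_current(acc: Touched, privileged: bool) -> bool {
    acc.dirty && !(acc.lamports == 0 && !privileged)
}

/// One possible fix: persist every dirty account, including zero-lamport
/// closures, so the empty-account marker survives.
pub fn persist_fixed(acc: Touched, _privileged: bool) -> bool {
    acc.dirty
}
```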

magicblock-chainlink/src/remote_account_provider/mod.rs (2)

840-873: Fetch retry/error path and metrics integration are well-structured (with one retriable-error scope caveat)

Positives:

  • notify_error centralizes:

    • logging of fatal RPC issues,
    • incrementing inc_account_fetches_failed(pubkeys.len() as u64),
    • and sending a consistent Err(RemoteAccountProviderError::AccountResolutionsFailed(..)) to all pending waiters for each pubkey.
  • The retry! macro cleanly handles transient conditions with a bounded MAX_RETRIES and backoff, never holding the fetching_accounts lock across .await boundaries.

  • Treating both JSON_RPC_SERVER_ERROR_MIN_CONTEXT_SLOT_NOT_REACHED and HELIUS_CONTEXT_SLOT_NOT_REACHED as retriable is a good improvement for heterogeneous providers.
  • Success metrics (inc_account_fetches_success, inc_account_fetches_found, inc_account_fetches_not_found) align with the actual RPC outcome and are only emitted once per successful response, while failures are counted via inc_account_fetches_failed in notify_error, so you don’t double-count.

One caveat (unchanged from earlier reviews): the RpcError::ForUser(rpc_user_err) arm unconditionally calls retry! for all ForUser errors. The surrounding comment specifically talks about the “AccountNotFound / Minimum context slot has not been reached” case; retrying on every ForUser (including unrelated user-facing errors) could lead to unnecessary retries. If you want to tighten this without adding complex parsing, you could at least log the actual rpc_user_err at warn/error level before deciding whether it looks like a min-context-slot transient.

Functionally this block is correct; the above is mainly about avoiding pointless retries on clearly non-transient ForUser errors.

Also applies to: 875-893, 898-899, 919-977, 984-1024, 1036-1061
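One way to tighten the `ForUser` arm is sketched below. It uses a hypothetical `RpcError` enum in place of the real client error type. Matching on the message substring is also an assumption about how the min-context-slot condition surfaces. Only errors that look transient are retried, and every other error fails after a single attempt.

```rust
/// Hypothetical stand-in for the RPC client's error type.
pub enum RpcError {
    ForUser(String),
    Fatal(String),
}

/// Retry only `ForUser` errors that look like the min-context-slot case.
pub fn is_retriable(err: &RpcError) -> bool {
    match err {
        RpcError::ForUser(msg) => msg.contains("Minimum context slot has not been reached"),
        RpcError::Fatal(_) => false,
    }
}

/// Runs `op` at most `max_retries + 1` times, retrying only retriable errors.
/// Returns the final result and the number of attempts made.
pub fn with_retries<T>(
    max_retries: u32,
    mut op: impl FnMut() -> Result<T, RpcError>,
) -> (Result<T, RpcError>, u32) {
    let mut attempts = 0;
    loop {
        attempts += 1;
        match op() {
            Ok(v) => return (Ok(v), attempts),
            // Real code would back off here before the next attempt.
            Err(e) if is_retriable(&e) && attempts <= max_retries => continue,
            Err(e) => return (Err(e), attempts),
        }
    }
}
```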


49-56: Active-subscription metrics wiring is solid; minor logging guard nit

The background updater correctly:

  • samples lrucache_subscribed_accounts.len() vs pubsub_client.subscription_count(Some(&never_evicted)),
  • computes diffs between the LRU and pubsub view, gated behind a debug-level check,
  • and updates set_monitored_accounts_count(pubsub_total) at a reasonable 60s interval.

This is a good fit for reconciling client vs. cache state without adding per-request overhead.
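The diff between the two views reduces to two set differences. The sketch below is minimal and assumes both views can be materialized as `HashSet`s of keys. The real code works on `Pubkey`s obtained from the LRU and the pubsub client, and it excludes the never-evicted accounts.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Keys present in one view but not the other.
pub struct SubscriptionDiff<K> {
    pub only_in_lru: Vec<K>,
    pub only_in_pubsub: Vec<K>,
}

pub fn diff_views<K: Clone + Eq + Hash + Ord>(
    lru: &HashSet<K>,
    pubsub: &HashSet<K>,
) -> SubscriptionDiff<K> {
    let mut only_in_lru: Vec<K> = lru.difference(pubsub).cloned().collect();
    let mut only_in_pubsub: Vec<K> = pubsub.difference(lru).cloned().collect();
    // Sort so the debug output is stable across runs.
    only_in_lru.sort();
    only_in_pubsub.sort();
    SubscriptionDiff { only_in_lru, only_in_pubsub }
}
```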

One small nit: inside start_active_subscriptions_updater you guard expensive work with log::log_enabled!(log::Level::Debug) but still emit trace! logs (All pubsub subscriptions: ...) under that guard. When Debug is enabled but Trace is not, you still pay the cost of collecting all_pubsub_subs and formatting the message that will never be logged at trace level. Consider either:

  • switching the guard to Level::Trace where you emit trace!, or
  • changing those particular trace! calls to debug! so the guard and macro level match.

This is purely a perf/readability nit; behavior is otherwise correct.

Also applies to: 61-67, 203-273
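The guard/emit mismatch can be shown without the `log` crate. This sketch uses a hypothetical `Level` enum ordered like `log::Level` and a `max_level` threshold. When the guard checks Debug but the message is emitted at Trace, the expensive collection runs even though nothing is printed. A matching guard skips the work entirely.

```rust
/// Hypothetical levels, most to least severe (same order as `log::Level`).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level { Error, Warn, Info, Debug, Trace }

pub fn enabled(level: Level, max_level: Level) -> bool {
    level <= max_level
}

/// Returns (work_done, lines_emitted) for a guard level and an emit level.
pub fn guarded_emit(guard: Level, emit: Level, max_level: Level) -> (bool, usize) {
    let mut work = false;
    let mut emitted = 0;
    if enabled(guard, max_level) {
        // Stand-in for collecting all pubsub subscriptions.
        let subs: Vec<u32> = (0..1_000).collect();
        work = true;
        if enabled(emit, max_level) {
            let _line = format!("All pubsub subscriptions: {}", subs.len());
            emitted += 1;
        }
    }
    (work, emitted)
}
```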

magicblock-chainlink/src/remote_account_provider/lru_cache.rs (1)

267-275: test_never_evicted_accounts validates clock sysvar is protected

The test correctly asserts that the clock sysvar is present in never_evicted_accounts(), ensuring we won’t accidentally evict it via the LRU cache. This gives good coverage for the never‑evict set.

You may still want a small unit test for len() and pubkeys() (initial length, growth as keys are added, and verifying returned pubkeys) as previously suggested; not strictly required but would round out the API coverage.
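The suggested `len()`/`pubkeys()` tests can be sketched against a small std-only model of the wrapper's surface. The type is hypothetical: it uses a `VecDeque` rather than the `lru` crate, and `u32` keys stand in for `Pubkey`. It also counts evictions only when an entry is actually pushed out, which matches the eviction-metric semantics discussed later in this review.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical LRU model: front = least recently used.
pub struct LruModel {
    cap: usize,
    order: VecDeque<u32>,
    never_evict: HashSet<u32>,
    pub evictions: u64,
}

impl LruModel {
    pub fn new(cap: usize, never_evict: &[u32]) -> Self {
        assert!(cap > 0, "non-zero capacity");
        Self {
            cap,
            order: VecDeque::new(),
            never_evict: never_evict.iter().copied().collect(),
            evictions: 0,
        }
    }

    /// Adds or promotes `key`; returns the evicted key, if any.
    pub fn add(&mut self, key: u32) -> Option<u32> {
        if self.never_evict.contains(&key) {
            return None; // never-evict keys are not tracked here
        }
        if let Some(pos) = self.order.iter().position(|k| *k == key) {
            self.order.remove(pos); // promotion, not an eviction
            self.order.push_back(key);
            return None;
        }
        self.order.push_back(key);
        if self.order.len() > self.cap {
            self.evictions += 1; // only real evictions are counted
            return self.order.pop_front();
        }
        None
    }

    pub fn len(&self) -> usize {
        self.order.len()
    }

    pub fn pubkeys(&self) -> Vec<u32> {
        self.order.iter().copied().collect()
    }
}
```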

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 968e82d and 451effe.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • magicblock-chainlink/src/remote_account_provider/lru_cache.rs (4 hunks)
  • magicblock-chainlink/src/remote_account_provider/mod.rs (27 hunks)
  • magicblock-chainlink/src/testing/mod.rs (1 hunks)
  • magicblock-processor/Cargo.toml (1 hunks)
  • magicblock-processor/src/executor/processing.rs (6 hunks)
  • test-integration/test-chainlink/tests/ix_06_redeleg_us_separate_slots.rs (4 hunks)
  • test-integration/test-chainlink/tests/ix_07_redeleg_us_same_slot.rs (3 hunks)
  • test-integration/test-cloning/tests/01_program-deploy.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.

Applied to files:

  • magicblock-chainlink/src/remote_account_provider/lru_cache.rs
  • magicblock-processor/src/executor/processing.rs
  • magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.

Applied to files:

  • magicblock-chainlink/src/remote_account_provider/lru_cache.rs
  • test-integration/test-chainlink/tests/ix_07_redeleg_us_same_slot.rs
  • test-integration/test-chainlink/tests/ix_06_redeleg_us_separate_slots.rs
  • magicblock-processor/src/executor/processing.rs
  • magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-11-18T08:47:39.681Z
Learnt from: Dodecahedr0x
Repo: magicblock-labs/magicblock-validator PR: 639
File: magicblock-chainlink/tests/04_redeleg_other_separate_slots.rs:158-165
Timestamp: 2025-11-18T08:47:39.681Z
Learning: In magicblock-chainlink tests involving compressed accounts, `set_remote_slot()` sets the slot of the `AccountSharedData`, while `compressed_account_shared_with_owner_and_slot()` sets the slot of the delegation record. These are two different fields and both calls are necessary.

Applied to files:

  • magicblock-chainlink/src/remote_account_provider/lru_cache.rs
  • test-integration/test-chainlink/tests/ix_07_redeleg_us_same_slot.rs
  • magicblock-chainlink/src/testing/mod.rs
  • test-integration/test-chainlink/tests/ix_06_redeleg_us_separate_slots.rs
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579

Applied to files:

  • magicblock-chainlink/src/remote_account_provider/lru_cache.rs
  • test-integration/test-chainlink/tests/ix_07_redeleg_us_same_slot.rs
  • test-integration/test-chainlink/tests/ix_06_redeleg_us_separate_slots.rs
  • magicblock-processor/src/executor/processing.rs
  • magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-11-13T09:38:43.804Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/locks.rs:64-102
Timestamp: 2025-11-13T09:38:43.804Z
Learning: In magicblock-processor's TransactionScheduler (scheduler/mod.rs line 59), the executor count is clamped to MAX_SVM_EXECUTORS (63) at initialization time, and executor IDs are assigned sequentially from 0 to count-1. This architectural guarantee ensures that executor IDs used in the bitmask-based AccountLock (scheduler/locks.rs) will always be within valid bounds for bit shifting operations, making runtime bounds checks unnecessary.

Applied to files:

  • magicblock-processor/src/executor/processing.rs
📚 Learning: 2025-10-28T13:15:42.706Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 596
File: magicblock-processor/src/scheduler.rs:1-1
Timestamp: 2025-10-28T13:15:42.706Z
Learning: In magicblock-processor, transaction indexes were always set to 0 even before the changes in PR #596. The proper transaction indexing within slots will be addressed during the planned ledger rewrite.

Applied to files:

  • magicblock-processor/src/executor/processing.rs
📚 Learning: 2025-10-21T10:34:59.140Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-accounts-db/src/lib.rs:63-72
Timestamp: 2025-10-21T10:34:59.140Z
Learning: In magicblock-validator, the AccountsDb "stop-the-world" synchronizer is managed at the processor/executor level, not at the AccountsDb API level. Transaction executors in magicblock-processor hold a read lock (sync.read()) for the duration of each slot and release it only at slot boundaries, ensuring all account writes happen under the read lock. Snapshot operations acquire a write lock, blocking until all executors release their read locks. This pattern ensures mutual exclusion between writes and snapshots without requiring read guards in AccountsDb write APIs.

Applied to files:

  • magicblock-processor/src/executor/processing.rs
📚 Learning: 2025-10-26T16:53:29.820Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 587
File: magicblock-chainlink/src/remote_account_provider/mod.rs:134-0
Timestamp: 2025-10-26T16:53:29.820Z
Learning: In magicblock-chainlink/src/remote_account_provider/mod.rs, the `Endpoint::separate_pubsub_url_and_api_key()` method uses `split_once("?api-key=")` because the api-key parameter is always the only query parameter right after `?`. No additional query parameter parsing is needed for this use case.

Applied to files:

  • magicblock-chainlink/src/remote_account_provider/mod.rs
🧬 Code graph analysis (2)
magicblock-chainlink/src/remote_account_provider/lru_cache.rs (1)
magicblock-metrics/src/metrics/mod.rs (1)
  • inc_evicted_accounts_count (424-426)
test-integration/test-chainlink/tests/ix_07_redeleg_us_same_slot.rs (1)
magicblock-validator/src/main.rs (1)
  • init_logger (13-50)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build Project
  • GitHub Check: run_make_ci_test
🔇 Additional comments (10)
magicblock-processor/Cargo.toml (1)

44-44: LGTM!

The solana-keypair dev-dependency is properly formatted with consistent spacing ({ workspace = true }) matching the rest of the file. The addition supports testing in the processor crate, and workspace version management is appropriately applied.

test-integration/test-cloning/tests/01_program-deploy.rs (1)

186-186: LGTM! Bump calculation fix properly applied.

The fix from the previous review has been correctly implemented. The calculation now produces incrementing bump values (1, 2, 3, ..., 51) as retries progress, ensuring unique messages ("Hola Mundo 1", "Hola Mundo 2", etc.) for each retry attempt.
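The following sketches an underflow-safe bump calculation. The `MAX_RETRIES = 50` value and the countdown loop are assumptions, chosen only to reproduce the 1 through 51 sequence quoted above; they are not the test's actual code.

```rust
/// Hypothetical retry budget; the test's real constant may differ.
pub const MAX_RETRIES: u64 = 50;

/// Bump derived from the remaining retries. `saturating_sub` avoids the
/// u64 underflow a plain `-` would hit if `remaining` exceeded the max.
pub fn bump(remaining: u64) -> u64 {
    MAX_RETRIES.saturating_sub(remaining) + 1
}

/// Messages produced while `remaining` counts down from MAX_RETRIES to 0.
pub fn messages() -> Vec<String> {
    (0..=MAX_RETRIES)
        .rev()
        .map(|remaining| format!("Hola Mundo {}", bump(remaining)))
        .collect()
}
```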

test-integration/test-chainlink/tests/ix_07_redeleg_us_same_slot.rs (1)

8-9: LGTM: Import additions support new test functionality.

The imports correctly add assert_not_subscribed for subscription assertions and sleep_ms for timing delays used below.

magicblock-processor/src/executor/processing.rs (1)

353-367: LGTM! Clean helper for extracting rollback lamports.

The rollback_feepayer_lamports helper correctly handles all three RollbackAccounts variants and cleanly extracts the feepayer's rollback lamports for comparison in the gasless mutation guard.
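The helper's shape can be sketched with a simplified stand-in for `RollbackAccounts`. The variant names follow solana-svm (`FeePayerOnly`, `SameNonceAndFeePayer`, `SeparateNonceAndFeePayer`), but the payloads here are reduced to plain lamport values. The real variants carry `AccountSharedData` and `NonceInfo`.

```rust
/// Simplified stand-in for solana-svm's RollbackAccounts.
pub enum RollbackAccounts {
    FeePayerOnly { fee_payer_lamports: u64 },
    // The nonce account is also the fee payer.
    SameNonceAndFeePayer { nonce_lamports: u64 },
    SeparateNonceAndFeePayer { nonce_lamports: u64, fee_payer_lamports: u64 },
}

/// Extracts the fee payer's rollback lamports from any variant.
pub fn rollback_feepayer_lamports(rb: &RollbackAccounts) -> u64 {
    match rb {
        RollbackAccounts::FeePayerOnly { fee_payer_lamports } => *fee_payer_lamports,
        RollbackAccounts::SameNonceAndFeePayer { nonce_lamports } => *nonce_lamports,
        RollbackAccounts::SeparateNonceAndFeePayer { fee_payer_lamports, .. } => {
            *fee_payer_lamports
        }
    }
}
```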

magicblock-chainlink/src/remote_account_provider/lru_cache.rs (1)

5-5: Eviction metric placement looks correct

inc_evicted_accounts_count() is only called when LruCache::push actually evicts an entry (after we’ve ruled out the never‑evict set and promotion case), so the counter should reflect real evictions and not mere accesses or never‑evict keys. This matches the intended semantics for an eviction metric.

Also applies to: 82-89

magicblock-chainlink/src/remote_account_provider/mod.rs (5)

63-67: Fetch waiters and Result-based channels now behave correctly under concurrency

The switch to:

  • type FetchResult = Result<RemoteAccount, RemoteAccountProviderError>;
  • FetchingAccounts = Mutex<HashMap<Pubkey, (u64, Vec<oneshot::Sender<FetchResult>>)>>;
  • and the Entry-based insertion in try_get_multi,

fixes the earlier issue where a second caller would overwrite the first caller’s waiter for the same account. Using match fetching.entry(pubkey) to append new senders ensures all in-flight callers for a given Pubkey share the same fetch result.

Coupled with:

  • listen_for_account_updates now sending Ok(remote_account.clone()) into all pending senders when a fresh subscription update wins the race, and
  • fetch() sending either Ok(remote_account.clone()) or Err(...) to all pending senders,

every waiter now receives exactly one FetchResult (success, not-found, or error). The try_get_multi loop cleanly distinguishes:

  • Ok(Ok(remote_account)) → happy path,
  • Ok(Err(err)) → logical fetch failure (e.g., RPC problems),
  • Err(recv_err) → channel-level failure, wrapped as RemoteAccountProviderError::RecvrError.

This looks correct and should eliminate dropped waiters under concurrent load.

Also applies to: 410-503, 633-647, 660-682
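The entry-based waiter registration and fan-out can be modeled with std channels. In this model, `std::sync::mpsc` stands in for `tokio::sync::oneshot`, `u32` stands in for `Pubkey`, and `Result<u64, String>` stands in for `FetchResult`. It models the pattern only, not the provider code.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

pub type FetchResult = Result<u64, String>;

#[derive(Default)]
pub struct Fetching {
    // key -> (fetch start slot, all pending waiters)
    inner: Mutex<HashMap<u32, (u64, Vec<Sender<FetchResult>>)>>,
}

impl Fetching {
    /// Registers a waiter; the bool is true if this caller must start the fetch.
    pub fn register(&self, key: u32, slot: u64) -> (Receiver<FetchResult>, bool) {
        let (tx, rx) = channel();
        let mut map = self.inner.lock().unwrap();
        match map.entry(key) {
            // A fetch is already in flight: append, never overwrite.
            Entry::Occupied(mut e) => {
                e.get_mut().1.push(tx);
                (rx, false)
            }
            Entry::Vacant(e) => {
                e.insert((slot, vec![tx]));
                (rx, true)
            }
        }
    }

    /// Sends the same result to every waiter registered for `key`.
    pub fn resolve(&self, key: u32, result: FetchResult) {
        let waiters = self.inner.lock().unwrap().remove(&key);
        if let Some((_, senders)) = waiters {
            for tx in senders {
                // A dropped receiver must not affect the other waiters.
                let _ = tx.send(result.clone());
            }
        }
    }
}
```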


286-303: LRU cache construction and metrics gating in new/try_new_from_urls look correct

  • AccountsLruCache is now constructed from config.subscribed_accounts_lru_capacity() using NonZeroUsize::new(cap).expect("non-zero capacity"), relying on the config constructor to reject zero (consistent with RemoteAccountProviderConfig::try_new_with_metrics).

  • The background active-subscriptions updater is only started when config.enable_subscription_metrics() is true, and the resulting JoinHandle is stored in _active_subscriptions_task_handle so it lives for the provider’s lifetime. Tests pass false to avoid spawning this task.

  • try_new_from_urls wires up (Arc<ChainPubsubClientImpl>, mpsc::Receiver<()>) pairs and passes them into SubMuxClient::new, which is then used to construct the provider with the same config, so the metrics flag affects both URL-based and client-based constructors uniformly.

  • promote_accounts now delegates to lrucache_subscribed_accounts.promote_multi, which is consistent with the new LRU wrapper API.

All of this is coherent with the new LRU + metrics design.

Also applies to: 311-316, 335-380, 382-384
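The following sketches the zero-capacity check that makes the later `expect("non-zero capacity")` safe, written as a validating constructor. `Config` and its fields are hypothetical. Only the `try_new_with_metrics(capacity, ..., enable_metrics)` shape comes from the tests quoted in this review, and the lifecycle-mode argument is omitted.

```rust
use std::num::NonZeroUsize;

/// Hypothetical config holding the two knobs discussed above.
pub struct Config {
    capacity: NonZeroUsize,
    enable_subscription_metrics: bool,
}

impl Config {
    /// Rejects zero up front so later NonZeroUsize conversions cannot fail.
    pub fn try_new_with_metrics(capacity: usize, enable_metrics: bool) -> Result<Self, String> {
        let capacity = NonZeroUsize::new(capacity)
            .ok_or_else(|| "subscribed accounts LRU capacity must be non-zero".to_string())?;
        Ok(Self { capacity, enable_subscription_metrics: enable_metrics })
    }

    pub fn subscribed_accounts_lru_capacity(&self) -> usize {
        self.capacity.get()
    }

    pub fn enable_subscription_metrics(&self) -> bool {
        self.enable_subscription_metrics
    }
}
```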


861-873: Fetch metrics and high-level eviction tests provide good end-to-end coverage

  • The fetch path now records:

    • failures via inc_account_fetches_failed when notify_error is called,
    • successes via inc_account_fetches_success(pubkeys.len() as u64),
    • and splits found/not-found counts with inc_account_fetches_found and inc_account_fetches_not_found.

    This gives a clear metric picture of how RPC fetches behave without changing the functional contract.

  • The new high-level tests:

    • test_add_accounts_up_to_limit_no_eviction,
    • test_eviction_order,
    • and test_multiple_evictions_in_sequence

    exercise end-to-end behavior of the LRU capacity, eviction order, and removal notifications via removed_account_rx, not just the raw cache wrapper. They mirror the lower-level tests in lru_cache.rs and validate that the RemoteAccountProvider’s subscription machinery and eviction signals behave as expected.

Using RemoteAccountProviderConfig::try_new_with_metrics(..., false) in these tests neatly disables the metrics updater while still using the same config path as production.

Also applies to: 984-1016, 1520-1617
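The counting scheme can be sketched with `AtomicU64` counters standing in for the metrics behind `inc_account_fetches_*`. The struct and method names here are hypothetical. Failures are counted once in the error path and successes once per response, so each requested pubkey is counted in exactly one of success or failed.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Default)]
pub struct FetchMetrics {
    pub success: AtomicU64,
    pub failed: AtomicU64,
    pub found: AtomicU64,
    pub not_found: AtomicU64,
}

impl FetchMetrics {
    /// Successful RPC response: one bool per requested pubkey (true = found).
    pub fn record_response(&self, found: &[bool]) {
        let hits = found.iter().filter(|f| **f).count() as u64;
        let total = found.len() as u64;
        self.success.fetch_add(total, Ordering::Relaxed);
        self.found.fetch_add(hits, Ordering::Relaxed);
        self.not_found.fetch_add(total - hits, Ordering::Relaxed);
    }

    /// Mirrors notify_error: one failure per pubkey in the batch.
    pub fn record_error(&self, batch_len: usize) {
        self.failed.fetch_add(batch_len as u64, Ordering::Relaxed);
    }
}
```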


1159-1174: Tests switched to try_new_with_metrics keep behavior while making metrics explicit

All the unit tests that construct a RemoteAccountProvider now use:

RemoteAccountProviderConfig::try_new_with_metrics(
    1000, // or accounts_capacity
    LifecycleMode::Ephemeral,
    false, // disable subscription metrics
)?

and pass &config into RemoteAccountProvider::new. This:

  • keeps the previous functional behavior (no background metrics updater during tests),
  • exercises the same constructor path that production uses for metrics-enabled setups,
  • and makes the LRU capacity explicit in the tests that depend on eviction semantics.

This is a clean way to adapt the tests to the new configuration surface.

Also applies to: 1209-1226, 1289-1303, 1491-1501


731-756: Subscription/LRU atomicity improved, but eviction-unsubscribe failures can desync LRU vs pubsub

The changes here make an important improvement: register_subscription now calls self.pubsub_client.subscribe(*pubkey).await? before mutating the LRU, so we no longer mark an account as “watched” in the LRU if the upstream subscription fails.

However, in the eviction branch:

if let Some(evicted) = self.lrucache_subscribed_accounts.add(*pubkey) {
    trace!("Evicting {pubkey}");

    // 1. Unsubscribe from the account directly (LRU has already removed it)
    if let Err(err) = self.pubsub_client.unsubscribe(evicted).await {
        warn!("Failed to unsubscribe from pubsub for evicted account {evicted}: {err:?}");
    }

    // 2. Inform upstream so it can remove it from the store
    self.send_removal_update(evicted).await?;
}

if unsubscribe(evicted) fails, we still:

  • keep pubkey in the LRU,
  • and send a removal update for evicted,

even though the pubsub layer may still be subscribed to evicted. That can permanently desync:

  • the LRU (and upstream store + metrics) which think evicted is gone, from
  • the pubsub client, which may continue streaming updates and contributing to subscription_count.

Given you now have an active-subscriptions reconciler that expects LRU and pubsub to converge, this inconsistency will show up as persistent mismatches with no self-healing.

Consider treating a failed eviction-unsubscribe as an atomicity violation and rolling back:

  • remove pubkey from the LRU again,
  • optionally attempt unsubscribe(pubkey) to revert the new subscription,
  • and return an error instead of sending a removal update for evicted.

For example:

         if let Some(evicted) = self.lrucache_subscribed_accounts.add(*pubkey) {
             trace!("Evicting {pubkey}");

-            // 1. Unsubscribe from the account directly (LRU has already removed it)
-            if let Err(err) = self.pubsub_client.unsubscribe(evicted).await {
-                // Should we retry here?
-                warn!(
-                    "Failed to unsubscribe from pubsub for evicted account {evicted}: {err:?}");
-            }
-
-            // 2. Inform upstream so it can remove it from the store
-            self.send_removal_update(evicted).await?;
+            // 1. Unsubscribe from the account directly (LRU has already removed it)
+            if let Err(err) = self.pubsub_client.unsubscribe(evicted).await {
+                warn!(
+                    "Failed to unsubscribe from pubsub for evicted account {evicted}: {err:?}. \
+                     Rolling back eviction of {evicted} and subscription for {pubkey}."
+                );
+                // Roll back: drop the newly inserted key from LRU so our view matches pubsub_client
+                self.lrucache_subscribed_accounts.remove(pubkey);
+                // Best-effort revert for the newly subscribed account; ignore failures here
+                let _ = self.pubsub_client.unsubscribe(*pubkey).await;
+                return Err(err);
+            }
+
+            // 2. Inform upstream so it can remove it from the store
+            self.send_removal_update(evicted).await?;
         }

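That keeps the LRU and the upstream store consistent, even in the rare case of an unsubscribe failure during eviction. Note, however, that evicted has already been removed from the LRU when its unsubscribe fails. Full alignment with the pubsub client would therefore also require re-inserting evicted into the LRU.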

The subscribe/unsubscribe public methods themselves look good: subscribe promotes existing entries in the LRU, and unsubscribe:

  • refuses to touch never‑evict accounts, and
  • only removes from the LRU + sends a removal update after a successful pubsub unsubscribe, leaving state unchanged on failure.

Also applies to: 771-794, 797-832
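The rollback idea can be modeled synchronously. This sketch uses a hypothetical `MockPubsub` whose `unsubscribe` fails for one chosen key, and a plain `Vec` for LRU order. It assumes the subscribed key is new rather than a promotion. On rollback it also restores `evicted`, which has already left the LRU by the time its unsubscribe fails. After the rollback, the LRU and the pubsub view are identical.

```rust
use std::collections::HashSet;

/// Hypothetical pubsub client; `fail_for` simulates an unsubscribe error.
#[derive(Default)]
pub struct MockPubsub {
    pub subs: HashSet<u32>,
    pub fail_for: Option<u32>,
}

impl MockPubsub {
    fn subscribe(&mut self, key: u32) -> Result<(), String> {
        self.subs.insert(key);
        Ok(())
    }

    fn unsubscribe(&mut self, key: u32) -> Result<(), String> {
        if self.fail_for == Some(key) {
            return Err(format!("unsubscribe {key} failed"));
        }
        self.subs.remove(&key);
        Ok(())
    }
}

pub struct Provider {
    pub cap: usize,
    pub lru: Vec<u32>, // index 0 = least recently used
    pub pubsub: MockPubsub,
    pub removed: Vec<u32>, // stands in for send_removal_update
}

impl Provider {
    pub fn register_subscription(&mut self, key: u32) -> Result<(), String> {
        // Subscribe first so a failure leaves the LRU untouched.
        self.pubsub.subscribe(key)?;
        self.lru.push(key);
        if self.lru.len() <= self.cap {
            return Ok(());
        }
        let evicted = self.lru.remove(0);
        if let Err(err) = self.pubsub.unsubscribe(evicted) {
            // Roll back both halves: drop the new key, restore the evicted one.
            self.lru.pop();
            self.lru.insert(0, evicted);
            // Best-effort revert of the new subscription.
            let _ = self.pubsub.unsubscribe(key);
            return Err(err);
        }
        self.removed.push(evicted);
        Ok(())
    }
}
```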

Comment on lines +119 to +141
pub fn len(&self) -> usize {
    let subs = self
        .subscribed_accounts
        .lock()
        .expect("subscribed_accounts lock poisoned");
    subs.len()
}

pub fn never_evicted_accounts(&self) -> Vec<Pubkey> {
    self.accounts_to_never_evict.iter().cloned().collect()
}

pub fn can_evict(&self, pubkey: &Pubkey) -> bool {
    !self.accounts_to_never_evict.contains(pubkey)
}

pub fn pubkeys(&self) -> Vec<Pubkey> {
    let subs = self
        .subscribed_accounts
        .lock()
        .expect("subscribed_accounts lock poisoned");
    subs.iter().map(|(k, _)| *k).collect()
}


🧹 Nitpick | 🔵 Trivial

New LRU cache accessors are simple and thread‑safe

len(), never_evicted_accounts(), can_evict(), and pubkeys() are straightforward wrappers over the internal Mutex<LruCache<Pubkey, ()>> / HashSet<Pubkey>. The locking and cloning behavior is correct, and given that pubkeys() and never_evicted_accounts() are primarily used in metrics/debug paths, the extra allocations are acceptable.

If these methods ever end up in a hot path, consider returning iterators or borrowing views instead of materializing Vec<Pubkey> on each call.

🤖 Prompt for AI Agents
magicblock-chainlink/src/remote_account_provider/lru_cache.rs around lines
119-141: The new accessors (len, never_evicted_accounts, can_evict, pubkeys) are
thread-safe and correct as written, so no code changes are required; leave them
as-is, but if these methods later appear in a hot path replace Vec allocations
with iterator/borrowed views (or return impl Iterator / slices) to avoid
repeated cloning.
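If these accessors ever do land on a hot path, one allocation-free shape is sketched below. The unlocked `HashSet` can hand out a borrowing iterator directly. The `Mutex`-guarded cache can instead expose a closure-based view, so the guard never escapes the method. `Accounts` and `with_pubkeys` are hypothetical names, and a `VecDeque<u32>` stands in for `LruCache<Pubkey, ()>`.

```rust
use std::collections::{HashSet, VecDeque};
use std::sync::Mutex;

pub struct Accounts {
    never_evict: HashSet<u32>,
    subscribed: Mutex<VecDeque<u32>>,
}

impl Accounts {
    pub fn new(never_evict: &[u32], subscribed: &[u32]) -> Self {
        Self {
            never_evict: never_evict.iter().copied().collect(),
            subscribed: Mutex::new(subscribed.iter().copied().collect()),
        }
    }

    /// Borrowing iterator: no Vec allocation and no clone of the set.
    pub fn never_evicted_iter(&self) -> impl Iterator<Item = &u32> + '_ {
        self.never_evict.iter()
    }

    /// Runs `f` over the subscribed keys while the lock is held.
    pub fn with_pubkeys<R>(&self, f: impl FnOnce(&VecDeque<u32>) -> R) -> R {
        let guard = self.subscribed.lock().expect("subscribed_accounts lock poisoned");
        f(&*guard)
    }
}
```

The closure form keeps the critical section explicit, but callers must avoid doing slow work inside `f`, because the lock is held for its whole duration.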

Comment on lines +145 to +235
#[macro_export]
macro_rules! assert_cloned_as_delegated_with_retries {
    ($cloner:expr, $pubkeys:expr, $retries:expr) => {{
        for pubkey in $pubkeys {
            let mut account_opt = None;
            for _ in 0..$retries {
                account_opt = $cloner.get_account(pubkey);
                if let Some(account) = &account_opt {
                    if account.delegated() {
                        break;
                    }
                }
                ::std::thread::sleep(::std::time::Duration::from_millis(100));
            }
            let account = account_opt
                .expect(&format!("Expected account {} to be cloned", pubkey));
            assert!(
                account.delegated(),
                "Expected account {} to be delegated",
                pubkey
            );
        }
    }};
    ($cloner:expr, $pubkeys:expr, $slot:expr, $retries:expr) => {{
        for pubkey in $pubkeys {
            let mut account_opt = None;
            for _ in 0..$retries {
                account_opt = $cloner.get_account(pubkey);
                if let Some(account) = &account_opt {
                    if account.delegated() && account.remote_slot() == $slot {
                        break;
                    }
                }
                ::std::thread::sleep(::std::time::Duration::from_millis(100));
            }
            let account = account_opt
                .expect(&format!("Expected account {} to be cloned", pubkey));
            assert!(
                account.delegated(),
                "Expected account {} to be delegated",
                pubkey
            );
            assert_eq!(
                account.remote_slot(),
                $slot,
                "Expected account {} to have remote slot {}",
                pubkey,
                $slot
            );
        }
    }};
    ($cloner:expr, $pubkeys:expr, $slot:expr, $owner:expr, $retries:expr) => {{
        use solana_account::ReadableAccount;
        for pubkey in $pubkeys {
            let mut account_opt = None;
            for _ in 0..$retries {
                account_opt = $cloner.get_account(pubkey);
                if let Some(account) = &account_opt {
                    if account.delegated()
                        && account.remote_slot() == $slot
                        && account.owner() == &$owner
                    {
                        break;
                    }
                }
                ::std::thread::sleep(::std::time::Duration::from_millis(100));
            }
            let account = account_opt
                .expect(&format!("Expected account {} to be cloned", pubkey));
            assert!(
                account.delegated(),
                "Expected account {} to be delegated",
                pubkey
            );
            assert_eq!(
                account.remote_slot(),
                $slot,
                "Expected account {} to have remote slot {}",
                pubkey,
                $slot
            );
            assert_eq!(
                account.owner(),
                &$owner,
                "Expected account {} to have owner {}",
                pubkey,
                $owner
            );
        }
    }};
}


⚠️ Potential issue | 🟠 Major

Capture $slot/$owner once to avoid repeated evaluation and keep expansions unambiguous

The retry helpers are a good fit for eventual-consistency tests, but the way $slot and $owner are used makes the assertions harder to reason about:

  • At call sites you often pass expressions like account.remote_slot() and program_flexi_counter::id().
  • Inside the macro, $slot and $owner are expanded, and therefore evaluated, on every retry iteration and again in the final assert_eq!. An expression with side effects, or one whose value can change between retries, is therefore not checked against a single fixed expectation.
  • The macro also binds its own local account (if let Some(account) = &account_opt, and later let account = account_opt.expect(...)). Local variables introduced by macro_rules! are hygienic, so this binding should not capture a caller-side account used inside $slot, and account.remote_slot() == $slot should not collapse into a self-comparison. Reusing the name does, however, make the expanded code easy to misread.

To ensure the macros actually assert “matches the slot/owner as seen at the call site” and to avoid repeated evaluation of slot/owner expressions, capture them once at the top of each overload and use local expected_* variables inside the loops and assertions:

 #[macro_export]
 macro_rules! assert_cloned_as_delegated_with_retries {
@@
     ($cloner:expr, $pubkeys:expr, $slot:expr, $retries:expr) => {{
-        for pubkey in $pubkeys {
+        let expected_slot = $slot;
+        for pubkey in $pubkeys {
             let mut account_opt = None;
             for _ in 0..$retries {
                 account_opt = $cloner.get_account(pubkey);
                 if let Some(account) = &account_opt {
-                    if account.delegated() && account.remote_slot() == $slot {
+                    if account.delegated() && account.remote_slot() == expected_slot {
                         break;
                     }
                 }
                 ::std::thread::sleep(::std::time::Duration::from_millis(100));
             }
             let account = account_opt
                 .expect(&format!("Expected account {} to be cloned", pubkey));
             assert!(
                 account.delegated(),
                 "Expected account {} to be delegated",
                 pubkey
             );
             assert_eq!(
                 account.remote_slot(),
-                $slot,
+                expected_slot,
                 "Expected account {} to have remote slot {}",
                 pubkey,
-                $slot
+                expected_slot
             );
         }
     }};
     ($cloner:expr, $pubkeys:expr, $slot:expr, $owner:expr, $retries:expr) => {{
         use solana_account::ReadableAccount;
-        for pubkey in $pubkeys {
+        let expected_slot = $slot;
+        let expected_owner = $owner;
+        for pubkey in $pubkeys {
             let mut account_opt = None;
             for _ in 0..$retries {
                 account_opt = $cloner.get_account(pubkey);
                 if let Some(account) = &account_opt {
                     if account.delegated()
-                        && account.remote_slot() == $slot
-                        && account.owner() == &$owner
+                        && account.remote_slot() == expected_slot
+                        && account.owner() == &expected_owner
                     {
                         break;
                     }
                 }
                 ::std::thread::sleep(::std::time::Duration::from_millis(100));
             }
             let account = account_opt
                 .expect(&format!("Expected account {} to be cloned", pubkey));
             assert!(
                 account.delegated(),
                 "Expected account {} to be delegated",
                 pubkey
             );
             assert_eq!(
                 account.remote_slot(),
-                $slot,
+                expected_slot,
                 "Expected account {} to have remote slot {}",
                 pubkey,
-                $slot
+                expected_slot
             );
             assert_eq!(
                 account.owner(),
-                &$owner,
+                &expected_owner,
                 "Expected account {} to have owner {}",
                 pubkey,
-                $owner
+                expected_owner
             );
         }
     }};
 }

You may also want to apply the same “capture once” pattern to the non-retrying assert_cloned_as_delegated! / assert_cloned_as_undelegated! macros for consistency, but the above change is the minimum to make the new retry helpers’ slot/owner checks meaningful and robust.
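Below is a minimal, self-contained sketch of the capture-once pattern. `FakeAccount`, `assert_delegated_at_slot_with_retries!` and `slot_evaluations_for_three_keys` are hypothetical stand-ins, not part of the test suite. The 100ms sleep between attempts is omitted so the sketch runs fast.

Because `$slot` is bound to `expected_slot` before the loop, the slot expression runs exactly once, however many keys or retries there are. Note that `macro_rules!` locals are hygienic, so the `account` bound inside the macro is distinct from an `account` identifier written in the caller's `$slot` expression. The single up-front binding is what pins the expected value for the whole check.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a cloned account: only the two fields the
/// retry macro inspects.
#[derive(Clone, Debug)]
pub struct FakeAccount {
    pub delegated: bool,
    pub remote_slot: u64,
}

/// Capture-once sketch: `$slot` is evaluated a single time and bound to
/// `expected_slot`; the retry loop and the final assertions all compare
/// against that one value. (The real macro also sleeps between attempts.)
macro_rules! assert_delegated_at_slot_with_retries {
    ($get:expr, $keys:expr, $slot:expr, $retries:expr) => {{
        let expected_slot = $slot;
        for key in $keys {
            let mut account_opt = None;
            for _ in 0..$retries {
                account_opt = ($get)(key);
                if let Some(account) = &account_opt {
                    if account.delegated && account.remote_slot == expected_slot {
                        break;
                    }
                }
            }
            let account = account_opt.expect("expected account to be present");
            assert!(account.delegated, "expected account {} to be delegated", key);
            assert_eq!(
                account.remote_slot, expected_slot,
                "expected account {} to have remote slot {}",
                key, expected_slot
            );
        }
    }};
}

/// Runs the macro over three keys with 5 retries each and reports how many
/// times the slot expression was evaluated.
pub fn slot_evaluations_for_three_keys() -> u32 {
    let evaluations = Cell::new(0u32);
    let slot_expr = || {
        evaluations.set(evaluations.get() + 1);
        42u64
    };
    let get = |_key: &u32| Some(FakeAccount { delegated: true, remote_slot: 42 });
    assert_delegated_at_slot_with_retries!(get, &[1u32, 2, 3], slot_expr(), 5);
    evaluations.get()
}
```

Without the binding, the slot expression would be re-evaluated in every loop condition and in both uses inside the final `assert_eq!`, once per key.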

@@ -1,4 +1,4 @@
use log::error;
use log::*;

🧹 Nitpick | 🔵 Trivial

Replace wildcard import with specific log macros.

The wildcard import use log::*; is non-idiomatic and can cause namespace pollution. Prefer explicit imports of the specific macros you need (e.g., error, warn, info, debug, trace).

Apply this diff:

-use log::error;
-use log::*;
+use log::{error, warn};

Adjust the list to include any additional log macros actually used in this file.

🤖 Prompt for AI Agents
In magicblock-processor/src/executor/processing.rs around line 1, replace the
wildcard import `use log::*;` with explicit imports of only the logging macros
this file uses (for example: `error`, `warn`, `info`, `debug`, `trace`); edit
the use statement to list those macros explicitly (and add/remove macros to
match actual usages in the file) to avoid namespace pollution and follow
idiomatic Rust.

Comment on lines +156 to 192
let mut result = output.processing_results.pop().expect(
"single transaction result is always present in the output",
);

let undelegated_feepayer_was_modified = result
.as_ref()
.ok()
.and_then(|r| r.executed_transaction())
.and_then(|txn| {
let first_acc = txn.loaded_transaction.accounts.first();
let rollback_lamports = rollback_feepayer_lamports(
&txn.loaded_transaction.rollback_accounts,
);
first_acc.map(|acc| (acc, rollback_lamports))
})
.map(|(acc, rollback_lamports)| {
// The check logic: if we have an undelegated feepayer, then
// it cannot have been mutated. The only exception is the
// privileged feepayer (internal validator operations), for
// which we do allow the mutations, since it can be used to
// fund other accounts.
(acc.1.is_dirty()
&& (acc.1.lamports() != 0 || rollback_lamports != 0))
&& !acc.1.delegated()
&& !acc.1.privileged()
})
.unwrap_or_default();
let gasless = self.environment.fee_lamports_per_signature == 0;
// If we are running in the gasless mode, we should not allow
// any mutation of the feepayer account, since that would make
// it possible for malicious actors to peform transfer operations
// from undelegated feepayers to delegated accounts, which would
// result in validator loosing funds upon balance settling.
if gasless && undelegated_feepayer_was_modified {
result = Err(TransactionError::InvalidAccountForFee);
};
(result, output.balances)

⚠️ Potential issue | 🟠 Major

Move gasless check before computing feepayer mutation to avoid hot-path overhead.

The undelegated_feepayer_was_modified logic (lines 160-182) runs for every transaction, even when gasless is false. Since this guard only matters in gasless mode, checking gasless first and short-circuiting will avoid unnecessary work on the hot path.

Additionally, fix the typos in the comment:

  • Line 186: "peform" → "perform"
  • Line 188: "loosing" → "losing"

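Apply this diff. On its own, it only hoists the gasless flag and fixes the typos; undelegated_feepayer_was_modified is still computed for every transaction: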

         let mut result = output.processing_results.pop().expect(
             "single transaction result is always present in the output",
         );
 
+        let gasless = self.environment.fee_lamports_per_signature == 0;
         let undelegated_feepayer_was_modified = result
             .as_ref()
             .ok()
             .and_then(|r| r.executed_transaction())
             .and_then(|txn| {
                 let first_acc = txn.loaded_transaction.accounts.first();
                 let rollback_lamports = rollback_feepayer_lamports(
                     &txn.loaded_transaction.rollback_accounts,
                 );
                 first_acc.map(|acc| (acc, rollback_lamports))
             })
             .map(|(acc, rollback_lamports)| {
                 // The check logic: if we have an undelegated feepayer, then
                 // it cannot have been mutated. The only exception is the
                 // privileged feepayer (internal validator operations), for
                 // which we do allow the mutations, since it can be used to
                 // fund other accounts.
                 (acc.1.is_dirty()
                     && (acc.1.lamports() != 0 || rollback_lamports != 0))
                     && !acc.1.delegated()
                     && !acc.1.privileged()
             })
             .unwrap_or_default();
-        let gasless = self.environment.fee_lamports_per_signature == 0;
         // If we are running in the gasless mode, we should not allow
         // any mutation of the feepayer account, since that would make
-        // it possible for malicious actors to peform transfer operations
+        // it possible for malicious actors to perform transfer operations
         // from undelegated feepayers to delegated accounts, which would
-        // result in validator loosing funds upon balance settling.
+        // result in validator losing funds upon balance settling.
         if gasless && undelegated_feepayer_was_modified {
             result = Err(TransactionError::InvalidAccountForFee);
         };

Better still, wrap the entire computation in an if gasless { ... } block:

         let mut result = output.processing_results.pop().expect(
             "single transaction result is always present in the output",
         );
 
-        let undelegated_feepayer_was_modified = result
-            .as_ref()
-            .ok()
-            .and_then(|r| r.executed_transaction())
-            .and_then(|txn| {
-                let first_acc = txn.loaded_transaction.accounts.first();
-                let rollback_lamports = rollback_feepayer_lamports(
-                    &txn.loaded_transaction.rollback_accounts,
-                );
-                first_acc.map(|acc| (acc, rollback_lamports))
-            })
-            .map(|(acc, rollback_lamports)| {
-                // The check logic: if we have an undelegated feepayer, then
-                // it cannot have been mutated. The only exception is the
-                // privileged feepayer (internal validator operations), for
-                // which we do allow the mutations, since it can be used to
-                // fund other accounts.
-                (acc.1.is_dirty()
-                    && (acc.1.lamports() != 0 || rollback_lamports != 0))
-                    && !acc.1.delegated()
-                    && !acc.1.privileged()
-            })
-            .unwrap_or_default();
         let gasless = self.environment.fee_lamports_per_signature == 0;
         // If we are running in the gasless mode, we should not allow
         // any mutation of the feepayer account, since that would make
-        // it possible for malicious actors to peform transfer operations
+        // it possible for malicious actors to perform transfer operations
         // from undelegated feepayers to delegated accounts, which would
-        // result in validator loosing funds upon balance settling.
-        if gasless && undelegated_feepayer_was_modified {
-            result = Err(TransactionError::InvalidAccountForFee);
-        };
+        // result in validator losing funds upon balance settling.
+        if gasless {
+            let undelegated_feepayer_was_modified = result
+                .as_ref()
+                .ok()
+                .and_then(|r| r.executed_transaction())
+                .and_then(|txn| {
+                    let first_acc = txn.loaded_transaction.accounts.first();
+                    let rollback_lamports = rollback_feepayer_lamports(
+                        &txn.loaded_transaction.rollback_accounts,
+                    );
+                    first_acc.map(|acc| (acc, rollback_lamports))
+                })
+                .map(|(acc, rollback_lamports)| {
+                    (acc.1.is_dirty()
+                        && (acc.1.lamports() != 0 || rollback_lamports != 0))
+                        && !acc.1.delegated()
+                        && !acc.1.privileged()
+                })
+                .unwrap_or(false);
+
+            if undelegated_feepayer_was_modified {
+                result = Err(TransactionError::InvalidAccountForFee);
+            }
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let mut result = output.processing_results.pop().expect(
"single transaction result is always present in the output",
);
let gasless = self.environment.fee_lamports_per_signature == 0;
// If we are running in the gasless mode, we should not allow
// any mutation of the feepayer account, since that would make
// it possible for malicious actors to perform transfer operations
// from undelegated feepayers to delegated accounts, which would
// result in validator losing funds upon balance settling.
if gasless {
let undelegated_feepayer_was_modified = result
.as_ref()
.ok()
.and_then(|r| r.executed_transaction())
.and_then(|txn| {
let first_acc = txn.loaded_transaction.accounts.first();
let rollback_lamports = rollback_feepayer_lamports(
&txn.loaded_transaction.rollback_accounts,
);
first_acc.map(|acc| (acc, rollback_lamports))
})
.map(|(acc, rollback_lamports)| {
(acc.1.is_dirty()
&& (acc.1.lamports() != 0 || rollback_lamports != 0))
&& !acc.1.delegated()
&& !acc.1.privileged()
})
.unwrap_or(false);
if undelegated_feepayer_was_modified {
result = Err(TransactionError::InvalidAccountForFee);
}
}
(result, output.balances)
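The core of the suggestion is to evaluate the feepayer check only when `gasless` is true. It can be sketched in isolation with simplified types. `GuardError`, `FeePayerView` and `apply_gasless_guard` are hypothetical stand-ins, not the SVM's `TransactionError` or account types. The feepayer lookup is passed as a closure so the fast path demonstrably never touches it.

```rust
/// Hypothetical stand-in for `TransactionError::InvalidAccountForFee`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GuardError {
    InvalidAccountForFee,
}

/// Hypothetical view of the loaded feepayer account.
pub struct FeePayerView {
    pub dirty: bool,
    pub lamports: u64,
    pub delegated: bool,
    pub privileged: bool,
}

/// Same predicate as in the review: the feepayer counts as modified when it is
/// dirty, either its current or rollback lamports are non-zero, and it is
/// neither delegated nor privileged.
pub fn undelegated_feepayer_was_modified(fp: &FeePayerView, rollback_lamports: u64) -> bool {
    fp.dirty
        && (fp.lamports != 0 || rollback_lamports != 0)
        && !fp.delegated
        && !fp.privileged
}

/// Applies the guard only in gasless mode. `load_feepayer` runs lazily, so
/// non-gasless (and already failed) transactions skip the lookup entirely.
pub fn apply_gasless_guard<F>(
    gasless: bool,
    result: Result<(), GuardError>,
    load_feepayer: F,
) -> Result<(), GuardError>
where
    F: FnOnce() -> Option<(FeePayerView, u64)>,
{
    if !gasless || result.is_err() {
        return result;
    }
    let modified = load_feepayer()
        .map(|(fp, rollback)| undelegated_feepayer_was_modified(&fp, rollback))
        .unwrap_or(false);
    if modified {
        Err(GuardError::InvalidAccountForFee)
    } else {
        result
    }
}
```

Skipping the check when `result` is already an error matches the original, where `result.as_ref().ok()` yields `None` and the flag defaults to `false`.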

Comment on lines +8 to 11
assert_cloned_as_delegated_with_retries, assert_cloned_as_undelegated,
assert_not_subscribed, assert_subscribed_without_delegation_record,
testing::init_logger,
};

⚠️ Potential issue | 🟠 Major

Test updates correctly use the retry macro, but depend on accurate slot/owner checks in the helper

  • Switching to assert_cloned_as_delegated_with_retries!(..., account.remote_slot(), program_flexi_counter::id(), 30) in both delegated phases plus the added sleep_ms(1_500).await delays is a good way to harden this flow against eventual-consistency/pubsub lag.
  • With the retry interval fixed at 100ms, your worst-case wait per phase is ~4.5s (1.5s initial sleep + 30 * 100ms), which is acceptable for this integration test but worth keeping in mind if you extend this pattern to many tests.
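The worst-case estimate above can be written out as a tiny helper; `worst_case_wait` is a hypothetical name, not part of the test crate. It assumes the macro sleeps once after every failed attempt, as the retry loop does, and it ignores the time spent inside each `get_account` call.

```rust
use std::time::Duration;

/// Upper bound on how long one delegated phase waits before the retry macro
/// gives up: the fixed initial sleep plus one polling interval per retry.
pub fn worst_case_wait(initial_sleep: Duration, retries: u32, interval: Duration) -> Duration {
    initial_sleep + interval * retries
}
```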

One caveat: the correctness of the slot/owner checks here hinges on the helper macro capturing the slot/owner expressions as they are at the call site. As written, the macro can accidentally compare values against themselves when $slot / $owner expressions reference account, so please apply the expected_slot / expected_owner capture refactor in assert_cloned_as_delegated_with_retries! (see previous comment) to ensure these assertions genuinely validate the remote slot and owner rather than only delegated().

Also applies to: 40-46, 62-62, 80-80, 84-90

🤖 Prompt for AI Agents
In test-integration/test-chainlink/tests/ix_06_redeleg_us_separate_slots.rs
around lines 8-11 (and also affecting lines 40-46, 62, 80, 84-90), the test
switches to using assert_cloned_as_delegated_with_retries! with
account.remote_slot()/account.owner() but the macro currently evaluates those
expressions in place causing it to potentially compare values against
themselves; refactor the macro to capture the caller expressions into local
bindings like expected_slot and expected_owner at the start of the macro
(evaluate and bind $slot and $owner once), then use those bindings in all
subsequent comparisons and retries so the assertions validate the remote
slot/owner against the originally passed expected values; ensure the macro still
uses the retry loop and sleep delays as before and update all call sites listed
to rely on the new captured names.

use solana_sdk::{signature::Keypair, signer::Signer};
use test_chainlink::ixtest_context::IxtestContext;

#[ignore = "Started failing when fixing excessive subs, last time passing ded9c50a"]

⚠️ Potential issue | 🔴 Critical

Critical: Test disabled rather than fixed after subscription management changes.

Ignoring this test means the "redelegate to us in same slot" flow is no longer validated. The failure reason explicitly ties to the subscription fixes in this PR, suggesting the new subscription management may have introduced a regression in this flow.

Action required:

  1. Investigate why the subscription changes broke this test.
  2. Then either update the test expectations to match the new behavior, or fix the subscription management code so it handles this flow correctly.

Disabled tests provide zero value and can hide production bugs.

info!("1. Account delegated to us");

ctx.chainlink.ensure_accounts(&pubkeys, None).await.unwrap();
sleep_ms(500).await;

⚠️ Potential issue | 🟠 Major

Major: Hard-coded sleep introduces fragile timing dependency.

The 500ms sleep calls are a brittle workaround for asynchronous pubsub updates. This approach has several problems:

  1. Flakiness risk: 500ms may be insufficient on slower CI systems or under load
  2. Inefficiency: May be excessive on fast systems, unnecessarily slowing test suite
  3. Logical inconsistency: The test asserts accounts delegated to "us" should NOT be subscribed (assert_not_subscribed!), yet adds waits for "pubsub subscription handler". If these accounts aren't subscribed, why wait for subscription events?

Recommended approach:

Instead of arbitrary sleeps, use event-driven synchronization:

  • Add a method to poll/wait for subscription state to stabilize
  • Use condition variables or channels to signal when pubsub updates complete
  • Consider exposing a test helper such as wait_for_subscription_state_stable()

Example pattern:

// Instead of:
ctx.chainlink.ensure_accounts(&pubkeys, None).await.unwrap();
sleep_ms(500).await;

// Use:
ctx.chainlink.ensure_accounts(&pubkeys, None).await.unwrap();
ctx.chainlink.wait_for_subscription_state_stable().await;

This would provide deterministic, reliable tests without arbitrary timing.
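One way to build a helper like wait_for_subscription_state_stable() is a bounded poll over a condition. The sketch below makes several assumptions: `wait_until` is hypothetical, and the predicate would be whatever check on subscription state the test needs. It is also synchronous and std-only; inside the async tests, `thread::sleep(interval)` would become `tokio::time::sleep(interval).await`.

Unlike the channel or condition-variable approach, this is still polling. However, it replaces a fixed delay with an explicit condition plus an upper bound.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Polls `condition` every `interval` until it returns true or `timeout`
/// elapses. Returns whether the condition was met before the deadline.
pub fn wait_until<F>(timeout: Duration, interval: Duration, mut condition: F) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        if condition() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}
```

A test would then assert on the returned bool (for example, that subscription state settled within a few seconds) rather than sleeping for a fixed 500ms and hoping.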

Also applies to: 63-65

🤖 Prompt for AI Agents
In test-integration/test-chainlink/tests/ix_07_redeleg_us_same_slot.rs around
lines 37 and 63-65, the test uses hard-coded sleep_ms(500).await which creates
fragile timing dependencies; replace these sleeps with an event-driven wait: add
or use a helper like
wait_for_subscription_state_stable()/wait_for_pubsub_updates() on ctx.chainlink
that polls the subscription state or awaits a channel/condition variable
signaled by the pubsub handler, call that helper immediately after
ensure_accounts(...) and any operations that trigger pubsub updates, and remove
the fixed sleep calls so the test deterministically waits until subscriptions
have settled before asserting with assert_not_subscribed!.

5 participants