Small cleanup followups #47

Merged
anderdc merged 23 commits into test from refactor/post-v1-cleanup-followups
Apr 16, 2026

Conversation

@LandynDev
Collaborator

No description provided.

The axon handler is the only path into the pending_confirms queue and
already verifies tx sender against the reserved source address before
enqueueing. The second check in the forward-loop re-verify was dead
code — a confirmed on-chain tx can't swap senders out from under us.

Also drop the 2-tuple back-compat in SwapFulfiller.load_sent_cache.
The old shape never shipped to anyone who could still have a cache
file, and carrying the fallback clutters the restore path.

Split ChainProvider.verify_transaction into a private abstract
_fetch_matching_tx (chain-specific RPC/scan) and a concrete
verify_transaction on the base that layers two optional post-checks
every caller used to open-code: require_confirmed and expected_sender.
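The split described above can be sketched as a template method. The names ChainProvider, _fetch_matching_tx, verify_transaction, require_confirmed, and expected_sender come from the commit; the Tx shape and the None-on-rejection convention are illustrative assumptions:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tx:
    # Minimal stand-in for a provider's transaction record (illustrative).
    sender: str
    confirmed: bool


class ChainProvider(ABC):
    @abstractmethod
    def _fetch_matching_tx(self, tx_hash: str) -> Optional[Tx]:
        """Chain-specific RPC/scan; returns None when no matching tx exists."""

    def verify_transaction(
        self,
        tx_hash: str,
        require_confirmed: bool = False,
        expected_sender: Optional[str] = None,
    ) -> Optional[Tx]:
        tx = self._fetch_matching_tx(tx_hash)
        if tx is None:
            return None
        # Optional post-checks every caller used to open-code; the
        # rejection logs in the real base would live here, in one place.
        if require_confirmed and not tx.confirmed:
            return None
        if expected_sender is not None and tx.sender != expected_sender:
            return None
        return tx
```

Callers then pick the post-checks they need and test only for None, which is exactly the "3-rung ladder into one None check" collapse the commit describes.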

Callers collapse a 3-rung ladder into one None check:

  miner/fulfillment.verify_user_sent_funds — passes both params; the
  source tx must be confirmed and sent by the reserved user address.

  validator/axon_handlers.handle_swap_confirm — passes expected_sender
  only (unconfirmed txs still need to reach the queue branch). Keeps
  the inline confirmed check for the queue-vs-initiate decision.

SwapVerifier.verify_tx keeps its inline sender check: it uses strict
semantics (empty sender fails) while the base uses lenient (empty
sender allowed, matching the other two callers' pre-refactor behavior).
Promoting one to the other would be a security-posture change — out
of scope for a logic-preserving refactor.

Rejection logs move to the base so the defense is observable in one
place instead of duplicated at every call site. No functional change.

Exception and None branches both did the same retry dance — bump the
counter, resolve on limit, persist otherwise. Normalize exceptions to
None at the top of the loop and extract the counter into a small
_bump_null_retry helper so the two branches become one.
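The normalize-then-bump shape can be sketched as follows; the retry limit, the resolved/persisted bookkeeping, and the SwapTracker fields are illustrative assumptions, while _bump_null_retry is the name from the commit:

```python
def fetch_or_none(fetch):
    """Normalize provider exceptions to None at the top of the loop,
    so the exception branch and the None branch become one."""
    try:
        return fetch()
    except Exception:
        return None


class SwapTracker:
    """Sketch of the retry bookkeeping (limit and persistence illustrative)."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.null_retries: dict[str, int] = {}
        self.resolved: set[str] = set()

    def _bump_null_retry(self, swap_id: str) -> None:
        count = self.null_retries.get(swap_id, 0) + 1
        self.null_retries[swap_id] = count
        if count >= self.limit:
            self.resolved.add(swap_id)  # resolve on limit
        # otherwise the counter stays persisted and we retry next tick
```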

The base verify_transaction check was lenient: an empty/unparseable
sender from the provider (failed vin resolution, malformed extrinsic)
passed through without rejection. A crafted or broken tx could then
evade the user-snipes-miner defense because we couldn't prove who sent
it. Tighten to strict — if we can't identify the sender, we don't
approve the fund flow.

Miner fulfillment and axon confirm paths adopt the stricter check
automatically through the shared base. SwapVerifier.verify_tx now
delegates None/recipient/amount/sender to the base and keeps only the
rate-limited confirmations debug log (validator-polling specific),
extracted into a private helper. The defense lives in exactly one
place shared by miner, axon, and validator verification flows.
Commitment polling fires every 15 blocks (20x/hour); the prune DELETE
was running on every tick for no reason. Pruning is bounded-growth
hygiene, not correctness — the retention cutoff is 2x the scoring
window, so a once-per-scoring-round cadence is more than enough.

Moves prune_aged_rate_events into run_scoring_pass so it runs ~1x/hour
instead of ~20x/hour. Zero semantic change. Test updated to cover the
new contract: commitment polling does not prune, scoring pass does.

Three tied changes that together move rate sampling from coarse-grained
periodic to per-block precision:

1. read_miner_commitments now uses substrate query_map over the
   Commitments.CommitmentOf double map — one RPC round-trip per call
   regardless of miner count, instead of N separate queries in a
   for-loop. Turns ~5000 RPC/hour into ~300 RPC/hour at ~256 miners.

2. poll_commitments loses its 15-block interval gate. Cost is low enough
   now that sampling every forward step (~12s) is the natural cadence,
   giving the crown-time series ~1 block accuracy.

3. RATE_UPDATE_MIN_INTERVAL_BLOCKS throttle is gone. The validator no
   longer drops rate events that land within 75 blocks of the previous
   one — shorter inter-rate gaps become real crown intervals instead of
   getting collapsed. Miners can now post rate updates as often as the
   chain-level commitment cooldown allows and every change is attributed.
   Only dedupe kept: same-rate observations on the same direction.

Obsolete COMMITMENT_POLL_INTERVAL_BLOCKS, RATE_UPDATE_MIN_INTERVAL_BLOCKS,
and the orphaned last_commitment_poll_block field are all removed.
Tests that exercised the throttle are replaced with single-block-gap
and same-rate-dedupe coverage.
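The query_map shape in point 1 can be sketched as below. The call signature loosely follows the substrate-interface client's query_map; the substrate handle is injected and the decode of each commitment payload into a MinerPair is elided, so treat everything except the Commitments.CommitmentOf storage name and the error-swallowing contract as an assumption:

```python
def read_miner_commitments(substrate, netuid, registered_hotkeys):
    """One query_map round-trip regardless of miner count (sketch)."""
    try:
        pairs = substrate.query_map("Commitments", "CommitmentOf", params=[netuid])
    except (ConnectionError, TimeoutError):
        return []  # never raise into the forward loop
    commitments = []
    for hotkey, commitment in pairs:
        if hotkey not in registered_hotkeys:
            continue  # dereg'd hotkey whose commitment is still in storage
        commitments.append((hotkey, commitment))
    return commitments
```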

Crown-time reward replay now looks back 4 hours instead of 12. Event
retention auto-derives to 8h (2 * window). Shorter window makes the
incentive more responsive to recent behavior — a miner climbing with
a better rate sees it reflected in weights faster, and timed-out
swaps in the last few hours carry proportionally more weight in the
success-rate penalty.

SCORING_INTERVAL_STEPS (how often scoring runs) stays at 300 steps
(~1h) — independent from window size. Each scoring pass covers the
last 4h, EMA in update_scores smooths across passes.

swap_outcomes used to be all-time. A miner with a bad streak a year
ago would carry that penalty forever, and the table grew unbounded.

Add a rolling 30-day credibility window:

- get_all_time_success_rates → get_success_rates_since(block) with a
  WHERE resolved_block >= ? filter; calculate_miner_rewards passes
  self.block - CREDIBILITY_WINDOW_BLOCKS.
- prune_swap_outcomes_older_than runs from the scoring pass alongside
  the rate-events prune, bounding table growth.
- New idx_swap_outcomes_resolved_block index makes both the read and
  the prune O(log n) instead of scanning the table.
- Docstring on the store module updated: swap_outcomes is no longer
  all-time, it's a rolling credibility ledger.

Miners with no outcomes in the last 30 days still default to 1.0
(optimistic). Miners with bad history can rehabilitate by completing
swaps over the next month.
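The windowed read and prune can be sketched against sqlite. The table and column names (swap_outcomes, resolved_block) and the function names come from the commit; the exact schema, the success column, and the window constant value are assumptions:

```python
import sqlite3

CREDIBILITY_WINDOW_BLOCKS = 216_000  # ~30 days at 12 s/block (illustrative)


def get_success_rates_since(conn, since_block):
    """Per-miner success rate over outcomes resolved at/after since_block.
    Miners with no rows in the window default to 1.0 in the caller."""
    rows = conn.execute(
        """SELECT miner_hotkey,
                  AVG(CASE WHEN success THEN 1.0 ELSE 0.0 END)
             FROM swap_outcomes
            WHERE resolved_block >= ?
            GROUP BY miner_hotkey""",
        (since_block,),
    ).fetchall()
    return dict(rows)


def prune_swap_outcomes_older_than(conn, cutoff_block):
    # Bounds table growth; idx_swap_outcomes_resolved_block makes both
    # this and the windowed read O(log n) instead of full scans.
    conn.execute("DELETE FROM swap_outcomes WHERE resolved_block < ?", (cutoff_block,))
```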

Functions taking validator, event_watcher, commitment, swap, and other
class instances had no type annotations, so IDEs showed them as Any
and editor navigation broke. Annotate the highest-traffic ones:

- axon_handlers.py: validator (Validator), commitment (MinerPair),
  synapse (bt.Synapse) on all handler + helper functions.
- forward.py: event_watcher (ContractEventWatcher) on
  replay_crown_time_window, item (PendingConfirm) on
  try_extend_reservation.
- dendrite_lite.py: contract_client (AllwaysContractClient), synapse
  (bt.Synapse), axons (List[bt.AxonInfo]).
- utils/rate.py: swap (Swap) on expected_swap_amounts.

Validator and event_watcher use string forward refs + TYPE_CHECKING
to avoid circular imports. Zero runtime change.
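The forward-ref pattern looks like this; the module path in the guarded import and the handler body are illustrative, while the string-annotation plus TYPE_CHECKING technique is the one the commit uses:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by the type checker, so the circular import between the
    # handler module and Validator never happens at runtime.
    from allways.validator import Validator  # module path is illustrative


def handle_swap_confirm(validator: "Validator", payload: dict) -> dict:
    # The quoted annotation is just a string at runtime; IDEs and type
    # checkers still resolve it to the real class.
    return {"hotkey": payload.get("hotkey")}
```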

The crown-time replay used to credit any active miner that held the best
rate, even while that miner was mid-swap and couldn't actually serve
new users. A solo best-rate miner could never leak credit because the
only eligibility gate was the active flag.

Teach ContractEventWatcher to track open-swap counts per miner:

- SwapInitiated increments the count (+1 BusyEvent)
- SwapCompleted / SwapTimedOut decrements the count (-1 BusyEvent)
- count > 0 means the miner is busy and should be excluded from crown

get_busy_events_in_range and get_busy_miners_at feed the crown-time
replay (next commit). initialize now also queries
contract_client.get_active_swaps() to seed busy state at cold start so
a swap at watcher startup isn't treated as idle until their next
terminal event.

Pruning is conservative — open intervals are kept even when their
start events age past the retention cutoff so a long-running swap
doesn't get silently forgotten mid-flight.

Only touches the watcher and its unit tests; scoring replay wiring
follows in a separate commit.
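The counting rule can be sketched as a small tracker; the class and method names here are illustrative, while the increment/decrement semantics and the cold-start seeding follow the commit:

```python
from collections import defaultdict


class BusyTracker:
    """Open-swap counts per miner (sketch of the watcher bookkeeping)."""

    def __init__(self):
        self.open_swaps = defaultdict(int)

    def seed(self, active_swaps):
        # Cold start: count swaps already in flight so a miner serving a
        # swap at startup isn't treated as idle.
        for hotkey in active_swaps:
            self.open_swaps[hotkey] += 1

    def on_swap_initiated(self, hotkey):
        self.open_swaps[hotkey] += 1  # +1 busy event

    def on_swap_terminal(self, hotkey):
        # SwapCompleted and SwapTimedOut both close an interval; floor at
        # zero in case a terminal event arrives for an unseen initiate.
        self.open_swaps[hotkey] = max(0, self.open_swaps[hotkey] - 1)

    def is_busy(self, hotkey):
        return self.open_swaps[hotkey] > 0
```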

Thread busy-interval exclusion through replay_crown_time_window and
crown_holders_at_instant. A miner with an open swap at a given block
no longer earns crown time for that interval — the tied best rate
among idle eligible miners wins instead, flowing credit to the next-
best runner-up. When no eligible miner is idle the interval recycles.

replay_crown_time_window now merges a third event stream (busy starts
and ends from the watcher) alongside rates and collateral, seeds the
running busy count from watcher.get_busy_miners_at(window_start) for
pre-window swaps that straddle the window edge, and derives the busy
set from the running count on each interval. Ordering at coincident
blocks is busy (0) < collateral (1) < rate (2) so that a reservation
at the same block as a rate change takes effect first.

crown_holders_at_instant gains an optional busy: Set[str] filter.

Test coverage:
- crown_holders_at_instant: busy-excluded and all-busy cases
- replay: runner-up earns during busy interval, solo miner goes busy
  and pool recycles, pre-window initiate reconstructed correctly

read_miner_commitments had no direct unit test — the refactor from
N-RPC loop to query_map went in untested except via E2E suites.
Add four cases mocking substrate.query_map:

- Parses every registered miner's commitment into a MinerPair
- Drops dereg'd hotkeys whose commitment is still in storage
- Regression guard: exactly one query_map call, zero single-query calls
- ConnectionError/TimeoutError during query_map returns an empty list
  instead of raising into the forward loop

The merged replay stream was a List[Tuple[int, int, str, str, float]]
with stringly-typed 'kind' dispatch — unreadable. Replace with:

- EventKind IntEnum whose values ARE the coincident-block sort order
  (BUSY=0 < COLLATERAL=1 < RATE=2) so the ordering invariant is baked
  into the type instead of hidden in a magic-number tuple field.
- ReplayEvent dataclass with block/hotkey/kind/value + a sort_key
  property. Polymorphic value documented per kind.
- _reconstruct_window_start_state and _merge_replay_events extracted
  as small helpers with narrow signatures.
- replay_crown_time_window's body becomes three numbered steps, each
  a single call. The inner loop reads: credit previous span, apply
  this event, advance cursor.

Behavior is identical — same ordering invariants, same crown
attribution math. Every existing test in test_scoring_v1.py passes
unchanged (264 total).
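The enum-as-sort-order idea can be sketched as below. EventKind, ReplayEvent, the field names, and the BUSY < COLLATERAL < RATE ordering come from the commit; the sort_key tuple shape is an assumption:

```python
from dataclasses import dataclass
from enum import IntEnum


class EventKind(IntEnum):
    # The enum values ARE the coincident-block sort order: a reservation
    # (busy) at the same block as a rate change takes effect first.
    BUSY = 0
    COLLATERAL = 1
    RATE = 2


@dataclass
class ReplayEvent:
    block: int
    hotkey: str
    kind: EventKind
    value: float  # polymorphic per kind (delta, amount, or rate)

    @property
    def sort_key(self):
        return (self.block, int(self.kind))
```

Sorting the merged stream by sort_key then replaces the old magic-number tuple field, and the ordering invariant lives in the type.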

The previous form computed the right answer via a dict comprehension
plus max() but didn't narrate the fallback behavior: when the best-rate
miner is disqualified, credit walks down to the next-best rate that
has a qualified miner. The rule should read like the rule.

- Bucket hotkeys by rate
- Walk best rate → worst
- First bucket with any qualifying miner wins the crown
- Ties at that rate split credit (caller splits interval duration)

Pure readability refactor — same candidates, same tie behavior.
264 existing tests pass unchanged, including the busy-best-rate-loses
case that already covered the fallthrough semantics.
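The bucket-and-walk rule above can be sketched directly; the function name and the assumption that a higher rate is better are illustrative, while the fallthrough and tie semantics follow the commit:

```python
from collections import defaultdict
from typing import Dict, List, Set


def crown_holders(rates: Dict[str, float], qualified: Set[str]) -> List[str]:
    """Walk best rate to worst; first bucket with any qualifying miner
    wins the crown, and ties at that rate split the credit."""
    buckets: Dict[float, List[str]] = defaultdict(list)
    for hotkey, rate in rates.items():
        buckets[rate].append(hotkey)  # bucket hotkeys by rate
    for rate in sorted(buckets, reverse=True):  # best -> worst (higher is better here)
        winners = [h for h in buckets[rate] if h in qualified]
        if winners:
            return winners  # caller splits the interval duration among ties
    return []  # no qualifying miner at any rate: interval recycles
```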

Rename every underscore-prefixed helper introduced on this branch to
plain names. The convention signalled "private" to nobody who wasn't
already reading the file, and it clutters call sites with pseudo-
access-modifier noise the codebase doesn't use elsewhere.

Renames:
- ChainProvider._fetch_matching_tx    -> fetch_matching_tx
- BitcoinProvider / SubtensorProvider concrete overrides follow.
- SwapVerifier._log_confs_progress    -> log_confs_progress
- swap_tracker._bump_null_retry       -> bump_null_retry
- forward._reconstruct_window_start_state -> reconstruct_window_start_state
- forward._merge_replay_events        -> merge_replay_events

Behavior unchanged, 264 tests green.

forward.py was 738 lines mixing five concerns: the per-step forward loop,
commitment polling, pending-confirm drain, fulfillment/timeout logic, and
the entire crown-time scoring algorithm. The scoring code is self-
contained and significantly different in texture, so move it out:

- New allways/validator/scoring.py holds every crown-time symbol:
  run_scoring_pass, prune_aged_rate_events, prune_stale_swap_outcomes,
  calculate_miner_rewards, success_rate, EventKind, ReplayEvent,
  reconstruct_window_start_state, merge_replay_events,
  replay_crown_time_window, crown_holders_at_instant.

- forward.py keeps the actual forward step and its direct helpers
  (commitment poll, pending reservations, fulfillment confirm/extend/
  timeout). Organized under ─── section comments that match the step
  numbers in the forward() body.

- forward() rewritten with blank-line separated numbered phases, each
  phase prefaced with a one-line inline comment saying what it does
  and why the ordering matters. Reads as a checklist instead of a
  dense call graph.

Tests updated: scoring symbols import from scoring now instead of
forward. 264 tests pass unchanged.

``state[2]`` and ``(to_tx_hash, to_tx_block, False)`` were unreadable —
positional tuple access with a magic-number index and no names for
three distinct fields. Wrap it in a SentSwap dataclass with named
attributes (to_tx_hash, to_tx_block, marked_fulfilled) and update:

- load_sent_cache / save_sent_cache use named construction + named
  attribute serialization instead of index juggling
- process_swap renames the local from ``state`` to ``sent`` so it reads
  like English: ``if sent and sent.marked_fulfilled``, not
  ``if state is not None and state[2]``
- The three-state lifecycle (no record / sent but not marked /
  marked) is documented in the process_swap docstring so a reader can
  match each branch in the body to the named state it handles
- marked_fulfilled flip is now an in-place mutation on the cached
  dataclass instead of replacing the tuple

Behavior unchanged, 264 tests green.
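The dataclass shape can be sketched as below; the field names come from the commit, while the serialization helpers are an illustrative stand-in for whatever load_sent_cache / save_sent_cache actually write:

```python
from dataclasses import asdict, dataclass


@dataclass
class SentSwap:
    """Named replacement for the (to_tx_hash, to_tx_block, marked_fulfilled)
    tuple, covering the three-state lifecycle: no record / sent but not
    marked / marked."""
    to_tx_hash: str
    to_tx_block: int
    marked_fulfilled: bool = False


def serialize(sent: SentSwap) -> dict:
    return asdict(sent)  # named attributes instead of index juggling


def deserialize(raw: dict) -> SentSwap:
    return SentSwap(**raw)
```

With a named local like ``sent``, the guard reads ``if sent and sent.marked_fulfilled`` instead of ``if state is not None and state[2]``.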

Two paired changes that fix the post-restart scoring gap and drop
redundant history buffering:

1. Event watcher initialize() now rewinds the cursor to
   ``max(0, head - SCORING_WINDOW_BLOCKS)`` instead of starting at head.
   The existing bounded sync_to catches up at 50 blocks per forward step,
   so a full window of 1200 blocks backfills over ~24 forward steps
   (~5 min). The first scoring pass runs at step 300 (~1h), long after
   catch-up — no degraded window after a restart. Module docstring was
   already promising this behavior; code now actually does it.

2. EVENT_RETENTION_BLOCKS drops from 2x scoring window to 1x. The 2x
   buffer existed so get_latest_*_before(window_start) could always
   find state from BEFORE the window opens. Replace the buffer with a
   smarter prune: always preserve the latest row per (hotkey, direction)
   regardless of age, so a miner who posts once and never updates still
   has an anchor row for state reconstruction.

Prune fixes:
- state_store.prune_events_older_than: SQL now keeps MAX(id) per
  (hotkey, from_chain, to_chain) group; only rows below cutoff AND
  not-the-latest get deleted.
- event_watcher.prune_old_collateral_events: cutoff halved, latest
  event per hotkey preserved in both the flat list and the per-hotkey
  index.
- Busy event prune unchanged (still only drops closed intervals).

Tests updated for new cursor math and the preserve-latest prune
semantics. 267 pass (264 old + 3 new prune-edge tests).
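The preserve-latest prune SQL can be sketched against sqlite. The (hotkey, from_chain, to_chain) grouping and the keep-MAX(id) rule come from the commit; the rate_events table name and schema are assumptions:

```python
import sqlite3


def prune_events_older_than(conn, cutoff_block):
    """Delete aged rows, but always keep the newest row per
    (hotkey, from_chain, to_chain) group as a state-reconstruction anchor."""
    conn.execute(
        """DELETE FROM rate_events
            WHERE block < ?
              AND id NOT IN (SELECT MAX(id)
                               FROM rate_events
                              GROUP BY hotkey, from_chain, to_chain)""",
        (cutoff_block,),
    )
```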

Both constants were redundant aliases for SCORING_WINDOW_BLOCKS:

- EVENT_RETENTION_BLOCKS was already `= SCORING_WINDOW_BLOCKS`. The prune
  helper now references the window directly.
- SCORING_INTERVAL_STEPS was 300 (~1h), pruning the same 4h window 4×
  per cycle. Aligned to the window so scoring runs once per window, and
  forward.py reads the constant directly instead of a step-count alias.
- purge_expired_pending → purge_expired_pending_confirms (the table is
  named pending_confirms; bare "pending" left ambiguous what was expiring).
- run_scoring_pass → score_and_reward_miners.
- prune_aged_rate_events → prune_rate_events.
- prune_stale_swap_outcomes → prune_swap_outcomes.

Reorder the scoring pass so reward calculation runs first and pruning
happens after — pruning is hygiene, never block the math.

Drop the `if len(miner_uids) > 0 and len(rewards) > 0` guard in
score_and_reward_miners. update_scores already short-circuits on empty
rewards/uids in neurons/base/validator.py.
… tracker

When the source tx is already confirmed we're about to vote_initiate this
same tick — don't waste an extrinsic on an extend that's about to be
superseded. Move try_extend_reservation into the not-confirmed branch.

Inline EXTEND_THRESHOLD_BLOCKS inside SwapTracker.get_near_timeout_fulfilled
instead of taking it as a parameter — every caller passed the same constant.

The tracker's voted_ids only covers terminal confirm/timeout votes, so
each forward step was re-sending vote_extend_timeout / vote_extend_reservation
until the contract returned AlreadyVoted — wasted gas on every tick during
an extension window.

Two new caches:

- SwapTracker.extend_timeout_voted_at: swap_id → timeout_block we voted under.
  is_extend_timeout_voted self-clears once the contract bumps the swap past
  the voted value, so the next round opens cleanly.
- Validator.extend_reservation_voted_at: (miner_hotkey, from_tx_hash) →
  reserved_until at vote time. Same self-clearing rule. Stale entries are
  dropped at the top of initialize_pending_user_reservations when their
  pending_confirm is gone.

AlreadyVoted responses now feed back into the cache instead of just being
swallowed, so a desync recovers on the next observation rather than burning
extrinsics.
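The self-clearing rule shared by both caches can be sketched as below; the class name and internal dict are illustrative, while is_extend_timeout_voted, the voted-under value, and the AlreadyVoted feedback path follow the commit:

```python
class ExtendTimeoutVotes:
    """Remember the timeout_block we voted under; once the contract bumps
    the swap past it, the entry clears itself so the next round opens
    cleanly (sketch)."""

    def __init__(self):
        self.voted_at: dict[str, int] = {}

    def record_vote(self, swap_id: str, timeout_block: int) -> None:
        # Also called when the contract answers AlreadyVoted, so a
        # desynced cache recovers on the next observation.
        self.voted_at[swap_id] = timeout_block

    def is_extend_timeout_voted(self, swap_id: str, current_timeout: int) -> bool:
        voted_under = self.voted_at.get(swap_id)
        if voted_under is None:
            return False
        if current_timeout > voted_under:
            del self.voted_at[swap_id]  # contract moved past our vote
            return False
        return True
```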

Code reads like a book — the function and section headers carry the
narrative, so block comments restating what the next line does just add
noise. Scrub the numbered phase comments off forward(), the long
docstrings on helpers whose names already describe them, and the heavy
narrative on classes. Keep comments only where the WHY is non-obvious
(WAL/busy_timeout ordering, EventKind tie-breaking, prune-anchor
preservation, why a delta floors at zero).

@anderdc merged commit c1c512f into test Apr 16, 2026
2 checks passed