
perf(per-miner): cache reverse cid→seq index so recover doesn't re-scan the allotment per submit #298

Closed
nghetienhiep wants to merge 1 commit into cathedralai:main from nghetienhiep:perf/recover-tier-seq-index

nghetienhiep commented Jun 26, 2026

Summary

#296 made per-miner (PM) submit tolerate assignment-row replica lag by falling back to recover_tier_seq_for(...) when _lookup_perminer_assignment misses. That recovery re-scans the miner's full allotment, recomputing instance_id (an HMAC-SHA256) for every seq: up to allotment_for(tier) HMACs per call, which is 10k by default.
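For reference, the pre-change recovery can be sketched as a linear scan. This is a minimal sketch under assumptions: the key material (SECRET), the HMAC message layout, 0-based seq numbering, and the signatures of instance_id, allotment_for and recover_tier_seq_for_scan are hypothetical stand-ins, not the repository's code. It shows why a miss (a foreign or bogus challenge_id) always costs the full allotment_for(tier) HMACs.

```python
import hashlib
import hmac

# HYPOTHETICAL key material; the real key and message layout are not shown in the PR.
SECRET = b"example-secret"


def allotment_for(tier: str) -> int:
    """HYPOTHETICAL stand-in: the PR says the default allotment is about 10k."""
    return 10_000


def instance_id(hotkey: str, epoch: int, tier: str, seq: int) -> str:
    """HYPOTHETICAL deterministic HMAC-SHA256 over (hotkey, epoch, tier, seq)."""
    msg = f"{hotkey}|{epoch}|{tier}|{seq}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def recover_tier_seq_for_scan(hotkey: str, epoch: int, tier: str, challenge_id: str):
    """Pre-change shape: recompute instance_id for each seq until one matches.

    A hit at seq costs seq + 1 HMACs; a miss costs allotment_for(tier) HMACs.
    """
    for seq in range(allotment_for(tier)):
        if instance_id(hotkey, epoch, tier, seq) == challenge_id:
            return seq
    return None  # identity-bound: ids minted for anyone else never match
```

Because a hit at seq costs seq + 1 HMACs, even successful recoveries average about half the allotment when seqs are spread uniformly.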

That fallback runs on the submit hot path, inside the _submit_slot gate, and it fires precisely when the assignment row is replica-lagged, which under load can be a large share of submits. The O(allotment) re-scan can then dominate submit latency and spike CPU. Each lagged submit holds its gate slot longer, so submit throughput drops exactly when the gate is saturated.
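A back-of-the-envelope model of the throughput effect, assuming the _submit_slot gate admits at most S concurrent submits (the PR does not show how the gate is implemented). Let t_0 be the slot hold time of a submit that skips the fallback, f the fraction of submits whose assignment row is replica-lagged, A = allotment_for(tier), t_h the cost of one HMAC-SHA256, and c the fraction of the allotment a re-scan walks. In steady state, by Little's law, a gate holding at most S submits completes at most S / E[t_hold] submits per unit time:

```latex
\begin{aligned}
\mathbb{E}[t_{\mathrm{hold}}] &\approx t_0 + f\,c\,A\,t_h
  && \text{(re-scan: } c \approx 1 \text{ on a miss, } c \approx \tfrac{1}{2} \text{ on a uniform hit)} \\
X &\le \frac{S}{\mathbb{E}[t_{\mathrm{hold}}]}
  && \text{(at most } S \text{ submits inside the gate)} \\
\mathbb{E}[t_{\mathrm{hold}}] &\approx t_0 + f\,t_{\mathrm{lookup}}
  && \text{(cached index, after the one-off } A\,t_h \text{ build per key)}
\end{aligned}
```

With the re-scan, the f c A t_h term grows with both the replica-miss rate and the allotment; with a cached index it collapses to a map lookup, apart from one build per (hotkey, epoch, tier).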

Change

instance_id is a deterministic HMAC, so the cid → seq map for a given (hotkey, epoch, tier) is stable. Build it once and cache it (lru_cache, bounded by CATHEDRAL_PERMINER_RECOVER_INDEX_CACHE, default 64 maps). The first lookup for a key still pays the O(allotment) build; every later lookup is a single map lookup, so recovery becomes amortized O(1).

Behaviour is unchanged: still identity-bound (a foreign or bogus challenge_id resolves to None); recover_seq_for and the #296 replica-lag path are unaffected.
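One way to check that behaviour is unchanged is to compare a cached lookup against the old linear scan on every seq plus a set of foreign and bogus ids. A sketch under stated assumptions: SECRET, the message layout, a small hypothetical ALLOTMENT of 256 (to keep the exhaustive comparison fast) and the helper names scan, cached and check_equivalent are illustrative, not taken from the repository.

```python
import functools
import hashlib
import hmac

SECRET = b"example-secret"  # HYPOTHETICAL key material
ALLOTMENT = 256             # HYPOTHETICAL small allotment so the check is fast


def instance_id(hotkey, epoch, tier, seq):
    """HYPOTHETICAL deterministic HMAC-SHA256 over (hotkey, epoch, tier, seq)."""
    msg = f"{hotkey}|{epoch}|{tier}|{seq}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def scan(hotkey, epoch, tier, cid):
    """Reference behaviour: the old O(allotment) re-scan."""
    return next(
        (s for s in range(ALLOTMENT) if instance_id(hotkey, epoch, tier, s) == cid),
        None,
    )


@functools.lru_cache(maxsize=64)
def _index(hotkey, epoch, tier):
    """Cached reverse cid -> seq map for one (hotkey, epoch, tier)."""
    return {instance_id(hotkey, epoch, tier, s): s for s in range(ALLOTMENT)}


def cached(hotkey, epoch, tier, cid):
    return _index(hotkey, epoch, tier).get(cid)


def check_equivalent(hotkey, epoch, tier):
    """Own ids, foreign-hotkey ids, other-epoch ids and a bogus string must agree."""
    probes = [instance_id(hotkey, epoch, tier, s) for s in range(ALLOTMENT)]
    probes += [instance_id("someone-else", epoch, tier, s) for s in range(0, ALLOTMENT, 17)]
    probes += [instance_id(hotkey, epoch + 1, tier, 3), "not-a-challenge-id"]
    for cid in probes:
        assert scan(hotkey, epoch, tier, cid) == cached(hotkey, epoch, tier, cid), cid
    return True
```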

recover_tier_seq_for re-scanned the full allotment (~10k HMAC-SHA256 per call)
on every lookup. Since cathedralai#296 calls it on the submit path whenever the assignment
row is replica-lagged — and that runs inside the submit gate slot — a high
replica-miss rate makes the O(allotment) re-scan dominate submit latency and
spike CPU under load.

instance_id is a deterministic HMAC, so the cid->seq map is stable; build it
once per (hotkey, epoch, tier) and cache it (lru_cache, bounded by
CATHEDRAL_PERMINER_RECOVER_INDEX_CACHE, default 64) for amortized O(1) lookups.
Behaviour is unchanged (still identity-bound: a foreign or bogus challenge_id
still resolves to None).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
nghetienhiep deleted the perf/recover-tier-seq-index branch June 26, 2026 13:26