Context
Follow-up to #363 (post-mortem of #360). The branch chore/perf-instrumentation (c5962f0 + four follow-up commits, unpushed) added measurement coverage and an SPHERE_SKIP_RULE4=1 prototype that demonstrated a −33.6% soak wall-clock improvement (−209 s) and −46% on §D.4 recovery (229 s → 123 s) by bypassing the O(N²) UXF-level pairwise verification loop in ProfileTokenStorageProvider.load().
Full report (497 lines, all raw artifacts preserved locally): /tmp/soak-metrics/issue-363/REPORT.md.
This issue captures the remaining open items the #363 data surfaced — none of them blocked the headline finding, but each is a measurable architectural smell worth its own fix.
Headline numbers (from the #363 SKIP_RULE4 A/B)
| section |
baseline |
SPHERE_SKIP_RULE4=1 |
delta |
| §D.4 recovery |
229 s |
123 s |
−46% |
| Total soak wall |
622 s |
413 s |
−33.6% |
profile.load.rule4PairwiseMs |
222.5 s |
0 (bypassed) |
−222.5 s |
profile.applySnapshot fires |
80 |
18 |
−77% |
pointerLayer.recoverLatest fires |
372 |
461 |
+24% (workflow completes faster ⇒ more cycles per wall second) |
Hypotheses already rejected by #363 data
Hypothesis VALIDATED by #363 data
- ✅ "Convert UXF → TXF in memory, verify at TXF level not UXF level." The Rule-4 pairwise loop at
profile/profile-token-storage-provider.ts:1521–1533 is 35.8% of the 622 s soak. The graceful-degradation catch path already proves the codebase tolerates verifiedProofs = undefined; making that the default for snapshot-sourced loads is a 5-line change.
Snapshot-path baseline (validates the design expectation)
When the apply-snapshot path is allowed to operate alone (the user's stated design: scan-pointer → fetch-blob → done), each step is fast:
| sub-step |
calls |
avg ms |
max ms |
applySnapshotCb.fetchRootMs (single-shot block fetch by CID) |
18 |
6.0 |
24 |
applySnapshotCb.parseSnapshotMs (CBOR decode of root block) |
18 |
5.5 |
20 |
applySnapshotCb.dispatchMs (OrbitDB rebuild from parsed snapshot) |
18 |
385 |
954 |
applySnapshotCb.totalMs |
18 |
397 |
981 |
So when the path is exercised correctly, full apply-snapshot is ~400 ms, not "dozens of seconds" — confirming the maintainer's design statement.
Open items for THIS issue
1. pointerLayer.recoverLatest over-invocation (P1)
In the SKIP_RULE4 run, §D.4 ran for 123 s and observed 108 recoverLatest calls + 110 findLatestValidVersion calls for a single mnemonic recovery. The design (per the user, and per CLAUDE.md's L4 model) expects exactly one recoverLatest per recovery + exactly one applySnapshot.
Per-second rate: 108/123 = 0.88 recoveries/sec. Each recoverLatest averages 250 ms (4.5 s max), totalling 27.1 s = 22% of §D.4 wall in the SKIP run.
Acceptance criteria:
- Add a counter for "recoveries served from cache vs cold".
- Instrument the calling code path (likely lifecycle-manager.ts or sync-loop) to determine why pointer-scan re-runs every cycle.
- Either cache the recovered pointer head for the duration of the recovery window, or short-circuit subsequent calls inside
recoverLatest when the head was just resolved.
- Verify with a fresh A/B soak that §D.4 drops below 80 s (target: design-correct ≈ 10 s, realistic-floor ≈ 50 s).
2. profile.applySnapshot fires during sphere clear (P2)
In the baseline soak, §D.3 ("sphere clear on all wallets", 41 s wall) recorded 23 profile.applySnapshot invocations out of 80 total for the soak. A clear command applying snapshots is structurally wrong — clear is supposed to delete state, not seed it.
Acceptance criteria:
- Trace the call graph that fires applySnapshot during clear.
- Either guard
setApplySnapshotCallback's closure against isShuttingDown || hasShutdown (it's already partially guarded — check applySnapshotIfWired and the closure body) or guard clear() against firing snapshot-recovery cycles.
- Soak with the same env as before; §D.3 should drop below 15 s and
applySnapshot.fired calls during §D.3 should be 0.
3. Attribute the remaining 47 s of §D.4 (P2)
After SKIP_RULE4, §D.4 is 123 s wall. Instrumented total accounts for 76 s (62%); the remaining 47 s (38%) is in code that has no counter yet. Top suspects:
- Token deserialization in
serialization/txf-serializer.ts (called per token from parseTokensFromBundle).
- Post-recovery flush (
flushScheduler.flushToIpfs already instrumented — confirm it's not double-counted).
- OrbitDB internal append/index pipeline during dispatch (the IPLog write inside
db.put, which the orbitdb.putEntry counter wraps but may not cover all paths).
- Helia FsBlockstore writes during snapshot dispatch (no counter yet).
Acceptance criteria:
- Add per-step counters inside
profile-snapshot-dispatcher.runProfileSnapshotJoin (decode, per-writer dispatch, per-key apply).
- Add a counter around
parseTokensFromBundle / extractTokensFromTxfData.
- Re-soak; instrumented coverage of §D.4 should reach ≥85%.
4. Convert SPHERE_SKIP_RULE4 prototype into a design-correct gate (P1)
The current env flag bypasses Rule-4 unconditionally. The correct design (per #363 conclusion) is: skip Rule-4 pairwise enrichment when the load source is a single trusted snapshot blob (i.e., snapshot-driven recovery), but keep it for genuine cross-device JOIN (multiple device contributions where Rule 4 is the conflict-resolution mechanism).
Acceptance criteria:
- In
ProfileTokenStorageProvider.load(), detect whether loadedBundles originate from a single snapshot blob vs from multiple independent device bundles. Likely signal: presence of a snapshot pointer CID + bundles all referenced from that snapshot.
- When snapshot-sourced, set
verifiedProofs = undefined (matches the existing fallback path on verifier failure).
- When cross-device-sourced (e.g., live sync merge), keep the pairwise loop.
- Document the design choice in code + REPORT.md addendum.
- Remove
SPHERE_SKIP_RULE4 env flag once the gate is in.
5. Kubo connection-cap A/B (P3 — operational, no code change)
Current host kubo container: Swarm.ConnMgr.HighWater=128 / LowWater=96. Maintainer's hypothesis: too many unrelated peers degrade kubo, slowing the IPFS-pin step inside flushScheduler.flushToIpfs and ipfs.pinCar.
Acceptance criteria:
- Reduce to
HighWater=32 / LowWater=24 via:
docker exec ipfs-kubo ipfs config --json Swarm.ConnMgr.HighWater 32
docker exec ipfs-kubo ipfs config --json Swarm.ConnMgr.LowWater 24
docker restart ipfs-kubo
- Re-run the soak with the same SPHERE_PERF instrumentation.
- Compare
flushScheduler.flushToIpfs.totalMs, ipfs.pinCar.totalMs, orbitdb.putEntry.{bundle,other} against the baseline.
- Decide if the cap should be applied permanently in the host's kubo config or as a documented recommendation.
6. Fix the §D.5 snapshot-diff contamination from SPHERE_PERF=1 (P3 — test harness)
Both #363 SKIP and baseline runs reach §D.5 ("Assertions") and fail because normalize_snapshot() in manual-test-full-recovery.sh:180 strips single-line timestamped logs but the perf-counter snapshot is a multi-line block; continuation lines ( 'counter': { count: ... }) survive the filter and create spurious diffs.
Three fix options (any one is sufficient):
- (a) Extend
normalize_snapshot() to strip the entire [perf-counters] snapshot: { ... } block.
- (b) Route
logger.info('perf', …) to stderr (or to a side-file) instead of stdout.
- (c) Add a
--no-perf-in-snapshots mode to the test that disables auto-dump for sphere status invocations specifically.
Acceptance criteria: A soak run with SPHERE_PERF=1 produces passed=1/1 for manual-test-full-recovery.sh end-to-end.
Files touched in chore/perf-instrumentation (unpushed)
core/perf-counters.ts (new)
profile/orbitdb-adapter.ts (counters)
profile/profile-token-storage/bundle-index.ts (counters)
profile/profile-token-storage/flush-scheduler.ts (counters)
profile/ipfs-client.ts (counters)
profile/factory.ts (applySnapshotCb counters)
profile/profile-token-storage-provider.ts (load + applySnapshot counters + SKIP_RULE4 env gate)
profile/profile-snapshot-cache.ts (read/write counters)
profile/aggregator-pointer/ProfilePointerLayer.ts (recoverLatest, discover, probe, classify)
profile/aggregator-pointer/discover-algorithm.ts (findLatestValidVersion)
oracle/UnicityAggregatorProvider.ts (aggregator-call counters)
uxf/UxfPackage.ts (computeVerifiedProofs counters)
tests/unit/core/perf-counters.test.ts (15 tests, pre-existing)
.tmp/soak-264.sh (grep -c arithmetic fix)
All counters are gated by SPHERE_PERF=1 (zero overhead when off). Build: npm run build produces dist/profile/{index,node,browser}.js with all strings inlined.
Reproducing the A/B
# Baseline (Rule-4 enabled)
ITERATIONS=1 ARTIFACTS_ROOT=/tmp/soak-baseline/soak-runs \\
SPHERE_PERF=1 SPHERE_DEBUG=perf=info \\
bash /home/vrogojin/uxf/.tmp/soak-264.sh
# Treatment (Rule-4 bypassed)
ITERATIONS=1 ARTIFACTS_ROOT=/tmp/soak-skip/soak-runs \\
SPHERE_PERF=1 SPHERE_DEBUG=perf=info SPHERE_SKIP_RULE4=1 \\
bash /home/vrogojin/uxf/.tmp/soak-264.sh
# Aggregate
node /tmp/soak-metrics/issue-363/aggregate-perf.mjs /tmp/soak-baseline/soak-runs
node /tmp/soak-metrics/issue-363/aggregate-perf.mjs /tmp/soak-skip/soak-runs
Refs
Context
Follow-up to #363 (post-mortem of #360). The branch
chore/perf-instrumentation(c5962f0+ four follow-up commits, unpushed) added measurement coverage and anSPHERE_SKIP_RULE4=1prototype that demonstrated a −33.6% soak wall-clock improvement (−209 s) and −46% on §D.4 recovery (229 s → 123 s) by bypassing the O(N²) UXF-level pairwise verification loop inProfileTokenStorageProvider.load().Full report (497 lines, all raw artifacts preserved locally):
/tmp/soak-metrics/issue-363/REPORT.md.This issue captures the remaining open items the #363 data surfaced — none of them blocked the headline finding, but each is a measurable architectural smell worth its own fix.
Headline numbers (from the #363 SKIP_RULE4 A/B)
profile.load.rule4PairwiseMsprofile.applySnapshotfirespointerLayer.recoverLatestfiresHypotheses already rejected by #363 data
OrbitDbAdapter.putEntrybatching wins (per perf(profile): sync amplification + O(B²) JOIN + 5× OpLog walks — extreme CPU + apparent memory leak on edge devices #360 maintainer hypothesis) — bundle-ref puts fire 150 times in 1185 s (0.13/sec), total 12.7 s = 1.1%.BundleIndex.listBundles5×/flush ratio (per perf(profile): sync amplification + O(B²) JOIN + 5× OpLog walks — extreme CPU + apparent memory leak on edge devices #360 Finding Direct Messages #3) — actual ratio 18×, but absolute cost 9.5 s = 0.8%.verifyInclusionProofrewrite (per perf(profile): sync amplification + O(B²) JOIN + 5× OpLog walks — extreme CPU + apparent memory leak on edge devices #360 Finding DMs not working #2) — inner verify is local crypto at 0.006 ms avg, 3.6% of soak. The dominant cost was the pairwise outer wrapper, not the crypto.Hypothesis VALIDATED by #363 data
profile/profile-token-storage-provider.ts:1521–1533is 35.8% of the 622 s soak. The graceful-degradation catch path already proves the codebase toleratesverifiedProofs = undefined; making that the default for snapshot-sourced loads is a 5-line change.Snapshot-path baseline (validates the design expectation)
When the apply-snapshot path is allowed to operate alone (the user's stated design: scan-pointer → fetch-blob → done), each step is fast:
applySnapshotCb.fetchRootMs(single-shot block fetch by CID)applySnapshotCb.parseSnapshotMs(CBOR decode of root block)applySnapshotCb.dispatchMs(OrbitDB rebuild from parsed snapshot)applySnapshotCb.totalMsSo when the path is exercised correctly, full apply-snapshot is ~400 ms, not "dozens of seconds" — confirming the maintainer's design statement.
Open items for THIS issue
1.
pointerLayer.recoverLatestover-invocation (P1)In the SKIP_RULE4 run, §D.4 ran for 123 s and observed 108
recoverLatestcalls + 110findLatestValidVersioncalls for a single mnemonic recovery. The design (per the user, and per CLAUDE.md's L4 model) expects exactly one recoverLatest per recovery + exactly one applySnapshot.Per-second rate: 108/123 = 0.88 recoveries/sec. Each recoverLatest averages 250 ms (4.5 s max), totalling 27.1 s = 22% of §D.4 wall in the SKIP run.
Acceptance criteria:
recoverLatestwhen the head was just resolved.2.
profile.applySnapshotfires duringsphere clear(P2)In the baseline soak, §D.3 ("sphere clear on all wallets", 41 s wall) recorded 23
profile.applySnapshotinvocations out of 80 total for the soak. A clear command applying snapshots is structurally wrong — clear is supposed to delete state, not seed it.Acceptance criteria:
setApplySnapshotCallback's closure againstisShuttingDown || hasShutdown(it's already partially guarded — checkapplySnapshotIfWiredand the closure body) or guardclear()against firing snapshot-recovery cycles.applySnapshot.firedcalls during §D.3 should be 0.3. Attribute the remaining 47 s of §D.4 (P2)
After SKIP_RULE4, §D.4 is 123 s wall. Instrumented total accounts for 76 s (62%); the remaining 47 s (38%) is in code that has no counter yet. Top suspects:
serialization/txf-serializer.ts(called per token fromparseTokensFromBundle).flushScheduler.flushToIpfsalready instrumented — confirm it's not double-counted).db.put, which theorbitdb.putEntrycounter wraps but may not cover all paths).Acceptance criteria:
profile-snapshot-dispatcher.runProfileSnapshotJoin(decode, per-writer dispatch, per-key apply).parseTokensFromBundle/extractTokensFromTxfData.4. Convert
SPHERE_SKIP_RULE4prototype into a design-correct gate (P1)The current env flag bypasses Rule-4 unconditionally. The correct design (per #363 conclusion) is: skip Rule-4 pairwise enrichment when the load source is a single trusted snapshot blob (i.e., snapshot-driven recovery), but keep it for genuine cross-device JOIN (multiple device contributions where Rule 4 is the conflict-resolution mechanism).
Acceptance criteria:
ProfileTokenStorageProvider.load(), detect whetherloadedBundlesoriginate from a single snapshot blob vs from multiple independent device bundles. Likely signal: presence of a snapshot pointer CID + bundles all referenced from that snapshot.verifiedProofs = undefined(matches the existing fallback path on verifier failure).SPHERE_SKIP_RULE4env flag once the gate is in.5. Kubo connection-cap A/B (P3 — operational, no code change)
Current host kubo container:
Swarm.ConnMgr.HighWater=128 / LowWater=96. Maintainer's hypothesis: too many unrelated peers degrade kubo, slowing the IPFS-pin step insideflushScheduler.flushToIpfsandipfs.pinCar.Acceptance criteria:
HighWater=32 / LowWater=24via:flushScheduler.flushToIpfs.totalMs,ipfs.pinCar.totalMs,orbitdb.putEntry.{bundle,other}against the baseline.6. Fix the §D.5 snapshot-diff contamination from
SPHERE_PERF=1(P3 — test harness)Both #363 SKIP and baseline runs reach §D.5 ("Assertions") and fail because
normalize_snapshot()inmanual-test-full-recovery.sh:180strips single-line timestamped logs but the perf-counter snapshot is a multi-line block; continuation lines ('counter': { count: ... }) survive the filter and create spurious diffs.Three fix options (any one is sufficient):
normalize_snapshot()to strip the entire[perf-counters] snapshot: { ... }block.logger.info('perf', …)to stderr (or to a side-file) instead of stdout.--no-perf-in-snapshotsmode to the test that disables auto-dump forsphere statusinvocations specifically.Acceptance criteria: A soak run with
SPHERE_PERF=1producespassed=1/1formanual-test-full-recovery.shend-to-end.Files touched in
chore/perf-instrumentation(unpushed)All counters are gated by
SPHERE_PERF=1(zero overhead when off). Build:npm run buildproducesdist/profile/{index,node,browser}.jswith all strings inlined.Reproducing the A/B
Refs
Swarm.ConnMgroperational tuning (no SDK change)