perf(profile): remaining recovery-path bottlenecks after #363 — pointer-layer over-invocation, applySnapshot during clear, kubo conn cap, Rule-4 design gate

## Context

Follow-up to **#363** (post-mortem of #360). The branch `chore/perf-instrumentation` (`c5962f0` + four follow-up commits, unpushed) added measurement coverage and an `SPHERE_SKIP_RULE4=1` prototype that demonstrated a **−33.6% soak wall-clock improvement (−209 s) and −46% on §D.4 recovery (229 s → 123 s)** by bypassing the O(N²) UXF-level pairwise verification loop in `ProfileTokenStorageProvider.load()`.

Full report (497 lines, all raw artifacts preserved locally): `/tmp/soak-metrics/issue-363/REPORT.md`.

This issue captures the **remaining open items** the #363 data surfaced — none of them blocked the headline finding, but each is a measurable architectural smell worth its own fix.

## Headline numbers (from the #363 SKIP_RULE4 A/B)

| section | baseline | SPHERE_SKIP_RULE4=1 | delta |
|---|---|---|---|
| §D.4 recovery | 229 s | 123 s | **−46%** |
| Total soak wall | 622 s | 413 s | **−33.6%** |
| `profile.load.rule4PairwiseMs` | 222.5 s | 0 (bypassed) | **−222.5 s** |
| `profile.applySnapshot` fires | 80 | 18 | **−77%** |
| `pointerLayer.recoverLatest` fires | 372 | 461 | **+24%** (workflow completes faster ⇒ more cycles per wall second) |

## Hypotheses already rejected by #363 data

- ❌ Aggregator HTTP is the bottleneck — total aggregator-RPC was **1.97% of soak** (11.5 s of 622 s).
- ❌ `OrbitDbAdapter.putEntry` batching wins (per #360 maintainer hypothesis) — bundle-ref puts fire 150 times in 1185 s (0.13/sec), total 12.7 s = 1.1%.
- ❌ `BundleIndex.listBundles` 5×/flush ratio (per #360 Finding #3) — actual ratio 18×, but absolute cost 9.5 s = 0.8%.
- ❌ Crypto `verifyInclusionProof` rewrite (per #360 Finding #2) — inner verify is local crypto at 0.006 ms avg, 3.6% of soak. The dominant cost was the *pairwise outer wrapper*, not the crypto.

## Hypothesis VALIDATED by #363 data

- ✅ **\"Convert UXF → TXF in memory, verify at TXF level not UXF level.\"** The Rule-4 pairwise loop at `profile/profile-token-storage-provider.ts:1521–1533` is **35.8% of the 622 s soak**. The graceful-degradation catch path already proves the codebase tolerates `verifiedProofs = undefined`; making that the default for snapshot-sourced loads is a 5-line change.

## Snapshot-path baseline (validates the design expectation)

When the apply-snapshot path is allowed to operate alone (the user's stated design: scan-pointer → fetch-blob → done), each step is **fast**:

| sub-step | calls | avg ms | max ms |
|---|---|---|---|
| `applySnapshotCb.fetchRootMs` (single-shot block fetch by CID) | 18 | **6.0** | 24 |
| `applySnapshotCb.parseSnapshotMs` (CBOR decode of root block) | 18 | **5.5** | 20 |
| `applySnapshotCb.dispatchMs` (OrbitDB rebuild from parsed snapshot) | 18 | **385** | 954 |
| **`applySnapshotCb.totalMs`** | 18 | **397** | 981 |

So when the path is exercised correctly, full apply-snapshot is **~400 ms**, not \"dozens of seconds\" — confirming the maintainer's design statement.

---

## Open items for THIS issue

### 1. `pointerLayer.recoverLatest` over-invocation (P1)

In the SKIP_RULE4 run, §D.4 ran for 123 s and observed **108 `recoverLatest` calls + 110 `findLatestValidVersion` calls** for a single mnemonic recovery. The design (per the user, and per CLAUDE.md's L4 model) expects **exactly one** recoverLatest per recovery + exactly one applySnapshot.

Per-second rate: 108/123 = **0.88 recoveries/sec**. Each recoverLatest averages 250 ms (4.5 s max), totalling 27.1 s = **22% of §D.4 wall** in the SKIP run.

**Acceptance criteria:**
- Add a counter for \"recoveries served from cache vs cold\".
- Instrument the calling code path (likely lifecycle-manager.ts or sync-loop) to determine why pointer-scan re-runs every cycle.
- Either cache the recovered pointer head for the duration of the recovery window, or short-circuit subsequent calls inside `recoverLatest` when the head was just resolved.
- Verify with a fresh A/B soak that §D.4 drops below 80 s (target: design-correct ≈ 10 s, realistic-floor ≈ 50 s).

### 2. `profile.applySnapshot` fires during `sphere clear` (P2)

In the baseline soak, §D.3 (\"sphere clear on all wallets\", 41 s wall) recorded **23 `profile.applySnapshot` invocations** out of 80 total for the soak. A *clear* command applying snapshots is structurally wrong — clear is supposed to delete state, not seed it.

**Acceptance criteria:**
- Trace the call graph that fires applySnapshot during clear.
- Either guard `setApplySnapshotCallback`'s closure against `isShuttingDown || hasShutdown` (it's already partially guarded — check `applySnapshotIfWired` and the closure body) or guard `clear()` against firing snapshot-recovery cycles.
- Soak with the same env as before; §D.3 should drop below 15 s and `applySnapshot.fired` calls during §D.3 should be 0.

### 3. Attribute the remaining 47 s of §D.4 (P2)

After SKIP_RULE4, §D.4 is 123 s wall. Instrumented total accounts for **76 s (62%)**; the remaining **47 s (38%)** is in code that has no counter yet. Top suspects:

- Token deserialization in `serialization/txf-serializer.ts` (called per token from `parseTokensFromBundle`).
- Post-recovery flush (`flushScheduler.flushToIpfs` already instrumented — confirm it's not double-counted).
- OrbitDB internal append/index pipeline during dispatch (the IPLog write inside `db.put`, which the `orbitdb.putEntry` counter wraps but may not cover all paths).
- Helia FsBlockstore writes during snapshot dispatch (no counter yet).

**Acceptance criteria:**
- Add per-step counters inside `profile-snapshot-dispatcher.runProfileSnapshotJoin` (decode, per-writer dispatch, per-key apply).
- Add a counter around `parseTokensFromBundle` / `extractTokensFromTxfData`.
- Re-soak; instrumented coverage of §D.4 should reach ≥85%.

### 4. Convert `SPHERE_SKIP_RULE4` prototype into a design-correct gate (P1)

The current env flag bypasses Rule-4 unconditionally. The correct design (per #363 conclusion) is: **skip Rule-4 pairwise enrichment when the load source is a single trusted snapshot blob** (i.e., snapshot-driven recovery), but keep it for genuine cross-device JOIN (multiple device contributions where Rule 4 is the conflict-resolution mechanism).

**Acceptance criteria:**
- In `ProfileTokenStorageProvider.load()`, detect whether `loadedBundles` originate from a single snapshot blob vs from multiple independent device bundles. Likely signal: presence of a snapshot pointer CID + bundles all referenced from that snapshot.
- When snapshot-sourced, set `verifiedProofs = undefined` (matches the existing fallback path on verifier failure).
- When cross-device-sourced (e.g., live sync merge), keep the pairwise loop.
- Document the design choice in code + REPORT.md addendum.
- Remove `SPHERE_SKIP_RULE4` env flag once the gate is in.

### 5. Kubo connection-cap A/B (P3 — operational, no code change)

Current host kubo container: `Swarm.ConnMgr.HighWater=128 / LowWater=96`. Maintainer's hypothesis: too many unrelated peers degrade kubo, slowing the IPFS-pin step inside `flushScheduler.flushToIpfs` and `ipfs.pinCar`.

**Acceptance criteria:**
- Reduce to `HighWater=32 / LowWater=24` via:
  ```bash
  docker exec ipfs-kubo ipfs config --json Swarm.ConnMgr.HighWater 32
  docker exec ipfs-kubo ipfs config --json Swarm.ConnMgr.LowWater 24
  docker restart ipfs-kubo
  ```
- Re-run the soak with the same SPHERE_PERF instrumentation.
- Compare `flushScheduler.flushToIpfs.totalMs`, `ipfs.pinCar.totalMs`, `orbitdb.putEntry.{bundle,other}` against the baseline.
- Decide if the cap should be applied permanently in the host's kubo config or as a documented recommendation.

### 6. Fix the §D.5 snapshot-diff contamination from `SPHERE_PERF=1` (P3 — test harness)

Both #363 SKIP and baseline runs reach §D.5 (\"Assertions\") and fail because `normalize_snapshot()` in `manual-test-full-recovery.sh:180` strips single-line timestamped logs but the perf-counter snapshot is a multi-line block; continuation lines (`  'counter': { count: ... }`) survive the filter and create spurious diffs.

Three fix options (any one is sufficient):

- (a) Extend `normalize_snapshot()` to strip the entire `[perf-counters] snapshot: { ... }` block.
- (b) Route `logger.info('perf', …)` to stderr (or to a side-file) instead of stdout.
- (c) Add a `--no-perf-in-snapshots` mode to the test that disables auto-dump for `sphere status` invocations specifically.

**Acceptance criteria:** A soak run with `SPHERE_PERF=1` produces `passed=1/1` for `manual-test-full-recovery.sh` end-to-end.

---

## Files touched in `chore/perf-instrumentation` (unpushed)

```
core/perf-counters.ts                                       (new)
profile/orbitdb-adapter.ts                                  (counters)
profile/profile-token-storage/bundle-index.ts               (counters)
profile/profile-token-storage/flush-scheduler.ts            (counters)
profile/ipfs-client.ts                                      (counters)
profile/factory.ts                                          (applySnapshotCb counters)
profile/profile-token-storage-provider.ts                   (load + applySnapshot counters + SKIP_RULE4 env gate)
profile/profile-snapshot-cache.ts                           (read/write counters)
profile/aggregator-pointer/ProfilePointerLayer.ts           (recoverLatest, discover, probe, classify)
profile/aggregator-pointer/discover-algorithm.ts            (findLatestValidVersion)
oracle/UnicityAggregatorProvider.ts                         (aggregator-call counters)
uxf/UxfPackage.ts                                           (computeVerifiedProofs counters)
tests/unit/core/perf-counters.test.ts                       (15 tests, pre-existing)
.tmp/soak-264.sh                                            (grep -c arithmetic fix)
```

All counters are gated by `SPHERE_PERF=1` (zero overhead when off). Build: `npm run build` produces `dist/profile/{index,node,browser}.js` with all strings inlined.

## Reproducing the A/B

```bash
# Baseline (Rule-4 enabled)
ITERATIONS=1 ARTIFACTS_ROOT=/tmp/soak-baseline/soak-runs \\
  SPHERE_PERF=1 SPHERE_DEBUG=perf=info \\
  bash /home/vrogojin/uxf/.tmp/soak-264.sh

# Treatment (Rule-4 bypassed)
ITERATIONS=1 ARTIFACTS_ROOT=/tmp/soak-skip/soak-runs \\
  SPHERE_PERF=1 SPHERE_DEBUG=perf=info SPHERE_SKIP_RULE4=1 \\
  bash /home/vrogojin/uxf/.tmp/soak-264.sh

# Aggregate
node /tmp/soak-metrics/issue-363/aggregate-perf.mjs /tmp/soak-baseline/soak-runs
node /tmp/soak-metrics/issue-363/aggregate-perf.mjs /tmp/soak-skip/soak-runs
```

## Refs

- Closes-after: #363 (perf-instrumentation branch + measurement-only work)
- Replaces-direction-of: #360 (closed, dead-end)
- Related: kubo `Swarm.ConnMgr` operational tuning (no SDK change)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(profile): remaining recovery-path bottlenecks after #363 — pointer-layer over-invocation, applySnapshot during clear, kubo conn cap, Rule-4 design gate #364

Context

Headline numbers (from the #363 SKIP_RULE4 A/B)

Hypotheses already rejected by #363 data

Hypothesis VALIDATED by #363 data

Snapshot-path baseline (validates the design expectation)

Open items for THIS issue

1. `pointerLayer.recoverLatest` over-invocation (P1)

2. `profile.applySnapshot` fires during `sphere clear` (P2)

3. Attribute the remaining 47 s of §D.4 (P2)

4. Convert `SPHERE_SKIP_RULE4` prototype into a design-correct gate (P1)

5. Kubo connection-cap A/B (P3 — operational, no code change)

6. Fix the §D.5 snapshot-diff contamination from `SPHERE_PERF=1` (P3 — test harness)

Files touched in `chore/perf-instrumentation` (unpushed)

Reproducing the A/B

Refs

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

section	baseline	SPHERE_SKIP_RULE4=1	delta
§D.4 recovery	229 s	123 s	−46%
Total soak wall	622 s	413 s	−33.6%
`profile.load.rule4PairwiseMs`	222.5 s	0 (bypassed)	−222.5 s
`profile.applySnapshot` fires	80	18	−77%
`pointerLayer.recoverLatest` fires	372	461	+24% (workflow completes faster ⇒ more cycles per wall second)

sub-step	calls	avg ms	max ms
`applySnapshotCb.fetchRootMs` (single-shot block fetch by CID)	18	6.0	24
`applySnapshotCb.parseSnapshotMs` (CBOR decode of root block)	18	5.5	20
`applySnapshotCb.dispatchMs` (OrbitDB rebuild from parsed snapshot)	18	385	954
`applySnapshotCb.totalMs`	18	397	981

perf(profile): remaining recovery-path bottlenecks after #363 — pointer-layer over-invocation, applySnapshot during clear, kubo conn cap, Rule-4 design gate #364

Description

Context

Headline numbers (from the #363 SKIP_RULE4 A/B)

Hypotheses already rejected by #363 data

Hypothesis VALIDATED by #363 data

Snapshot-path baseline (validates the design expectation)

Open items for THIS issue

1. pointerLayer.recoverLatest over-invocation (P1)

2. profile.applySnapshot fires during sphere clear (P2)

3. Attribute the remaining 47 s of §D.4 (P2)

4. Convert SPHERE_SKIP_RULE4 prototype into a design-correct gate (P1)

5. Kubo connection-cap A/B (P3 — operational, no code change)

6. Fix the §D.5 snapshot-diff contamination from SPHERE_PERF=1 (P3 — test harness)

Files touched in chore/perf-instrumentation (unpushed)

Reproducing the A/B

Refs

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

1. `pointerLayer.recoverLatest` over-invocation (P1)

2. `profile.applySnapshot` fires during `sphere clear` (P2)

4. Convert `SPHERE_SKIP_RULE4` prototype into a design-correct gate (P1)

6. Fix the §D.5 snapshot-diff contamination from `SPHERE_PERF=1` (P3 — test harness)

Files touched in `chore/perf-instrumentation` (unpushed)