perf(profile): post-mortem of #360 — static-analysis fixes regressed; instrumentation added for a real profile-driven investigation

## Status

This issue replaces #360.

#360 has been **closed as a failed attempt**. The 7-finding stack on `fix/issue-360-profile-sync-perf` (commits b8ba5fc → a8094e8) was implemented in good faith and then A/B-soak-measured against `main`. The measurement was unambiguous: **the "perf fixes" make things worse and introduce a correctness regression**.

This issue exists so the next investigator can pick up cleanly with:

1. The negative result from #360 in one place, so we don't redo the same dead-end work.
2. The hypothesis the maintainer wants explored next (batching writes).
3. The measurement infrastructure that has been added to make the next attempt **profile-driven** rather than static-analysis-driven.

---

## What was measured

A/B soak on 2026-05-30 using `.tmp/soak-264.sh ITERATIONS=1`, with `SPHERE_DEBUG=*` and a 5-second `top`-style sampler running alongside.

**Both runs reach B ≈ 70 active bundles** — the same regime #360 cited as worst-case. So the regression cannot be dismissed as "the soak doesn't exercise the targeted scale".

| Section | Baseline (`main`) | Treatment (#360 fixes) | Δ |
|---|---|---|---|
| §1 Peer1 setup | 55 s | 58 s | +3 s |
| §A.1 Peer2-alice | 31 s | 31 s | 0 |
| §C.1b deliver invoice | 6 s | 16 s | +10 s (2.67×) |
| **§C.2 Bob pays** | **62 s** | **148 s** | **+86 s (2.39×)** |
| **§C.4 cross-device sync** | **43 s** | **75 s** | **+32 s (1.74×)** |
| **§D.1 pre-clear snapshots** | **85 s** | **127 s** | **+42 s (1.49×)** |
| §D.3 sphere clear | 47 s | 50 s | +3 s |
| §D.4 IPFS-only recovery | 219 s | 238 s | +19 s |
| §D.5 Assertions | **PASS** | **FAIL rc=1** | — |
| §E ledger | 22 s | _not reached_ | — |
| **TOTAL** | **631 s** | **805 s** | **+174 s (1.28× slower)** |

**Resource (trimmed to active soak window, 30 s start-skip):**

| Metric | Baseline | Treatment |
|---|---|---|
| avg RSS | 1.21 GB | 1.40 GB (+16 %) |
| avg CPU% | 147 % | 163 % (+11 %) |
| max RSS | 2.04 GB | 4.74 GB* |
| max CPU% | 333 % | 1024 %* |

*Treatment max RSS / CPU peak is a parallel `tsup` DTS build of the `main` baseline worktree that ran during the first ~30 s of the treatment window — not the SDK itself. Discount when comparing peaks.

### Correctness regression — §D.5

Treatment failed §D.5 with a real cross-device divergence on Bob's UCT balance after recovery:

```
ASSERT FAIL (bob-peer1-vs-peer2-after)
-UCT: 99.999999999989 (1 token)    ← peer1, recovered from mnemonic + IPFS
+UCT: 199.999999999989 (2 tokens)  ← peer2, original device
```

Baseline ran the same flow with identical inputs and **all 6 assertions passed**.

---

## What went wrong in #360 (post-mortem)

The original issue body looked like a rigorous perf analysis. It is **not**. It is **static analysis of source code** with cost models derived by multiplying source-code counts. There is no flamegraph, no `node --prof` output, no `perf record`, no allocation sampler, no actual CPU% trace from the symptom window. The "525k Map allocations per load" and "6.5 minutes synchronous network IO" numbers are computations on the structure of the code, not on what the code does at runtime.

Implementing those findings produced the measured regression because:

1. **The pathologies called out were not the actual limiter** on the workload. The dominant sections (§C.2 send, §D.1 pre-clear, §D.4 recovery) are network-bound (Nostr round-trips, aggregator HEAD-verify, IPFS gateway propagation), not CPU-bound. Local CPU optimisation cannot shorten a network-bound wait — but it can lengthen it if the optimisation triggers rate-limiting.

2. **Parallel fan-out (Findings #2, #5, cap=8) likely triggers testnet rate-limiting at B ≈ 70**. The pre-fix sequential dispatch may have been pessimal-looking but rate-limit-safe. Going from 1 in-flight → 8 in-flight against a shared aggregator + Kubo gateway turns "slow but steady" into "throttled and worse".

3. **Finding #1's identity-filter on OrbitDB `'update'` events is the prime suspect for the §D.5 divergence.** The fix suppresses `handleReplication` callback for `update` entries authored by our own identity, on the assumption that "only peer writes warrant a re-sync". But the same `update` event is also the local state-machine's signal that its own write was durably applied. Filtering it out races the receiver against the snapshot publish, and at edge of timing windows the recipient loses a token.

4. **Net code change is +1500 / −200 across 30 files.** We added more code than we removed. Most "simplifications" were rearrangements (WeakMap layer, AggregateError branch, prefix-collision check, extracted helper). Two genuine wins (single concurrency constant, folded size-probe-into-hash-compute) do not move the needle on a network-bound workload.

The takeaway: **#360 skipped the measurement step that should precede any perf claim**. We should not repeat that.

---

## What the maintainer wants explored next

> on every write (one CID) we apparently doing the whole round-trip, while we could simply batch all these writes into a single shot

This is the hypothesis the next investigation should test. Concretely:

- Every `OrbitDbAdapter.putEntry` call (which is invoked many times per logical operation — `addBundle`, OUTBOX state transitions, SENT-ledger appends, tombstone writes, profile-pointer publishes, etc.) currently triggers its own dag-cbor encode → IPFS pin → HEAD-verify → optional Nostr publish round-trip.
- A logical operation like "Bob pays Alice" emits ~10–50 such writes in close succession.
- If each one is a full HTTP HEAD-verify round-trip against the gateway (`flush-durability` deadline 30 s), the wall-clock cost per logical op is `N_writes × verify_RTT`, not `1 × verify_RTT`.
- Batching writes within a logical operation (or within a debounce window) into a single HEAD-verify could collapse this from O(N) round-trips to O(1).

What "batching" means concretely needs profile data first — see the next section.

---

## Measurement infrastructure (added on `chore/perf-instrumentation`)

A small counters module has been added so the next investigator does not start from scratch:

- `profile/internal/perf-counters.ts` — `incr(name)`, `observe(name, ms)`, `time(name, fn)`, `dumpAndReset()`. Gated by `SPHERE_PERF=1` (no overhead when off). When on, dumps a per-counter snapshot every 5 s via the existing `logger.info('perf', ...)` channel.
- Wired at the hot paths the #360 findings claimed but never measured:
  - `OrbitDbAdapter.putEntry` — count + total time. Resolves the per-write-round-trip question directly.
  - `OrbitDbAdapter.onReplication` callback — fires/sec.
  - `BundleIndex.listActiveBundles` — calls + walk time.
  - `fetchCarFromIpfs` — call count, blocks-per-call, total bytes, wall-clock per call.
  - `pinCarBlocksToIpfs` — block count, total bytes, wall-clock per call.
  - `flushToIpfs` — call count, blocks, wall-clock.
  - `UxfPackage.computeVerifiedProofsAcross` — total proofs verified, verifier calls, wall-clock.

To use:

```bash
SPHERE_PERF=1 SPHERE_DEBUG=perf=info bash .tmp/soak-264.sh
```

Every 5 s the daemon emits a counter snapshot. After the soak you can post-process the log to build flame-shaped tables.

This is **not a fix**. It's the missing measurement step that #360 should have started with.

---

## Suggested next steps

1. Run a soak with `SPHERE_PERF=1`. Identify the top-5 counters by total time. **Do not** propose fixes until that table exists.
2. From the counter data, decide whether the batching hypothesis is borne out — is `putEntry` total-time a meaningful fraction of wall-clock? Is `fetchCarFromIpfs` time dominated by HEAD-verify or by block-fetch? Does `flushToIpfs` time scale linearly with bundle count?
3. Propose at most **one** targeted intervention. Re-run the soak. Compare. Land only if A/B is unambiguously better on both wall-clock **and** assertion pass-rate.

---

## Artefacts

A/B soak artefacts preserved (NOT in git — local fs only):
- `/tmp/soak-metrics/treatment/` — `fix/issue-360-profile-sync-perf` run (805 s, rc=1, §D.5 FAIL).
- `/tmp/soak-metrics/baseline/` — `main` run (631 s, rc=0, all assertions PASS).

Both contain `soak-runs/run-1/script.log`, `totals.tsv`, `samples.tsv`, `summary-metrics.json`, plus the `KEEP=1` workspace tree with daemons' on-disk state for any deeper triage.

The dead-end branch is preserved with a `DO_NOT_USE_DEAD_END.md` marker at the repo root.

cc maintainer for triage / labelling.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(profile): post-mortem of #360 — static-analysis fixes regressed; instrumentation added for a real profile-driven investigation #363

Status

What was measured

Correctness regression — §D.5

What went wrong in #360 (post-mortem)

What the maintainer wants explored next

Measurement infrastructure (added on `chore/perf-instrumentation`)

Suggested next steps

Artefacts

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Section	Baseline (`main`)	Treatment (#360 fixes)	Δ
§1 Peer1 setup	55 s	58 s	+3 s
§A.1 Peer2-alice	31 s	31 s	0
§C.1b deliver invoice	6 s	16 s	+10 s (2.67×)
§C.2 Bob pays	62 s	148 s	+86 s (2.39×)
§C.4 cross-device sync	43 s	75 s	+32 s (1.74×)
§D.1 pre-clear snapshots	85 s	127 s	+42 s (1.49×)
§D.3 sphere clear	47 s	50 s	+3 s
§D.4 IPFS-only recovery	219 s	238 s	+19 s
§D.5 Assertions	PASS	FAIL rc=1	—
§E ledger	22 s	not reached	—
TOTAL	631 s	805 s	+174 s (1.28× slower)

Metric	Baseline	Treatment
avg RSS	1.21 GB	1.40 GB (+16 %)
avg CPU%	147 %	163 % (+11 %)
max RSS	2.04 GB	4.74 GB*
max CPU%	333 %	1024 %*

perf(profile): post-mortem of #360 — static-analysis fixes regressed; instrumentation added for a real profile-driven investigation #363

Description

Status

What was measured

Correctness regression — §D.5

What went wrong in #360 (post-mortem)

What the maintainer wants explored next

Measurement infrastructure (added on chore/perf-instrumentation)

Suggested next steps

Artefacts

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Measurement infrastructure (added on `chore/perf-instrumentation`)