#360 has been closed as a failed attempt. The 7-finding stack on fix/issue-360-profile-sync-perf (commits b8ba5fc → a8094e8) was implemented in good faith and then A/B-soak-measured against main. The measurement was unambiguous: the "perf fixes" make things worse and introduce a correctness regression.
This issue exists so the next investigator can pick up cleanly with:
The hypothesis the maintainer wants explored next (batching writes).
The measurement infrastructure that has been added to make the next attempt profile-driven rather than static-analysis-driven.
What was measured
A/B soak on 2026-05-30 using .tmp/soak-264.sh ITERATIONS=1, with SPHERE_DEBUG=* and a 5-second top-style sampler running alongside.
Both runs reach B ≈ 70 active bundles — the same regime #360 cited as worst-case. So the regression cannot be dismissed as "the soak doesn't exercise the targeted scale".
Resource (trimmed to active soak window, 30 s start-skip):
| Metric | Baseline | Treatment |
|---|---|---|
| avg RSS | 1.21 GB | 1.40 GB (+16 %) |
| avg CPU% | 147 % | 163 % (+11 %) |
| max RSS | 2.04 GB | 4.74 GB* |
| max CPU% | 333 % | 1024 %* |
*Treatment max RSS / CPU peak is a parallel tsup DTS build of the main baseline worktree that ran during the first ~30 s of the treatment window — not the SDK itself. Discount when comparing peaks.
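The relative deltas in the table can be re-derived from its own averages. The sketch below shows one way the trimmed window and the percentage change could be computed. The `Sample` shape and the helper names are assumptions made for illustration; the actual columns of samples.tsv are not shown in this issue.

```typescript
// Hypothetical sample shape; the real samples.tsv columns are not shown in this issue.
interface Sample {
  tSec: number;   // seconds since the sampler started
  rssGb: number;  // resident set size in GB
  cpuPct: number; // CPU usage in percent (100 = one core)
}

// Drop samples taken before `skipSec` (the 30 s start-skip used above).
function trimWindow(samples: Sample[], skipSec: number): Sample[] {
  return samples.filter((s) => s.tSec >= skipSec);
}

// Arithmetic mean; NaN for an empty window so a bad trim is visible.
function mean(values: number[]): number {
  if (values.length === 0) return NaN;
  return values.reduce((a, b) => a + b, 0) / values.length;
}

// Relative change of treatment versus baseline, in percent.
function pctDelta(baseline: number, treatment: number): number {
  return ((treatment - baseline) / baseline) * 100;
}

// Re-derive the deltas reported in the table from its averages.
const rssDelta = Math.round(pctDelta(1.21, 1.40)); // 16
const cpuDelta = Math.round(pctDelta(147, 163));   // 11
// Run durations listed under Artefacts: 631 s (baseline) vs 805 s (treatment).
const wallDelta = Math.round(pctDelta(631, 805));  // 28
```

The same helper applied to the run durations gives roughly +28 % wall-clock for the treatment run, which points the same way as the RSS and CPU averages.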
Correctness regression — §D.5
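Treatment failed §D.5 with a real cross-device divergence on Bob's UCT balance after recovery.
Baseline ran the same flow with identical inputs and all 6 assertions passed.
What went wrong in #360 (post-mortem)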
The original issue body looked like a rigorous perf analysis. It is not. It is static analysis of source code with cost models derived by multiplying source-code counts. There is no flamegraph, no node --prof output, no perf record, no allocation sampler, no actual CPU% trace from the symptom window. The "525k Map allocations per load" and "6.5 minutes synchronous network IO" numbers are computations on the structure of the code, not on what the code does at runtime.
Implementing those findings produced the measured regression because:
The pathologies called out were not the actual limiter on the workload. The dominant sections (§C.2 send, §D.1 pre-clear, §D.4 recovery) are network-bound (Nostr round-trips, aggregator HEAD-verify, IPFS gateway propagation), not CPU-bound. Local CPU optimisation cannot shorten a network-bound wait — but it can lengthen it if the optimisation triggers rate-limiting.
Parallel fan-out (Findings #2 and #5, cap=8) likely triggers testnet rate-limiting at B ≈ 70. The pre-fix sequential dispatch may have been pessimal-looking but rate-limit-safe. Going from 1 in-flight → 8 in-flight against a shared aggregator + Kubo gateway turns "slow but steady" into "throttled and worse".
Finding #1's identity-filter on OrbitDB 'update' events is the prime suspect for the §D.5 divergence. The fix suppresses the handleReplication callback for update entries authored by our own identity, on the assumption that "only peer writes warrant a re-sync". But the same update event is also the local state machine's signal that its own write was durably applied. Filtering it out races the receiver against the snapshot publish, and at the edges of timing windows the recipient loses a token.
Net code change is +1500 / −200 across 30 files. We added more code than we removed. Most "simplifications" were rearrangements (WeakMap layer, AggregateError branch, prefix-collision check, extracted helper). Two genuine wins (single concurrency constant, folded size-probe-into-hash-compute) do not move the needle on a network-bound workload.
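To make the "1 in-flight → 8 in-flight" point concrete, here is a minimal sketch of a capped fan-out and a way to observe the peak concurrency it puts on a shared endpoint. `mapWithCap`, `peakInFlight` and `fakeRequest` are hypothetical names for illustration; they are not the code on the #360 branch, and the sketch does not model rate-limiting itself.

```typescript
// Run `task` over `items` with at most `cap` tasks in flight at once.
// cap = 1 behaves like sequential dispatch; cap = 8 matches the cap discussed above.
async function mapWithCap<T, R>(
  items: readonly T[],
  cap: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  if (cap < 1) throw new RangeError("cap must be >= 1");
  const results = new Array<R>(items.length);
  let next = 0;
  // Each worker claims the next unprocessed index until the list is exhausted.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }
  const workers = Array.from({ length: Math.min(cap, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Measure the peak number of concurrent calls that a given cap produces
// against a stand-in for a shared remote endpoint.
async function peakInFlight(cap: number, n: number): Promise<number> {
  let inFlight = 0;
  let peak = 0;
  const fakeRequest = async (x: number): Promise<number> => {
    inFlight++;
    peak = Math.max(peak, inFlight);
    await new Promise<void>((resolve) => setTimeout(resolve, 5)); // simulated round-trip
    inFlight--;
    return x;
  };
  await mapWithCap(Array.from({ length: n }, (_, i) => i), cap, fakeRequest);
  return peak;
}
```

With 70 items, a cap of 1 keeps a single request outstanding, while a cap of 8 holds eight outstanding for almost the whole run. If the aggregator or gateway throttles per client, that sustained eightfold pressure is the kind of load that could turn into the regression measured above, which is why the cap itself should be an A/B variable in any future attempt.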
The takeaway: #360 skipped the measurement step that should precede any perf claim. We should not repeat that.
What the maintainer wants explored next
"on every write (one CID) we [are] apparently doing the whole round-trip, while we could simply batch all these writes into a single shot"
This is the hypothesis the next investigation should test. Concretely:
Every OrbitDbAdapter.putEntry call (which is invoked many times per logical operation — addBundle, OUTBOX state transitions, SENT-ledger appends, tombstone writes, profile-pointer publishes, etc.) currently triggers its own dag-cbor encode → IPFS pin → HEAD-verify → optional Nostr publish round-trip.
A logical operation like "Bob pays Alice" emits ~10–50 such writes in close succession.
If each one is a full HTTP HEAD-verify round-trip against the gateway (flush-durability deadline 30 s), the wall-clock cost per logical op is N_writes × verify_RTT, not 1 × verify_RTT.
Batching writes within a logical operation (or within a debounce window) into a single HEAD-verify could collapse this from O(N) round-trips to O(1).
What "batching" means concretely needs profile data first — see the next section.
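Purely to illustrate the shape of the hypothesis (not as a proposed fix), the sketch below coalesces writes that arrive within a short window into one flush, so N writes cost one durability round-trip instead of N. `WriteBatcher` and `flushFn` are hypothetical names and are not part of OrbitDbAdapter. The window here is a fixed collection window, a simpler stand-in for the debounce window mentioned above.

```typescript
// Hypothetical batcher: every name here is illustrative, not part of OrbitDbAdapter.
type Flush<T> = (batch: T[]) => Promise<void>;

class WriteBatcher<T> {
  private pending: T[] = [];
  private waiters: Array<{ resolve: () => void; reject: (e: unknown) => void }> = [];
  private timer: ReturnType<typeof setTimeout> | undefined;

  constructor(
    private readonly flushFn: Flush<T>, // one durability round-trip per batch
    private readonly windowMs: number,  // how long to collect writes before flushing
  ) {}

  // Queue one write; the promise settles when the batch containing it is flushed.
  put(entry: T): Promise<void> {
    this.pending.push(entry);
    const done = new Promise<void>((resolve, reject) => {
      this.waiters.push({ resolve, reject });
    });
    // The first write in a window arms the timer; later writes join the same batch.
    if (this.timer === undefined) {
      this.timer = setTimeout(() => void this.flush(), this.windowMs);
    }
    return done;
  }

  // Flush everything queued so far in a single call to flushFn.
  async flush(): Promise<void> {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    const batch = this.pending;
    const waiters = this.waiters;
    this.pending = [];
    this.waiters = [];
    if (batch.length === 0) return;
    try {
      await this.flushFn(batch);
      waiters.forEach((w) => w.resolve());
    } catch (err) {
      // A failed flush fails every write in the batch; callers decide how to retry.
      waiters.forEach((w) => w.reject(err));
    }
  }
}
```

Each caller still awaits its own write, so the "my write was durably applied" signal is kept; only the number of round-trips changes. Whether that trade is worth the added latency of the window is exactly what the counter data should answer first.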
Measurement infrastructure (added on chore/perf-instrumentation)
A small counters module has been added so the next investigator does not start from scratch:
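profile/internal/perf-counters.ts: incr(name), observe(name, ms), time(name, fn), dumpAndReset(). Gated by SPHERE_PERF=1 (no overhead when off). When on, it dumps a per-counter snapshot every 5 s via the existing logger.info('perf', ...) channel.
Instrumented call sites:
OrbitDbAdapter.putEntry: count + total time. Resolves the per-write-round-trip question directly.
OrbitDbAdapter.onReplication callback: fires/sec.
BundleIndex.listActiveBundles: calls + walk time.
fetchCarFromIpfs: call count, blocks-per-call, total bytes, wall-clock per call.
pinCarBlocksToIpfs: block count, total bytes, wall-clock per call.
flushToIpfs: call count, blocks, wall-clock.
UxfPackage.computeVerifiedProofsAcross: total proofs verified, verifier calls, wall-clock.
To use, run the soak with SPHERE_PERF=1. Every 5 s the daemon emits a counter snapshot. After the soak you can post-process the log to build flame-shaped tables.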
This is not a fix. It's the missing measurement step that #360 should have started with.
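For readers who have not opened the module, here is a rough sketch of what a counters API with that surface could look like. It is not the contents of profile/internal/perf-counters.ts: the class name, the `CounterStats` fields and the `enabled` constructor flag are assumptions, and a real daemon would presumably derive `enabled` from SPHERE_PERF=1.

```typescript
// Illustrative sketch only; not the contents of profile/internal/perf-counters.ts.
interface CounterStats {
  count: number;   // number of incr() calls or observations
  totalMs: number; // sum of observed durations
  maxMs: number;   // largest single observation
}

class PerfCounters {
  private stats = new Map<string, CounterStats>();

  // Assumption: `enabled` would be derived from SPHERE_PERF=1 by the caller.
  constructor(private readonly enabled: boolean) {}

  private slot(name: string): CounterStats {
    let s = this.stats.get(name);
    if (!s) {
      s = { count: 0, totalMs: 0, maxMs: 0 };
      this.stats.set(name, s);
    }
    return s;
  }

  // Count an event without a duration.
  incr(name: string): void {
    if (!this.enabled) return;
    this.slot(name).count += 1;
  }

  // Record one duration sample in milliseconds.
  observe(name: string, ms: number): void {
    if (!this.enabled) return;
    const s = this.slot(name);
    s.count += 1;
    s.totalMs += ms;
    s.maxMs = Math.max(s.maxMs, ms);
  }

  // Time an async function and record its wall-clock duration, even if it throws.
  async time<T>(name: string, fn: () => Promise<T>): Promise<T> {
    if (!this.enabled) return fn();
    const start = Date.now();
    try {
      return await fn();
    } finally {
      this.observe(name, Date.now() - start);
    }
  }

  // Return a copy of the current counters and clear them for the next interval.
  dumpAndReset(): Record<string, CounterStats> {
    const snapshot: Record<string, CounterStats> = {};
    this.stats.forEach((s, name) => {
      snapshot[name] = { ...s };
    });
    this.stats.clear();
    return snapshot;
  }
}
```

A call site would wrap the operation of interest, for example `counters.time('putEntry', () => adapter.putEntry(entry))`, where `counters` and `adapter` are placeholder variables. Because every method returns early when disabled, the cost with the flag off is a single boolean check per call.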
Suggested next steps
Run a soak with SPHERE_PERF=1. Identify the top-5 counters by total time. Do not propose fixes until that table exists.
From the counter data, decide whether the batching hypothesis is borne out — is putEntry total-time a meaningful fraction of wall-clock? Is fetchCarFromIpfs time dominated by HEAD-verify or by block-fetch? Does flushToIpfs time scale linearly with bundle count?
Propose at most one targeted intervention. Re-run the soak. Compare. Land only if A/B is unambiguously better on both wall-clock and assertion pass-rate.
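The first step asks for a top-5 table by total time. A minimal sketch of that post-processing follows, assuming each 5 s snapshot has already been parsed into an object keyed by counter name. The `Snapshot` shape and `topByTotalTime` are hypothetical; the real log line format is not shown in this issue.

```typescript
// Assumed snapshot shape: counter name -> { count, totalMs }.
// The real log format is not shown in this issue.
type Snapshot = Record<string, { count: number; totalMs: number }>;

interface Row {
  name: string;
  count: number;
  totalMs: number;
  shareOfWallClock: number; // totalMs / wall-clock; can exceed 1 when calls overlap
}

// Sum per-interval snapshots (each dump resets the counters) and rank by total time.
function topByTotalTime(snapshots: Snapshot[], wallClockMs: number, k = 5): Row[] {
  const totals = new Map<string, { count: number; totalMs: number }>();
  for (const snap of snapshots) {
    for (const name of Object.keys(snap)) {
      const t = totals.get(name) ?? { count: 0, totalMs: 0 };
      t.count += snap[name].count;
      t.totalMs += snap[name].totalMs;
      totals.set(name, t);
    }
  }
  const rows: Row[] = [];
  totals.forEach((t, name) => {
    rows.push({ name, count: t.count, totalMs: t.totalMs, shareOfWallClock: t.totalMs / wallClockMs });
  });
  // Largest total time first, truncated to the top k counters.
  return rows.sort((a, b) => b.totalMs - a.totalMs).slice(0, k);
}
```

The `shareOfWallClock` column speaks to the second step: if putEntry's share is small, the batching hypothesis is unlikely to pay off whatever the implementation looks like.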
Artefacts
A/B soak artefacts preserved (NOT in git — local fs only):
/tmp/soak-metrics/treatment/ — fix/issue-360-profile-sync-perf run (805 s, rc=1, §D.5 FAIL).
/tmp/soak-metrics/baseline/ — main run (631 s, rc=0, all assertions PASS).
Both contain soak-runs/run-1/script.log, totals.tsv, samples.tsv, summary-metrics.json, plus the KEEP=1 workspace tree with daemons' on-disk state for any deeper triage.
The dead-end branch is preserved with a DO_NOT_USE_DEAD_END.md marker at the repo root.
Status
This issue replaces #360.
cc maintainer for triage / labelling.