
keynote-2: add fair benchmark variant with leveled playing field#4432

Open
zalo wants to merge 4 commits into clockworklabs:master from zalo:fair-benchmark-comparison

Conversation


zalo commented Feb 24, 2026

Summary

The keynote-2 benchmark reports SpacetimeDB at ~14x faster than the next best competitor. This PR adds tools to measure where that gap actually comes from — by progressively eliminating asymmetries and by giving Postgres the same kind of optimized Rust client that SpacetimeDB uses.

Three Benchmark Modes

1. Original benchmark (SpacetimeDB's version)

SpacetimeDB gets a custom Rust client; competitors get Node.js HTTP + Drizzle ORM.

2. Fair benchmark (npm run fair-bench)

Same TypeScript client for all, confirmed reads, sequential operations.

3. Both optimized (postgres-rust-client + spacetimedb-rust-client)

Each platform gets a Rust client with binary protocol, batched pipelining, and stored procedures. This isolates the genuine architectural difference.

Results: All Three Modes (local, same machine, 50 connections, 10s)

alpha=0.5 (low contention)

| System | Mode | TPS | Multiplier |
|---|---|---|---|
| SpacetimeDB | Rust client, 16K inflight, no confirmed reads | 73,123 | |
| Postgres (stored proc) | Rust client, binary protocol, batch 1024 | 32,232 | 2.3x slower |
| Postgres (stored proc) | Node.js HTTP, sequential | 3,880 | 19x slower |
| Postgres (Drizzle ORM) | Node.js HTTP, sequential | 1,838 | 40x slower |
| SpacetimeDB | TypeScript client, confirmed reads, sequential | 100 | 731x slower |

alpha=1.5 (high contention, ~75% collision rate)

| System | Mode | TPS | Multiplier |
|---|---|---|---|
| SpacetimeDB | Rust client, 16K inflight, no confirmed reads | 64,603 | |
| Postgres (stored proc) | Rust client, binary protocol, batch 1024 | 7,811 | 8.3x slower |
| Postgres (stored proc) | Node.js HTTP, sequential | 3,747 | 17x slower |
| Postgres (Drizzle ORM) | Node.js HTTP, sequential | 1,357 | 48x slower |
| SpacetimeDB | TypeScript client, confirmed reads, sequential | 105 | 615x slower |
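The alpha knob appears to control the skew of the account-selection distribution. Assuming a Zipf-style sampler (the harness's exact implementation may differ), the contention difference between alpha=0.5 and alpha=1.5 can be seen directly:

```typescript
// Zipf-weighted account sampler: P(rank k) ∝ 1 / k^alpha.
// Higher alpha concentrates picks on a few hot accounts, raising the
// chance that two concurrent transfers touch the same row.
function zipfWeights(n: number, alpha: number): number[] {
  const w = Array.from({ length: n }, (_, i) => 1 / Math.pow(i + 1, alpha));
  const total = w.reduce((a, b) => a + b, 0);
  return w.map((x) => x / total);
}

// Fraction of probability mass landing on the hottest 1% of accounts.
function hotMass(n: number, alpha: number): number {
  const w = zipfWeights(n, alpha);
  return w.slice(0, Math.ceil(n / 100)).reduce((a, b) => a + b, 0);
}

const low = hotMass(100_000, 0.5);   // alpha=0.5: mass spread widely
const high = hotMass(100_000, 1.5);  // alpha=1.5: mass piled on hot rows
console.log(low.toFixed(3), high.toFixed(3));
```

With 100k accounts, alpha=1.5 puts the overwhelming majority of picks on the top 1% of rows, while alpha=0.5 spreads them out, which is consistent with the ~75% collision rate noted below.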

Where the 14x Claim Comes From

| Factor | Contribution | How we know |
|---|---|---|
| Node.js HTTP + JSON vs Rust binary protocol | ~8x | PG Rust client (32K TPS) vs PG Node.js (3.9K TPS) |
| Drizzle ORM vs stored procedure | ~2x | PG stored proc (3.9K) vs PG Drizzle (1.8K) |
| Colocated compute + storage (genuine) | ~2.3x | STDB Rust (73K) vs PG Rust stored proc (32K) |
| Contention handling (genuine, under load) | additional ~4x | Gap grows from 2.3x to 8.3x at alpha=1.5 |

The honest headline: SpacetimeDB is 2.3x faster than Postgres at low contention and 8.3x faster under heavy contention — when both use equivalent optimized clients. The contention advantage is genuinely impressive. But the original 14x claim was inflated by comparing an optimized Rust client against unoptimized Node.js + ORM middleware.
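A toy latency model makes the middleware factors plausible: sequential throughput is roughly connections / (round-trips × RTT + server work). The RTT and per-op work figures below are assumptions picked to be in the ballpark of the measured numbers, not measurements:

```typescript
// Toy model: sequential TPS ≈ connections / perOpLatency, where
// perOpLatency ≈ roundTrips * rtt + serverWork. All figures are
// assumed, chosen to roughly match the measured stored-proc vs
// Drizzle gap (5 round-trips vs 1).
const connections = 50;
const rtt = 0.004;        // assumed 4 ms client<->server round-trip
const serverWork = 0.008; // assumed 8 ms of query execution per transfer

const tps = (roundTrips: number) =>
  connections / (roundTrips * rtt + serverWork);

console.log(Math.round(tps(1))); // stored proc: one round-trip
console.log(Math.round(tps(5))); // ORM path: BEGIN/SELECT/UPDATE/UPDATE/COMMIT
```

Under these assumptions the single-call path comes out a bit over 2x faster than the five-round-trip path, in line with the ~2x ORM factor in the table.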

What's Added

  • postgres-rust-client/ — Rust client for Postgres using tokio-postgres with binary protocol, prepared statements, batched pipelining, and the do_transfer() stored procedure. Mirrors the SpacetimeDB Rust client's architecture.
  • src/fair-bench.ts — Fair benchmark runner forcing equal settings for all systems
  • src/rpc-servers/postgres-storedproc-rpc-server.ts — PL/pgSQL stored procedure RPC server
  • src/connectors/rpc/postgres_storedproc_rpc.ts — Connector for stored proc server
  • docker-compose-fair.yml — Postgres with fair config
  • FAIR-BENCHMARK.md — Detailed methodology documentation
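For orientation, the transfer that do_transfer() performs in a single call is just a guarded debit/credit. A minimal in-memory TypeScript sketch follows; the actual PL/pgSQL body also takes row locks and runs inside the database, so this only mirrors the logic:

```typescript
// In-memory sketch of the single-call transfer the stored procedure
// performs server-side; names and error handling are illustrative.
type Balances = Map<number, bigint>;

function doTransfer(b: Balances, from: number, to: number, amount: bigint): boolean {
  const src = b.get(from);
  const dst = b.get(to);
  if (src === undefined || dst === undefined || src < amount) return false;
  b.set(from, src - amount); // debit
  b.set(to, dst + amount);   // credit
  return true;               // one logical round-trip, not five
}

const balances: Balances = new Map([[1, 100n], [2, 50n]]);
doTransfer(balances, 1, 2, 30n);
console.log(balances.get(1), balances.get(2)); // 70n 80n
```

The point of the stored procedure is not this logic (which is trivial) but where it runs: all of it executes in one database call instead of five client-driven statements.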

Usage

# Fair benchmark (TypeScript client for all)
npm run fair-bench

# Postgres Rust client (optimized)
cargo build --manifest-path postgres-rust-client/Cargo.toml --release
cargo run --release --manifest-path postgres-rust-client/Cargo.toml -- seed --accounts 100000
cargo run --release --manifest-path postgres-rust-client/Cargo.toml -- bench \
  --connections 50 --duration 10s --warmup-duration 5s --alpha 0.5 --batch-size 1024

# SpacetimeDB Rust client (their version)
cargo run --release --manifest-path spacetimedb-rust-client/Cargo.toml -- bench \
  --server http://127.0.0.1:3000 --module test-1 \
  --connections 50 --duration 10s --warmup-duration 5s --alpha 0.5

Caveats

  • All results are from a single desktop machine running Docker containers — not the bare-metal hardware SpacetimeDB used for their published numbers
  • SpacetimeDB may have optimizations that perform better on dedicated hardware
  • The Postgres Rust client could likely be further optimized (connection pooling, protocol-level pipelining via libpq)
  • The methodology comparison is what matters: when you eliminate asymmetries, the gap narrows dramatically

Test plan

  • Postgres Rust client builds and runs benchmarks
  • Postgres stored proc RPC server works correctly
  • Fair benchmark runner forces equal settings
  • All three modes produce valid, reproducible results
  • Copilot review comments addressed

🤖 Generated with Claude Code

Add an alternative benchmark configuration that eliminates compounding
asymmetries between SpacetimeDB and competitors:

- TypeScript client for ALL systems (no custom Rust client for SpacetimeDB)
- confirmedReads=true for SpacetimeDB (durable commits, matching Postgres fsync)
- Client-side TPS counting for all (no server-side Prometheus metrics)
- Same pipeline depth (8) for all systems
- Postgres read_committed isolation (its actual default, not serializable)
- Postgres synchronous_commit=on (matching SpacetimeDB durability)
- New stored-procedure RPC server eliminates Drizzle ORM multi-round-trip
  overhead, making Postgres comparable to SpacetimeDB's single-call reducer

New files:
- src/fair-bench.ts: fair benchmark runner with enforced equal settings
- src/rpc-servers/postgres-storedproc-rpc-server.ts: PL/pgSQL stored proc
- src/connectors/rpc/postgres_storedproc_rpc.ts: connector for above
- docker-compose-fair.yml: Postgres with fair configuration
- FAIR-BENCHMARK.md: methodology documentation

Local test results (Postgres only, same machine):
- Postgres (Drizzle ORM): 1,817 TPS @ alpha=0.5, 815 TPS @ alpha=1.5
- Postgres (stored proc): 3,415 TPS @ alpha=0.5, 1,217 TPS @ alpha=1.5

The stored procedure alone provides ~1.9x speedup by eliminating ORM
round-trips -- demonstrating that a significant portion of the original
benchmark's gap comes from penalizing competitors with unnecessary overhead.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Copilot AI review requested due to automatic review settings February 24, 2026 20:27

CLAassistant commented Feb 24, 2026

CLA assistant check
All committers have signed the CLA.


Copilot AI left a comment


Pull request overview

Adds a “fair benchmark” variant for the keynote-2 template to reduce methodological asymmetries between SpacetimeDB and competitor systems, including a Postgres stored-procedure RPC path and updated docs/compose config.

Changes:

  • Introduces fair-bench.ts runner that forces “fair” env settings and runs a selectable set of systems.
  • Adds a Postgres stored-procedure RPC server + connector, plus a matching test scenario entry.
  • Adds docker-compose-fair.yml and FAIR-BENCHMARK.md to document and run the fair configuration.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| templates/keynote-2/src/tests/test-1/postgres_storedproc_rpc.ts | Registers the stored-proc Postgres RPC system for test-1. |
| templates/keynote-2/src/rpc-servers/postgres-storedproc-rpc-server.ts | New RPC server that installs/uses a do_transfer() PL/pgSQL function to do transfers in one DB call. |
| templates/keynote-2/src/fair-bench.ts | New "fair benchmark" runner script with service checks, seeding, and results output. |
| templates/keynote-2/src/connectors/rpc/postgres_storedproc_rpc.ts | New connector targeting the stored-proc RPC server port/URL. |
| templates/keynote-2/src/connectors/index.ts | Registers the new connector in CONNECTORS. |
| templates/keynote-2/package.json | Adds npm run fair-bench script. |
| templates/keynote-2/docker-compose-fair.yml | New compose file for Postgres fair settings + stored-proc RPC server. |
| templates/keynote-2/FAIR-BENCHMARK.md | Documents the fairness rationale and how to run the fair benchmark. |


zalo and others added 3 commits February 24, 2026 13:01
- Use USE_SPACETIME_METRICS_ENDPOINT=1 to avoid broken onTransfer
  callback in existing connector (SDK callReducer already awaits
  round-trip confirmation via Promise)
- Use port 3100 for SpacetimeDB to avoid conflicts
- Make health check port configurable via STDB_PORT env var

Local results with all settings leveled:
  alpha=0.5: STDB 100 TPS, PG ORM 1,902 TPS, PG stored proc 3,387 TPS
  alpha=1.5: STDB 105 TPS, PG ORM 249 TPS, PG stored proc 277 TPS

SpacetimeDB's TypeScript client with confirmedReads=true and pipeline
depth 8 shows ~500ms p50 latency, suggesting the WebSocket+BSATN
path has significant per-operation overhead when awaiting confirmations.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
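A quick sanity check on the ~100 TPS / ~500 ms pairing in the commit above: for sequential clients (one in-flight operation per connection), Little's law pins throughput at roughly connections divided by per-operation latency:

```typescript
// Little's law sanity check for a sequential (depth-1) client:
// TPS ≈ connections / per-op latency.
const connections = 50;
const p50Seconds = 0.5; // ~500 ms confirmed-read ack observed for STDB

const predictedTps = connections / p50Seconds;
console.log(predictedTps); // 100, matching the ~100 TPS measured
```

So the fair-bench SpacetimeDB number is latency-bound, not throughput-bound: any change that shortens the confirmed-read ack should raise it proportionally.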
- Fix round-trip count: "4 SQL/ORM round-trips" to "5" in doc comment
  and FAIR-BENCHMARK.md table (BEGIN+SELECT+UPDATE+UPDATE+COMMIT = 5)
- Fix getAccount() balance type: annotate as string (from JSON) and
  parse with BigInt() to match the RpcConnector interface
- Fix amount precision loss in rpcTransfer: parse directly to BigInt
  instead of going through Number() which truncates values > 2^53
- Remove misleading --pipeline-depth flag and related claims; the fair
  benchmark runs sequentially (non-pipelined) so the setting had no
  effect
- Add CLI numeric arg validation: reject NaN, Infinity, and <= 0
- Fix bare catch on dynamic import: only fall back to rpc_single_call
  for MODULE_NOT_FOUND errors, rethrow genuine errors

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add a Rust client for Postgres that mirrors SpacetimeDB's Rust client:
- Direct binary protocol via tokio-postgres (no HTTP, JSON, or Node.js)
- Multi-threaded Tokio runtime (one thread per connection)
- Batched/pipelined queries with prepared statements
- Calls do_transfer() stored procedure (single DB round-trip)

This eliminates all middleware overhead to isolate the genuine
architectural difference between SpacetimeDB and Postgres.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

cloutiertyler commented Feb 25, 2026

These benchmark changes you're making have significant issues. We'll be doing a proper technical writeup addressing the concerns you mention here.


zalo commented Feb 25, 2026

Conversation Log for Bias Disclosure

This PR was created through an interactive session with Claude Code (Opus 4.6). Below is a complete log of every user prompt that directed the analysis, for full transparency.


Turn 1 — User

SpacetimeDB reports implausibly high benchmark numbers. Could you analyze if there is something dishonest or asymmetric about the way theirs runs vs the competition? Can the competitor's DB's use similar tricks to get the speed up? Can you extend their benchmark to have SpacetimeDB's number under typical usage patterns? https://github.com/clockworklabs/SpacetimeDB/tree/master/templates/keynote-2

Claude: Cloned the repo, read all benchmark files, and produced a detailed analysis identifying ~10 compounding asymmetries (Rust client vs TypeScript, 16,384 vs 8 pipeline depth, server-side vs client-side TPS counting, confirmedReads=false, serializable isolation for Postgres, etc.). Concluded the claimed 14x advantage would likely be 2-5x with a leveled playing field.


Turn 2 — User

Yes please, run it locally, and submit it as a PR.

Claude: Created a fair benchmark variant with:

  • Postgres stored procedure RPC server (eliminates ORM round-trips)
  • Fair docker-compose (read_committed, synchronous_commit=on)
  • fair-bench.ts runner forcing equal settings
  • Registered new connector, test case, package.json script

Ran Postgres-only tests locally (SpacetimeDB CLI not yet available):

  • PG Drizzle ORM: 1,817 TPS
  • PG stored proc: 3,415 TPS

Forked repo, pushed branch, created PR #4432.


Turn 3 — User

Can you look at the PR and see if CI is passing?

Claude: Checked — only blocker was an unsigned CLA. No build/test CI failures.


Turn 4 — User

What happens when you run the benchmark locally?

Claude: Set up SpacetimeDB via Docker (pulled image, published Rust module, seeded). Ran full 3-way comparison with fair settings:

alpha=0.5: STDB 100 TPS (p50=500ms), PG ORM 1,902 TPS, PG stored proc 3,387 TPS
alpha=1.5: STDB 105 TPS, PG ORM 249 TPS, PG stored proc 277 TPS

Committed, pushed, updated PR description with these results.


Turn 5 — User

Could you rerun it, but this time with all of the advantages of each platform enabled properly so they're all as fast as possible?

Claude: Ran each platform at maximum capability:

  • SpacetimeDB: Rust client, 16,384 inflight, no confirmed reads, 5s warmup
  • Postgres: sync_commit=off, read_committed, stored proc, Node.js sequential

alpha=0.5: STDB 92,480 TPS, PG stored proc 3,880 TPS, PG ORM 1,838 TPS
alpha=1.5: STDB 107,653 TPS, PG stored proc 3,747 TPS, PG ORM 1,357 TPS


Turn 6 — User

Is there a way to give postgres a multithreaded rust client with binary websocket?

Claude: Built postgres-rust-client/ — a Rust client using tokio-postgres with binary protocol, prepared statements, batched pipelining, multi-threaded Tokio runtime. Ran head-to-head:

alpha=0.5: STDB 73,123 TPS vs PG Rust 32,232 TPS → 2.3x gap
alpha=1.5: STDB 64,603 TPS vs PG Rust 7,811 TPS → 8.3x gap

Committed, pushed, updated PR with full decomposition of where the 14x comes from.


Turn 7 — User

Please update the PR version with the optimized numbers too

Claude: Updated PR #4432 description with all three benchmark modes and the factor decomposition table.


Disclosure Notes

  • Initial framing: The user's opening question characterized the numbers as "implausibly high" and asked about "dishonest or asymmetric" methodology. This framing guided the analysis toward finding asymmetries.
  • No affiliation stated: The user did not disclose any affiliation with or against ClockworkLabs/SpacetimeDB or any competitor.
  • All code and results are reproducible: Every benchmark was run locally with the commands shown. The Postgres Rust client and fair benchmark runner are included in the PR for anyone to verify.
  • Hardware caveat: All results are from a single desktop machine running Docker — not the dedicated bare-metal hardware SpacetimeDB used for their published numbers. Results may differ on different hardware.
  • Analysis tool: This analysis was conducted by Claude (Opus 4.6) via Claude Code CLI. The user directed the investigation but did not predetermine the results.

@cloutiertyler

@zalo We're about to release our benchmarking deep dive. I would like to move forward with merging this PR, but we'll have to update it to work with the new changes to the benchmarks. Would you be alright if we made changes directly to this PR, or would you like to carry them out?


zalo commented May 9, 2026

Ah, you're welcome to merge, modify, or discard this PR! I appreciate that you're taking the time to add additional transparency and reproducibility to these numbers 😄

Hopefully the extra time spent characterizing the special cases in the performance gap leads to an improved SpacetimeDB for everyone ヽ(´ー`)ノ

I'll also have my Claude re-run the tests based on the latest main commit and push here from the same machine.


zalo commented May 9, 2026

Refreshed numbers from the revised harness (1fd3aab)

Reran the benchmarks against the post–master-merge harness (1fd3aab fair-bench: adapt to upstream API changes after master merge) so the Feb tables in the PR description aren't stale by the time you land your deep-dive. Same machine, fresh runs, defaults from the harness as it stands now.

Setup

  • Hardware: Intel i7-6700K (4C/8T @ 4.0 GHz, Skylake), 32 GB RAM — single consumer desktop, not bare-metal server hardware, so absolute numbers will be lower than your published figures.
  • OS / Docker: Ubuntu 24.04, Docker 29.4.1.
  • Services: pg-fair (Postgres 16, fair config: read_committed + synchronous_commit=on) and spacetime-fair (clockworklabs/spacetime:latest, 2.2.0 / commit eb11e2f5) brought up via docker-compose-fair.yml. PG-RPC servers run from host via tsx because Dockerfile.rpc no longer builds on this branch (pnpm-lock.yaml is stale and the package now depends on spacetimedb: workspace:^, which doesn't resolve outside the monorepo workspace).
  • All runs: 50 connections, 10s window, 100k accounts.

Run 1 — Master defaults (pnpm run bench --connectors spacetimedb,postgres_rpc)

Master defaults today: TypeScript client for both, STDB_CONFIRMED_READS=true (default after #4682), maxInflightPerWorker=128 for SpacetimeDB, sequential for Postgres+Drizzle.

| System | alpha=0.5 (TPS) | alpha=1.5 (TPS) |
|---|---|---|
| SpacetimeDB (TS, pipelined) | 59,103 | 62,586 |
| Postgres + Drizzle ORM (Node) | 1,634 | 732 |
| ratio | 36× | 86× |

p50/p95/p99 for STDB: 65 / 133 / 159 ms (alpha=0.5), 76 / 124 / 147 ms (alpha=1.5). Collision rate at α=1.5 ≈ 95% on STDB / 76% on PG-Drizzle.

Run 2 — pnpm run fair-bench (sequential, includes stored proc)

Fair-bench forces BENCH_PIPELINED=0 everywhere and adds postgres_storedproc_rpc (single-call PL/pgSQL) so the architecture is comparable to a reducer.

| System | alpha=0.5 (TPS) | alpha=1.5 (TPS) |
|---|---|---|
| SpacetimeDB (TS, sequential) | 11,261 | 11,078 |
| Postgres stored proc (single-call) | 3,055 | 1,029 |
| Postgres + Drizzle ORM | 1,727 | 737 |
| STDB vs stored proc | 3.7× | 10.8× |

STDB p50 ≈ 4.2–4.3 ms (sequential, confirmed reads). Postgres p99s blow out at α=1.5 (stored proc ≈ 352 ms, ORM ≈ 483 ms), as expected.

Note vs the Feb fair-bench numbers in the original PR description (STDB 100/105 TPS, p50 ≈ 500 ms): the STDB bottleneck is no longer the confirmed-read ack — current clockworklabs/spacetime:latest (2.2.0) returns confirmed-read acks much faster than what we measured in February.

Run 3 — Optimized PG client (postgres-rust-client)

Note: the SpacetimeDB Rust client column from the original three-mode table is no longer reproducible — that client was removed from keynote-2 upstream in #4753 (as the PR's own FAIR-BENCHMARK.md documents). The closest "STDB optimized" stand-in is the master-default TS client with pipelining (= Run 1). Only the Postgres side is novel here:

| System | alpha=0.5 (TPS) | alpha=1.5 (TPS) |
|---|---|---|
| Postgres Rust client (binary, batch=1024) | 16,147 | 1,103 |
| SpacetimeDB (TS, pipelined — from Run 1) | 59,103 | 62,586 |
| ratio | 3.7× | 57× |

The α=1.5 PG-Rust number is somewhat noisy on this hardware — the client took 46s of wall time to complete its 10s-target window, so the contention regime is dominated by retry/lock-wait rather than client throughput.

Caveats vs the original PR table

  1. Old desktop-class CPU; treat absolute TPS as relative-only.
  2. PG-RPC servers run from host (not Docker), since the RPC container build is currently broken on this branch.
  3. The "SpacetimeDB Rust client" arm is gone post-master-merge — flagging it because the original three-mode table in this PR's description still references it.
  4. Single run per cell — no averaging or warm-up beyond what the harness does internally.

Source JSONs are under templates/keynote-2/runs/ (test-1-2026-05-09T04-07-*.json and fair-bench-2026-05-09T04-*.json) if you want them. Happy to rerun any specific configuration if it'd be useful for the deep-dive.


zalo commented May 9, 2026

Follow-up: where does the gap actually come from?

Building on the refreshed numbers above, I traced the execution paths in the post-merge harness and identified the remaining asymmetries, then extended the bench to close the ones that are methodology rather than architecture. Same hardware (i7-6700K / 31GB / Ubuntu 24.04 / Docker pg-fair + spacetime-fair 2.2.0).

Asymmetries that survive master (fair-bench only closes some)

| # | Asymmetry | Where (file:line) | Closeable by competitors? |
|---|---|---|---|
| 1 | STDB connector sets maxInflightPerWorker: 128; PG/CRDB/SQLite connectors don't. runner.ts:128 reads PIPELINED = !!connector.maxInflightPerWorker, so pnpm run bench runs STDB pipelined and Postgres sequential. | src/connectors/spacetimedb.ts:81 vs src/connectors/rpc/postgres_rpc.ts:49–97 | Yes: set the same field on the PG connector. (fair-bench sidesteps this by forcing both sequential, but the standard bench doesn't.) |
| 2 | STDB = 1 WS hop. Every *_rpc PG variant goes bench → HTTP → Node RPC server → PG. Even the storedproc variant pays an extra TCP RTT plus a process boundary. | src/connectors/rpc/postgres_rpc.ts:14–47, src/rpc-servers/postgres-storedproc-rpc-server.ts | Yes: let PG be driven by pg (libpq-equivalent) directly from the bench process. No RPC server. |
| 3 | The Drizzle PG variant issues 5 PG round-trips per transfer (BEGIN, SELECT FOR UPDATE, UPDATE, UPDATE, COMMIT). | src/rpc-servers/postgres-rpc-server.ts:30–76 | Already closed by postgres_storedproc_rpc (single SELECT do_transfer(...)). |
| 4 | The published STDB module is TypeScript (spacetimedb/src/index.ts), not Rust; the repo has both. Should be documented in result tables. | templates/keynote-2/.env:28, rust_module/ | No for PG; switching STDB to Rust widens the gap further. |
The two remaining methodology gaps after fair-bench are #1 (still present in standard bench) and #2 (present everywhere, including fair-bench). The contention-handling advantage at high α and the colocated-compute advantage are real and architectural — those are what should show up in the comparison after the methodology gaps are closed.
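Asymmetry #1 reduces to a small coupling that can be sketched directly. The field and connector names below follow the table above, but this is a reduced model, not the actual runner.ts:

```typescript
// Reduced model of asymmetry #1: pipelining is inferred from whether a
// connector happens to set maxInflightPerWorker, not from a per-run flag.
interface Connector {
  name: string;
  maxInflightPerWorker?: number;
}

const spacetimedb: Connector = { name: "spacetimedb", maxInflightPerWorker: 128 };
const postgresRpc: Connector = { name: "postgres_rpc" }; // field unset

// Mirrors the reported check: PIPELINED = !!connector.maxInflightPerWorker
const isPipelined = (c: Connector) => !!c.maxInflightPerWorker;

console.log(isPipelined(spacetimedb), isPipelined(postgresRpc)); // true false

// The proposed fix is symmetric: give the PG connector the same field.
const postgresDirect: Connector = { name: "postgres_direct", maxInflightPerWorker: 128 };
console.log(isPipelined(postgresDirect)); // true
```

Because the flag rides on the connector definition, the "standard" bench silently compares a pipelined system against a sequential one unless every connector sets the field.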

Extension: postgres_direct connector

I added src/connectors/postgres_direct.ts — same shape as postgres_storedproc_rpc but the bench process opens its own pg.Pool and sends queries directly. No HTTP, no RPC server. Sets maxInflightPerWorker: 128 so runner.ts lets it pipeline at the same depth STDB does. Registered in src/connectors/index.ts and src/config.ts:validConnectors. test-1 entry at src/tests/test-1/postgres_direct.ts.

Extension: test-2 — multi-step "typical app" workload

Single-row transfer is the best case for PG-storedproc and the worst case for STDB's compute-colocation advantage (the work is so small it doesn't matter where it runs). Real apps tend to do multi-step work per request. So I added a 4-step transaction:

  1. Read source/dest balances under row lock.
  2. Fraud check (amount <= fraud_limit).
  3. Apply transfer (debit + credit).
  4. Append a row to a transfer_audit table.

This is registered as test-2 and runs three implementations against it:

  • spacetimedb — one reducer call (transfer_with_audit), all four steps inside the reducer.
  • postgres_direct_storedproc — one SQL call, all four steps in a PL/pgSQL function (do_transfer_with_audit).
  • postgres_direct (steps) — four explicit client→PG round-trips: BEGIN, SELECT FOR UPDATE, two UPDATEs, INSERT audit, COMMIT. This is the "Drizzle-style" usage pattern but with the RPC server hop removed.

Module changes: added transfer_audit table + transfer_with_audit reducer to spacetimedb/src/index.ts. PG side: do_transfer_with_audit PL/pgSQL function installed by the connector on open(). Single FRAUD_LIMIT = 1_000_000_000 env knob.
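A minimal in-memory sketch of the four steps follows. The real transfer_with_audit reducer and do_transfer_with_audit PL/pgSQL run inside their respective engines with row locking; this TypeScript version only mirrors the logic:

```typescript
// In-memory sketch of the test-2 multi-step transaction; locking and
// persistence are elided. FRAUD_LIMIT mirrors the env knob described above.
const FRAUD_LIMIT = 1_000_000_000n;

interface AuditRow { from: number; to: number; amount: bigint }

function transferWithAudit(
  balances: Map<number, bigint>,
  audit: AuditRow[],
  from: number, to: number, amount: bigint,
): boolean {
  const src = balances.get(from);                          // 1. read balances
  const dst = balances.get(to);
  if (src === undefined || dst === undefined) return false;
  if (amount > FRAUD_LIMIT || src < amount) return false;  // 2. fraud check
  balances.set(from, src - amount);                        // 3. apply transfer
  balances.set(to, dst + amount);
  audit.push({ from, to, amount });                        // 4. append audit row
  return true;
}

const balances = new Map<number, bigint>([[1, 500n], [2, 0n]]);
const audit: AuditRow[] = [];
transferWithAudit(balances, audit, 1, 2, 200n);
console.log(balances.get(1), balances.get(2), audit.length); // 300n 200n 1
```

The interesting comparison in test-2 is where these four steps execute: inside one reducer call, inside one PL/pgSQL call, or as four client-driven round-trips.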

Results

50 connections, 10s window, 100k accounts, single run per cell.

test-1 (single-call transfer)

| | α=0.5 sequential | α=0.5 pipelined | α=1.5 sequential | α=1.5 pipelined |
|---|---|---|---|---|
| SpacetimeDB | 10,104 | 45,103 | 10,569 | 36,459 |
| postgres_direct | 8,138 | 6,682 | 1,166 | 467 |
| ratio | 1.24× | 6.7× | 9.1× | 78× |

The single-cell takeaway is that at α=0.5 sequential — the lowest-contention, no-pipelining-bonus configuration — STDB is only 1.24× faster than PG with a direct connection and a stored proc. The bigger ratios come from (a) STDB pipelining and (b) STDB's better behavior at high contention, both of which are real but compose with the methodology-only differences.
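The pipelined ratio in the table above is the product of three factors that can be checked arithmetically against the table's own cells:

```typescript
// Decompose the ~6.7x pipelined gap (test-1, alpha=0.5) into its parts,
// using the TPS figures from the table above.
const stdbSeq = 10_104, stdbPipe = 45_103;
const pgSeq = 8_138, pgPipe = 6_682;

const seqRatio = stdbSeq / pgSeq;        // baseline gap, ~1.24x
const stdbPipeGain = stdbPipe / stdbSeq; // STDB gains ~4.5x from pipelining
const pgPipeLoss = pgSeq / pgPipe;       // PG *loses* ~1.2x when pipelined

// The pipelined ratio is exactly the product of the three factors.
console.log((seqRatio * stdbPipeGain * pgPipeLoss).toFixed(2)); // ≈ 6.75
console.log((stdbPipe / pgPipe).toFixed(2));                    // ≈ 6.75
```

So most of the headline gap at α=0.5 comes from STDB gaining and PG losing under pipelining, stacked on a modest 1.24x baseline.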

test-2 (multi-step transfer + fraud check + audit insert)

| | α=0.5 sequential | α=0.5 pipelined | α=1.5 sequential | α=1.5 pipelined* |
|---|---|---|---|---|
| SpacetimeDB (one reducer) | 9,739 | 42,761 | 9,545 | 47,750 |
| postgres_direct (one storedproc) | 11,715 | 8,100 | 1,196 | ~223* |
| postgres_direct (4 round-trips) | 3,163 | 1,960 | 795 | ~336* |
| STDB vs storedproc | 0.83× | 5.3× | 8.0× | ~214× |
* α=1.5 pipelined PG numbers are degraded by a PG backend OOM/recovery cycle during the run (50 workers × 128 in-flight × FOR UPDATE lock storm exceeded 31 GB). PG was alive but a backend crashed mid-run; the harness reported partial numbers before erroring out. Treat as "PG saturates" rather than a precise figure.

What this means

  • At sequential α=0.5 on the multi-step workload, PG storedproc is faster than STDB (1.20×). Once you remove the Node RPC server, give PG one round-trip via PL/pgSQL, and run sequentially, the architectural advantage of "compute colocated with storage" doesn't materialize because PG's stored proc already has that property. The win STDB gets in the standard bench at this configuration is the RPC-server hop + the maxInflightPerWorker:128 pipelining, not architecture per se.
  • STDB's real, architectural advantages show up under contention and concurrency. At α=1.5 sequential, STDB is ~8× faster than PG storedproc on the multi-step workload — that's the contention-handling story. At α=1.5 pipelined, PG locks up entirely while STDB sustains 48K TPS. That's a genuine architectural difference worth highlighting.
  • The pipelining advantage is mostly methodology. STDB at α=0.5 pipelined is 4.4× its sequential number; PG-direct is slower pipelined than sequential (8.1K → 11.7K when going to sequential at α=0.5) because lock contention overwhelms the pool. So pipelining isn't a free win for either side at this account count; it just looks like one when only STDB has it enabled.

What I'd add to the deep-dive

  1. Document which STDB module (TS vs Rust) is in use for each result table.
  2. Add postgres_direct (or equivalent) as a default comparison so the RPC-server hop isn't conflated with architecture.
  3. Decompose the table into "sequential α=0.5" / "pipelined α=0.5" / "sequential α=1.5" / "pipelined α=1.5" so the contention-handling story (which is the genuine architectural win) isn't tangled with the pipelining story (which is methodology).
  4. Use a multi-step workload, not just transfer, to show where colocated compute actually matters in real apps.

Reproduce

Files added on the branch:

  • src/connectors/postgres_direct.ts
  • src/tests/test-1/postgres_direct.ts
  • src/tests/test-2/{spacetimedb,postgres_direct,postgres_direct_storedproc}.ts
  • src/scenario_recipes/multi_step_transfer.ts
  • Module additions in spacetimedb/src/index.ts (transfer_audit table, transfer_with_audit reducer)
  • src/connectors/index.ts and src/config.ts updates to register the new connector

Result JSONs are in templates/keynote-2/runs/test-{1,2}-2026-05-09T04-3*-*.json. Caveats from the prior comment still apply (consumer-grade desktop, RPC-server Docker build still broken, single run per cell).
