
keynote-2: add fair benchmark variant with leveled playing field#4432

Open
zalo wants to merge 4 commits into clockworklabs:master from zalo:fair-benchmark-comparison

Conversation


zalo commented Feb 24, 2026

Summary

The keynote-2 benchmark reports SpacetimeDB at ~14x faster than the next best competitor. This PR adds tools to measure where that gap actually comes from — by progressively eliminating asymmetries and by giving Postgres the same kind of optimized Rust client that SpacetimeDB uses.

Three Benchmark Modes

1. Original benchmark (SpacetimeDB's version)

SpacetimeDB gets a custom Rust client; competitors get Node.js HTTP + Drizzle ORM.

2. Fair benchmark (npm run fair-bench)

Same TypeScript client for all, confirmed reads, sequential operations.

3. Both optimized (postgres-rust-client + spacetimedb-rust-client)

Each platform gets a Rust client with binary protocol, batched pipelining, and stored procedures. This isolates the genuine architectural difference.

Results: All Three Modes (local, same machine, 50 connections, 10s)

alpha=0.5 (low contention)

| System | Mode | TPS | Multiplier |
|---|---|---|---|
| SpacetimeDB | Rust client, 16K inflight, no confirmed reads | 73,123 | |
| Postgres (stored proc) | Rust client, binary protocol, batch 1024 | 32,232 | 2.3x slower |
| Postgres (stored proc) | Node.js HTTP, sequential | 3,880 | 19x slower |
| Postgres (Drizzle ORM) | Node.js HTTP, sequential | 1,838 | 40x slower |
| SpacetimeDB | TypeScript client, confirmed reads, sequential | 100 | 731x slower |

alpha=1.5 (high contention, ~75% collision rate)

| System | Mode | TPS | Multiplier |
|---|---|---|---|
| SpacetimeDB | Rust client, 16K inflight, no confirmed reads | 64,603 | |
| Postgres (stored proc) | Rust client, binary protocol, batch 1024 | 7,811 | 8.3x slower |
| Postgres (stored proc) | Node.js HTTP, sequential | 3,747 | 17x slower |
| Postgres (Drizzle ORM) | Node.js HTTP, sequential | 1,357 | 48x slower |
| SpacetimeDB | TypeScript client, confirmed reads, sequential | 105 | 615x slower |
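The alpha knob appears to control the skew of the account-selection distribution. Assuming a Zipf-style sampler (the harness's exact implementation may differ), the contention difference between alpha=0.5 and alpha=1.5 can be seen directly:

```typescript
// Zipf-weighted account sampler: P(rank k) ∝ 1 / k^alpha.
// Higher alpha concentrates picks on a few hot accounts, raising the
// chance that two concurrent transfers touch the same row.
function zipfWeights(n: number, alpha: number): number[] {
  const w = Array.from({ length: n }, (_, i) => 1 / Math.pow(i + 1, alpha));
  const total = w.reduce((a, b) => a + b, 0);
  return w.map((x) => x / total);
}

// Fraction of probability mass landing on the hottest 1% of accounts.
function hotMass(n: number, alpha: number): number {
  const w = zipfWeights(n, alpha);
  return w.slice(0, Math.ceil(n / 100)).reduce((a, b) => a + b, 0);
}

const low = hotMass(100_000, 0.5);   // alpha=0.5: mass spread widely
const high = hotMass(100_000, 1.5);  // alpha=1.5: mass piled on hot rows
console.log(low.toFixed(3), high.toFixed(3));
```

With 100k accounts, alpha=1.5 puts the overwhelming majority of picks on the top 1% of rows, while alpha=0.5 spreads them out, which is consistent with the ~75% collision rate noted below.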

Where the 14x Claim Comes From

| Factor | Contribution | How we know |
|---|---|---|
| Node.js HTTP + JSON vs Rust binary protocol | ~8x | PG Rust client (32K TPS) vs PG Node.js (3.9K TPS) |
| Drizzle ORM vs stored procedure | ~2x | PG stored proc (3.9K) vs PG Drizzle (1.8K) |
| Colocated compute + storage (genuine) | ~2.3x | STDB Rust (73K) vs PG Rust stored proc (32K) |
| Contention handling (genuine, under load) | additional ~4x | Gap grows from 2.3x to 8.3x at alpha=1.5 |

The honest headline: SpacetimeDB is 2.3x faster than Postgres at low contention and 8.3x faster under heavy contention — when both use equivalent optimized clients. The contention advantage is genuinely impressive. But the original 14x claim was inflated by comparing an optimized Rust client against unoptimized Node.js + ORM middleware.
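A toy latency model makes the middleware factors plausible: sequential throughput is roughly connections / (round-trips × RTT + server work). The RTT and per-op work figures below are assumptions picked to be in the ballpark of the measured numbers, not measurements:

```typescript
// Toy model: sequential TPS ≈ connections / perOpLatency, where
// perOpLatency ≈ roundTrips * rtt + serverWork. All figures are
// assumed, chosen to roughly match the measured stored-proc vs
// Drizzle gap (5 round-trips vs 1).
const connections = 50;
const rtt = 0.004;        // assumed 4 ms client<->server round-trip
const serverWork = 0.008; // assumed 8 ms of query execution per transfer

const tps = (roundTrips: number) =>
  connections / (roundTrips * rtt + serverWork);

console.log(Math.round(tps(1))); // stored proc: one round-trip
console.log(Math.round(tps(5))); // ORM path: BEGIN/SELECT/UPDATE/UPDATE/COMMIT
```

Under these assumptions the single-call path comes out a bit over 2x faster than the five-round-trip path, in line with the ~2x ORM factor in the table.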

What's Added

  • postgres-rust-client/ — Rust client for Postgres using tokio-postgres with binary protocol, prepared statements, batched pipelining, and the do_transfer() stored procedure. Mirrors the SpacetimeDB Rust client's architecture.
  • src/fair-bench.ts — Fair benchmark runner forcing equal settings for all systems
  • src/rpc-servers/postgres-storedproc-rpc-server.ts — PL/pgSQL stored procedure RPC server
  • src/connectors/rpc/postgres_storedproc_rpc.ts — Connector for stored proc server
  • docker-compose-fair.yml — Postgres with fair config
  • FAIR-BENCHMARK.md — Detailed methodology documentation
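For orientation, the transfer that do_transfer() performs in a single call is just a guarded debit/credit. A minimal in-memory TypeScript sketch follows; the actual PL/pgSQL body also takes row locks and runs inside the database, so this only mirrors the logic:

```typescript
// In-memory sketch of the single-call transfer the stored procedure
// performs server-side; names and error handling are illustrative.
type Balances = Map<number, bigint>;

function doTransfer(b: Balances, from: number, to: number, amount: bigint): boolean {
  const src = b.get(from);
  const dst = b.get(to);
  if (src === undefined || dst === undefined || src < amount) return false;
  b.set(from, src - amount); // debit
  b.set(to, dst + amount);   // credit
  return true;               // one logical round-trip, not five
}

const balances: Balances = new Map([[1, 100n], [2, 50n]]);
doTransfer(balances, 1, 2, 30n);
console.log(balances.get(1), balances.get(2)); // 70n 80n
```

The point of the stored procedure is not this logic (which is trivial) but where it runs: all of it executes in one database call instead of five client-driven statements.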

Usage

# Fair benchmark (TypeScript client for all)
npm run fair-bench

# Postgres Rust client (optimized)
cargo build --manifest-path postgres-rust-client/Cargo.toml --release
cargo run --release --manifest-path postgres-rust-client/Cargo.toml -- seed --accounts 100000
cargo run --release --manifest-path postgres-rust-client/Cargo.toml -- bench \
  --connections 50 --duration 10s --warmup-duration 5s --alpha 0.5 --batch-size 1024

# SpacetimeDB Rust client (their version)
cargo run --release --manifest-path spacetimedb-rust-client/Cargo.toml -- bench \
  --server http://127.0.0.1:3000 --module test-1 \
  --connections 50 --duration 10s --warmup-duration 5s --alpha 0.5

Caveats

  • All results are from a single desktop machine running Docker containers — not the bare-metal hardware SpacetimeDB used for their published numbers
  • SpacetimeDB may have optimizations that perform better on dedicated hardware
  • The Postgres Rust client could likely be further optimized (connection pooling, protocol-level pipelining via libpq)
  • The methodology comparison is what matters: when you eliminate asymmetries, the gap narrows dramatically

Test plan

  • Postgres Rust client builds and runs benchmarks
  • Postgres stored proc RPC server works correctly
  • Fair benchmark runner forces equal settings
  • All three modes produce valid, reproducible results
  • Copilot review comments addressed

🤖 Generated with Claude Code

Add an alternative benchmark configuration that eliminates compounding
asymmetries between SpacetimeDB and competitors:

- TypeScript client for ALL systems (no custom Rust client for SpacetimeDB)
- confirmedReads=true for SpacetimeDB (durable commits, matching Postgres fsync)
- Client-side TPS counting for all (no server-side Prometheus metrics)
- Same pipeline depth (8) for all systems
- Postgres read_committed isolation (its actual default, not serializable)
- Postgres synchronous_commit=on (matching SpacetimeDB durability)
- New stored-procedure RPC server eliminates Drizzle ORM multi-round-trip
  overhead, making Postgres comparable to SpacetimeDB's single-call reducer

New files:
- src/fair-bench.ts: fair benchmark runner with enforced equal settings
- src/rpc-servers/postgres-storedproc-rpc-server.ts: PL/pgSQL stored proc
- src/connectors/rpc/postgres_storedproc_rpc.ts: connector for above
- docker-compose-fair.yml: Postgres with fair configuration
- FAIR-BENCHMARK.md: methodology documentation

Local test results (Postgres only, same machine):
- Postgres (Drizzle ORM): 1,817 TPS @ alpha=0.5, 815 TPS @ alpha=1.5
- Postgres (stored proc): 3,415 TPS @ alpha=0.5, 1,217 TPS @ alpha=1.5

The stored procedure alone provides ~1.9x speedup by eliminating ORM
round-trips -- demonstrating that a significant portion of the original
benchmark's gap comes from penalizing competitors with unnecessary overhead.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Copilot AI review requested due to automatic review settings February 24, 2026 20:27

CLAassistant commented Feb 24, 2026

CLA assistant check
All committers have signed the CLA.


Copilot AI left a comment


Pull request overview

Adds a “fair benchmark” variant for the keynote-2 template to reduce methodological asymmetries between SpacetimeDB and competitor systems, including a Postgres stored-procedure RPC path and updated docs/compose config.

Changes:

  • Introduces fair-bench.ts runner that forces “fair” env settings and runs a selectable set of systems.
  • Adds a Postgres stored-procedure RPC server + connector, plus a matching test scenario entry.
  • Adds docker-compose-fair.yml and FAIR-BENCHMARK.md to document and run the fair configuration.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| templates/keynote-2/src/tests/test-1/postgres_storedproc_rpc.ts | Registers the stored-proc Postgres RPC system for test-1. |
| templates/keynote-2/src/rpc-servers/postgres-storedproc-rpc-server.ts | New RPC server that installs/uses a do_transfer() PL/pgSQL function to do transfers in one DB call. |
| templates/keynote-2/src/fair-bench.ts | New "fair benchmark" runner script with service checks, seeding, and results output. |
| templates/keynote-2/src/connectors/rpc/postgres_storedproc_rpc.ts | New connector targeting the stored-proc RPC server port/URL. |
| templates/keynote-2/src/connectors/index.ts | Registers the new connector in CONNECTORS. |
| templates/keynote-2/package.json | Adds npm run fair-bench script. |
| templates/keynote-2/docker-compose-fair.yml | New compose file for Postgres fair settings + stored-proc RPC server. |
| templates/keynote-2/FAIR-BENCHMARK.md | Documents the fairness rationale and how to run the fair benchmark. |


zalo and others added 3 commits February 24, 2026 13:01
- Use USE_SPACETIME_METRICS_ENDPOINT=1 to avoid broken onTransfer
  callback in existing connector (SDK callReducer already awaits
  round-trip confirmation via Promise)
- Use port 3100 for SpacetimeDB to avoid conflicts
- Make health check port configurable via STDB_PORT env var

Local results with all settings leveled:
  alpha=0.5: STDB 100 TPS, PG ORM 1,902 TPS, PG stored proc 3,387 TPS
  alpha=1.5: STDB 105 TPS, PG ORM 249 TPS, PG stored proc 277 TPS

SpacetimeDB's TypeScript client with confirmedReads=true and pipeline
depth 8 shows ~500ms p50 latency, suggesting the WebSocket+BSATN
path has significant per-operation overhead when awaiting confirmations.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
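A quick sanity check on the ~100 TPS / ~500 ms pairing in the commit above: for sequential clients (one in-flight operation per connection), Little's law pins throughput at roughly connections divided by per-operation latency:

```typescript
// Little's law sanity check for a sequential (depth-1) client:
// TPS ≈ connections / per-op latency.
const connections = 50;
const p50Seconds = 0.5; // ~500 ms confirmed-read ack observed for STDB

const predictedTps = connections / p50Seconds;
console.log(predictedTps); // 100, matching the ~100 TPS measured
```

So the fair-bench SpacetimeDB number is latency-bound, not throughput-bound: any change that shortens the confirmed-read ack should raise it proportionally.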
- Fix round-trip count: "4 SQL/ORM round-trips" to "5" in doc comment
  and FAIR-BENCHMARK.md table (BEGIN+SELECT+UPDATE+UPDATE+COMMIT = 5)
- Fix getAccount() balance type: annotate as string (from JSON) and
  parse with BigInt() to match the RpcConnector interface
- Fix amount precision loss in rpcTransfer: parse directly to BigInt
  instead of going through Number() which truncates values > 2^53
- Remove misleading --pipeline-depth flag and related claims; the fair
  benchmark runs sequentially (non-pipelined) so the setting had no
  effect
- Add CLI numeric arg validation: reject NaN, Infinity, and <= 0
- Fix bare catch on dynamic import: only fall back to rpc_single_call
  for MODULE_NOT_FOUND errors, rethrow genuine errors

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Add a Rust client for Postgres that mirrors SpacetimeDB's Rust client:
- Direct binary protocol via tokio-postgres (no HTTP, JSON, or Node.js)
- Multi-threaded Tokio runtime (one thread per connection)
- Batched/pipelined queries with prepared statements
- Calls do_transfer() stored procedure (single DB round-trip)

This eliminates all middleware overhead to isolate the genuine
architectural difference between SpacetimeDB and Postgres.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

cloutiertyler commented Feb 25, 2026

These benchmark changes you're making have significant issues. We'll be doing a proper technical writeup addressing the concerns you mention here.


zalo commented Feb 25, 2026

Conversation Log for Bias Disclosure

This PR was created through an interactive session with Claude Code (Opus 4.6). Below is a complete log of every user prompt that directed the analysis, for full transparency.


Turn 1 — User

SpacetimeDB reports implausibly high benchmark numbers. Could you analyze if there is something dishonest or asymmetric about the way theirs runs vs the competition? Can the competitor's DB's use similar tricks to get the speed up? Can you extend their benchmark to have SpacetimeDB's number under typical usage patterns? https://github.com/clockworklabs/SpacetimeDB/tree/master/templates/keynote-2

Claude: Cloned the repo, read all benchmark files, and produced a detailed analysis identifying ~10 compounding asymmetries (Rust client vs TypeScript, 16,384 vs 8 pipeline depth, server-side vs client-side TPS counting, confirmedReads=false, serializable isolation for Postgres, etc.). Concluded the claimed 14x advantage would likely be 2-5x with a leveled playing field.


Turn 2 — User

Yes please, run it locally, and submit it as a PR.

Claude: Created a fair benchmark variant with:

  • Postgres stored procedure RPC server (eliminates ORM round-trips)
  • Fair docker-compose (read_committed, synchronous_commit=on)
  • fair-bench.ts runner forcing equal settings
  • Registered new connector, test case, package.json script

Ran Postgres-only tests locally (SpacetimeDB CLI not yet available):

  • PG Drizzle ORM: 1,817 TPS
  • PG stored proc: 3,415 TPS

Forked repo, pushed branch, created PR #4432.


Turn 3 — User

Can you look at the PR and see if CI is passing?

Claude: Checked — only blocker was an unsigned CLA. No build/test CI failures.


Turn 4 — User

What happens when you run the benchmark locally?

Claude: Set up SpacetimeDB via Docker (pulled image, published Rust module, seeded). Ran full 3-way comparison with fair settings:

alpha=0.5: STDB 100 TPS (p50=500ms), PG ORM 1,902 TPS, PG stored proc 3,387 TPS
alpha=1.5: STDB 105 TPS, PG ORM 249 TPS, PG stored proc 277 TPS

Committed, pushed, updated PR description with these results.


Turn 5 — User

Could you rerun it, but this time with all of the advantages of each platform enabled properly so they're all as fast as possible?

Claude: Ran each platform at maximum capability:

  • SpacetimeDB: Rust client, 16,384 inflight, no confirmed reads, 5s warmup
  • Postgres: sync_commit=off, read_committed, stored proc, Node.js sequential

alpha=0.5: STDB 92,480 TPS, PG stored proc 3,880 TPS, PG ORM 1,838 TPS
alpha=1.5: STDB 107,653 TPS, PG stored proc 3,747 TPS, PG ORM 1,357 TPS


Turn 6 — User

Is there a way to give postgres a multithreaded rust client with binary websocket?

Claude: Built postgres-rust-client/ — a Rust client using tokio-postgres with binary protocol, prepared statements, batched pipelining, multi-threaded Tokio runtime. Ran head-to-head:

alpha=0.5: STDB 73,123 TPS vs PG Rust 32,232 TPS → 2.3x gap
alpha=1.5: STDB 64,603 TPS vs PG Rust 7,811 TPS → 8.3x gap

Committed, pushed, updated PR with full decomposition of where the 14x comes from.


Turn 7 — User

Please update the PR version with the optimized numbers too

Claude: Updated PR #4432 description with all three benchmark modes and the factor decomposition table.


Disclosure Notes

  • Initial framing: The user's opening question characterized the numbers as "implausibly high" and asked about "dishonest or asymmetric" methodology. This framing guided the analysis toward finding asymmetries.
  • No affiliation stated: The user did not disclose any affiliation with or against ClockworkLabs/SpacetimeDB or any competitor.
  • All code and results are reproducible: Every benchmark was run locally with the commands shown. The Postgres Rust client and fair benchmark runner are included in the PR for anyone to verify.
  • Hardware caveat: All results are from a single desktop machine running Docker — not the dedicated bare-metal hardware SpacetimeDB used for their published numbers. Results may differ on different hardware.
  • Analysis tool: This analysis was conducted by Claude (Opus 4.6) via Claude Code CLI. The user directed the investigation but did not predetermine the results.

@cloutiertyler

@zalo We're about to release our benchmarking deep dive. I would like to move forward with merging this PR, but we'll have to update it to work with the new changes to the benchmarks. Would you be alright if we made changes directly to this PR, or would you like to carry them out?


zalo commented May 9, 2026

Ah, you're welcome to merge, modify, or discard this PR! I appreciate that you're taking the time to add additional transparency and reproducibility to these numbers 😄

Hopefully the extra time spent characterizing the special cases in the performance gap leads to an improved SpacetimeDB for everyone ヽ(´ー`)ノ

I'll also have my Claude re-run the tests based on the latest main commit and push here from the same machine.


zalo commented May 9, 2026

Refreshed numbers from the revised harness (1fd3aab)

Reran the benchmarks against the post–master-merge harness (1fd3aab fair-bench: adapt to upstream API changes after master merge) so the Feb tables in the PR description aren't stale by the time you land your deep-dive. Same machine, fresh runs, defaults from the harness as it stands now.

Setup

  • Hardware: Intel i7-6700K (4C/8T @ 4.0 GHz, Skylake), 32 GB RAM — single consumer desktop, not bare-metal server hardware, so absolute numbers will be lower than your published figures.
  • OS / Docker: Ubuntu 24.04, Docker 29.4.1.
  • Services: pg-fair (Postgres 16, fair config: read_committed + synchronous_commit=on) and spacetime-fair (clockworklabs/spacetime:latest, 2.2.0 / commit eb11e2f5) brought up via docker-compose-fair.yml. PG-RPC servers run from host via tsx because Dockerfile.rpc no longer builds on this branch (pnpm-lock.yaml is stale and the package now depends on spacetimedb: workspace:^, which doesn't resolve outside the monorepo workspace).
  • All runs: 50 connections, 10s window, 100k accounts.

Run 1 — Master defaults (pnpm run bench --connectors spacetimedb,postgres_rpc)

Master defaults today: TypeScript client for both, STDB_CONFIRMED_READS=true (default after #4682), maxInflightPerWorker=128 for SpacetimeDB, sequential for Postgres+Drizzle.

| System | alpha=0.5 (TPS) | alpha=1.5 (TPS) |
|---|---|---|
| SpacetimeDB (TS, pipelined) | 59,103 | 62,586 |
| Postgres + Drizzle ORM (Node) | 1,634 | 732 |
| ratio | 36× | 86× |

p50/p95/p99 for STDB: 65 / 133 / 159 ms (alpha=0.5), 76 / 124 / 147 ms (alpha=1.5). Collision rate at α=1.5 ≈ 95% on STDB / 76% on PG-Drizzle.

Run 2 — pnpm run fair-bench (sequential, includes stored proc)

Fair-bench forces BENCH_PIPELINED=0 everywhere and adds postgres_storedproc_rpc (single-call PL/pgSQL) so the architecture is comparable to a reducer.

| System | alpha=0.5 (TPS) | alpha=1.5 (TPS) |
|---|---|---|
| SpacetimeDB (TS, sequential) | 11,261 | 11,078 |
| Postgres stored proc (single-call) | 3,055 | 1,029 |
| Postgres + Drizzle ORM | 1,727 | 737 |
| STDB vs stored proc | 3.7× | 10.8× |

STDB p50 ≈ 4.2–4.3 ms (sequential, confirmed reads). Postgres p99s blow out at α=1.5 (stored proc ≈ 352 ms, ORM ≈ 483 ms), as expected.

Note vs the Feb fair-bench numbers in the original PR description (STDB 100/105 TPS, p50 ≈ 500 ms): the STDB bottleneck is no longer the confirmed-read ack — current clockworklabs/spacetime:latest (2.2.0) returns confirmed-read acks much faster than what we measured in February.

Run 3 — Optimized PG client (postgres-rust-client)

Note: the SpacetimeDB Rust client column from the original three-mode table is no longer reproducible — that client was removed from keynote-2 upstream in #4753 (as the PR's own FAIR-BENCHMARK.md documents). The closest "STDB optimized" stand-in is the master-default TS client with pipelining (= Run 1). Only the Postgres side is novel here:

| System | alpha=0.5 (TPS) | alpha=1.5 (TPS) |
|---|---|---|
| Postgres Rust client (binary, batch=1024) | 16,147 | 1,103 |
| SpacetimeDB (TS, pipelined — from Run 1) | 59,103 | 62,586 |
| ratio | 3.7× | 57× |

The α=1.5 PG-Rust number is somewhat noisy on this hardware — the client took 46s of wall time to complete its 10s-target window, so the contention regime is dominated by retry/lock-wait rather than client throughput.

Caveats vs the original PR table

  1. Old desktop-class CPU; treat absolute TPS as relative-only.
  2. PG-RPC servers run from host (not Docker), since the RPC container build is currently broken on this branch.
  3. The "SpacetimeDB Rust client" arm is gone post-master-merge — flagging it because the original three-mode table in this PR's description still references it.
  4. Single run per cell — no averaging or warm-up beyond what the harness does internally.

Source JSONs are under templates/keynote-2/runs/ (test-1-2026-05-09T04-07-*.json and fair-bench-2026-05-09T04-*.json) if you want them. Happy to rerun any specific configuration if it'd be useful for the deep-dive.


zalo commented May 9, 2026

Follow-up: where does the gap actually come from?

Building on the refreshed numbers above, I traced the execution paths in the post-merge harness and identified the remaining asymmetries, then extended the bench to close the ones that are methodology rather than architecture. Same hardware (i7-6700K / 31GB / Ubuntu 24.04 / Docker pg-fair + spacetime-fair 2.2.0).

Asymmetries that survive master (fair-bench only closes some)

| # | Asymmetry | Where (file:line) | Closeable by competitors? |
|---|---|---|---|
| 1 | STDB connector sets maxInflightPerWorker: 128; PG/CRDB/SQLite connectors don't. runner.ts:128 reads PIPELINED = !!connector.maxInflightPerWorker, so pnpm run bench runs STDB pipelined and Postgres sequential. | src/connectors/spacetimedb.ts:81 vs src/connectors/rpc/postgres_rpc.ts:49–97 | Yes: set the same field on the PG connector. (fair-bench sidesteps this by forcing both sequential, but the standard bench doesn't.) |
| 2 | STDB = 1 WS hop. Every *_rpc PG variant goes bench → HTTP → Node RPC server → PG. Even the storedproc variant pays an extra TCP RTT plus a process boundary. | src/connectors/rpc/postgres_rpc.ts:14–47, src/rpc-servers/postgres-storedproc-rpc-server.ts | Yes: let PG be driven by pg (libpq-equivalent) directly from the bench process. No RPC server. |
| 3 | The Drizzle PG variant issues 5 PG round-trips per transfer (BEGIN, SELECT FOR UPDATE, UPDATE, UPDATE, COMMIT). | src/rpc-servers/postgres-rpc-server.ts:30–76 | Already closed by postgres_storedproc_rpc (single SELECT do_transfer(...)). |
| 4 | The published STDB module is TypeScript (spacetimedb/src/index.ts), not Rust; the repo has both. Should be documented in result tables. | templates/keynote-2/.env:28, rust_module/ | No for PG; switching STDB to Rust widens the gap further. |
The two remaining methodology gaps after fair-bench are #1 (still present in standard bench) and #2 (present everywhere, including fair-bench). The contention-handling advantage at high α and the colocated-compute advantage are real and architectural — those are what should show up in the comparison after the methodology gaps are closed.
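Asymmetry #1 reduces to a small coupling that can be sketched directly. The field and connector names below follow the table above, but this is a reduced model, not the actual runner.ts:

```typescript
// Reduced model of asymmetry #1: pipelining is inferred from whether a
// connector happens to set maxInflightPerWorker, not from a per-run flag.
interface Connector {
  name: string;
  maxInflightPerWorker?: number;
}

const spacetimedb: Connector = { name: "spacetimedb", maxInflightPerWorker: 128 };
const postgresRpc: Connector = { name: "postgres_rpc" }; // field unset

// Mirrors the reported check: PIPELINED = !!connector.maxInflightPerWorker
const isPipelined = (c: Connector) => !!c.maxInflightPerWorker;

console.log(isPipelined(spacetimedb), isPipelined(postgresRpc)); // true false

// The proposed fix is symmetric: give the PG connector the same field.
const postgresDirect: Connector = { name: "postgres_direct", maxInflightPerWorker: 128 };
console.log(isPipelined(postgresDirect)); // true
```

Because the flag rides on the connector definition, the "standard" bench silently compares a pipelined system against a sequential one unless every connector sets the field.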

Extension: postgres_direct connector

I added src/connectors/postgres_direct.ts — same shape as postgres_storedproc_rpc but the bench process opens its own pg.Pool and sends queries directly. No HTTP, no RPC server. Sets maxInflightPerWorker: 128 so runner.ts lets it pipeline at the same depth STDB does. Registered in src/connectors/index.ts and src/config.ts:validConnectors. test-1 entry at src/tests/test-1/postgres_direct.ts.

Extension: test-2 — multi-step "typical app" workload

Single-row transfer is the best case for PG-storedproc and the worst case for STDB's compute-colocation advantage (the work is so small it doesn't matter where it runs). Real apps tend to do multi-step work per request. So I added a 4-step transaction:

  1. Read source/dest balances under row lock.
  2. Fraud check (amount <= fraud_limit).
  3. Apply transfer (debit + credit).
  4. Append a row to a transfer_audit table.

This is registered as test-2 and runs three implementations against it:

  • spacetimedb — one reducer call (transfer_with_audit), all four steps inside the reducer.
  • postgres_direct_storedproc — one SQL call, all four steps in a PL/pgSQL function (do_transfer_with_audit).
  • postgres_direct (steps) — four explicit client→PG round-trips: BEGIN, SELECT FOR UPDATE, two UPDATEs, INSERT audit, COMMIT. This is the "Drizzle-style" usage pattern but with the RPC server hop removed.

Module changes: added transfer_audit table + transfer_with_audit reducer to spacetimedb/src/index.ts. PG side: do_transfer_with_audit PL/pgSQL function installed by the connector on open(). Single FRAUD_LIMIT = 1_000_000_000 env knob.
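A minimal in-memory sketch of the four steps follows. The real transfer_with_audit reducer and do_transfer_with_audit PL/pgSQL run inside their respective engines with row locking; this TypeScript version only mirrors the logic:

```typescript
// In-memory sketch of the test-2 multi-step transaction; locking and
// persistence are elided. FRAUD_LIMIT mirrors the env knob described above.
const FRAUD_LIMIT = 1_000_000_000n;

interface AuditRow { from: number; to: number; amount: bigint }

function transferWithAudit(
  balances: Map<number, bigint>,
  audit: AuditRow[],
  from: number, to: number, amount: bigint,
): boolean {
  const src = balances.get(from);                          // 1. read balances
  const dst = balances.get(to);
  if (src === undefined || dst === undefined) return false;
  if (amount > FRAUD_LIMIT || src < amount) return false;  // 2. fraud check
  balances.set(from, src - amount);                        // 3. apply transfer
  balances.set(to, dst + amount);
  audit.push({ from, to, amount });                        // 4. append audit row
  return true;
}

const balances = new Map<number, bigint>([[1, 500n], [2, 0n]]);
const audit: AuditRow[] = [];
transferWithAudit(balances, audit, 1, 2, 200n);
console.log(balances.get(1), balances.get(2), audit.length); // 300n 200n 1
```

The interesting comparison in test-2 is where these four steps execute: inside one reducer call, inside one PL/pgSQL call, or as four client-driven round-trips.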

Results

50 connections, 10s window, 100k accounts, single run per cell.

test-1 (single-call transfer)

| | α=0.5 sequential | α=0.5 pipelined | α=1.5 sequential | α=1.5 pipelined |
|---|---|---|---|---|
| SpacetimeDB | 10,104 | 45,103 | 10,569 | 36,459 |
| postgres_direct | 8,138 | 6,682 | 1,166 | 467 |
| ratio | 1.24× | 6.7× | 9.1× | 78× |

The single-cell takeaway is that at α=0.5 sequential — the lowest-contention, no-pipelining-bonus configuration — STDB is only 1.24× faster than PG with a direct connection and a stored proc. The bigger ratios come from (a) STDB pipelining and (b) STDB's better behavior at high contention, both of which are real but compose with the methodology-only differences.
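The pipelined ratio in the table above is the product of three factors that can be checked arithmetically against the table's own cells:

```typescript
// Decompose the ~6.7x pipelined gap (test-1, alpha=0.5) into its parts,
// using the TPS figures from the table above.
const stdbSeq = 10_104, stdbPipe = 45_103;
const pgSeq = 8_138, pgPipe = 6_682;

const seqRatio = stdbSeq / pgSeq;        // baseline gap, ~1.24x
const stdbPipeGain = stdbPipe / stdbSeq; // STDB gains ~4.5x from pipelining
const pgPipeLoss = pgSeq / pgPipe;       // PG *loses* ~1.2x when pipelined

// The pipelined ratio is exactly the product of the three factors.
console.log((seqRatio * stdbPipeGain * pgPipeLoss).toFixed(2)); // ≈ 6.75
console.log((stdbPipe / pgPipe).toFixed(2));                    // ≈ 6.75
```

So most of the headline gap at α=0.5 comes from STDB gaining and PG losing under pipelining, stacked on a modest 1.24x baseline.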

test-2 (multi-step transfer + fraud check + audit insert)

| | α=0.5 sequential | α=0.5 pipelined | α=1.5 sequential | α=1.5 pipelined* |
|---|---|---|---|---|
| SpacetimeDB (one reducer) | 9,739 | 42,761 | 9,545 | 47,750 |
| postgres_direct (one storedproc) | 11,715 | 8,100 | 1,196 | ~223* |
| postgres_direct (4 round-trips) | 3,163 | 1,960 | 795 | ~336* |
| STDB vs storedproc | 0.83× | 5.3× | 8.0× | ~214× |
* α=1.5 pipelined PG numbers are degraded by a PG backend OOM/recovery cycle during the run (50 workers × 128 in-flight × FOR UPDATE lock storm exceeded 31 GB). PG was alive but a backend crashed mid-run; the harness reported partial numbers before erroring out. Treat as "PG saturates" rather than a precise figure.

What this means

  • At sequential α=0.5 on the multi-step workload, PG storedproc is faster than STDB (1.20×). Once you remove the Node RPC server, give PG one round-trip via PL/pgSQL, and run sequentially, the architectural advantage of "compute colocated with storage" doesn't materialize because PG's stored proc already has that property. The win STDB gets in the standard bench at this configuration is the RPC-server hop + the maxInflightPerWorker:128 pipelining, not architecture per se.
  • STDB's real, architectural advantages show up under contention and concurrency. At α=1.5 sequential, STDB is ~8× faster than PG storedproc on the multi-step workload — that's the contention-handling story. At α=1.5 pipelined, PG locks up entirely while STDB sustains 48K TPS. That's a genuine architectural difference worth highlighting.
  • The pipelining advantage is mostly methodology. STDB at α=0.5 pipelined is 4.4× its sequential number; PG-direct is slower pipelined than sequential (8.1K → 11.7K when going to sequential at α=0.5) because lock contention overwhelms the pool. So pipelining isn't a free win for either side at this account count; it just looks like one when only STDB has it enabled.

What I'd add to the deep-dive

  1. Document which STDB module (TS vs Rust) is in use for each result table.
  2. Add postgres_direct (or equivalent) as a default comparison so the RPC-server hop isn't conflated with architecture.
  3. Decompose the table into "sequential α=0.5" / "pipelined α=0.5" / "sequential α=1.5" / "pipelined α=1.5" so the contention-handling story (which is the genuine architectural win) isn't tangled with the pipelining story (which is methodology).
  4. Use a multi-step workload, not just transfer, to show where colocated compute actually matters in real apps.

Reproduce

Files added on the branch:

  • src/connectors/postgres_direct.ts
  • src/tests/test-1/postgres_direct.ts
  • src/tests/test-2/{spacetimedb,postgres_direct,postgres_direct_storedproc}.ts
  • src/scenario_recipes/multi_step_transfer.ts
  • Module additions in spacetimedb/src/index.ts (transfer_audit table, transfer_with_audit reducer)
  • src/connectors/index.ts and src/config.ts updates to register the new connector

Result JSONs are in templates/keynote-2/runs/test-{1,2}-2026-05-09T04-3*-*.json. Caveats from the prior comment still apply (consumer-grade desktop, RPC-server Docker build still broken, single run per cell).
