feat(sea): add per-file compression to SEA archive (closes #250)#251

Merged: robertsLando merged 8 commits into main from feat/sea-compression on Apr 18, 2026

@robertsLando
Member

Summary

Ports per-stripe compression to enhanced SEA mode — the SEA archive now supports --compress Brotli, --compress GZip, and --compress Zstd alongside Standard mode. Files are compressed independently at build time and decompressed lazily at first fs.readFileSync() / require(), so the cold-start cost scales with files actually read, not archive size.

Closes #250.

Measured impact (claude-code@1.0.100, node22-linux-x64)

| Build | Binary | Cold start |
| --- | --- | --- |
| pkg --sea | 194 MB | ~560 ms |
| pkg --sea --compress GZip | 154 MB | ~580 ms |
| pkg --sea --compress Zstd | 152 MB | ~570 ms |
| pkg --sea --compress Brotli | 147 MB | ~600 ms |

Zstd is the recommended default — near-Brotli size at GZip-class build speed. Brotli goes further on size but takes ~3 min to compress the archive for this workload. All three have cold-start overhead within measurement noise (±10 ms best-of-5).
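
For anyone who wants to repeat a best-of-5 style measurement, here is a minimal sketch. The helper name `bestOfMs` is hypothetical, it times the whole process spawn from Node rather than using `/usr/bin/time`, and it does not clear `~/.cache/pkg` between runs, so it is not the exact methodology behind the table above.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical helper: run `command args` several times and keep the
// fastest wall-clock time in milliseconds.
function bestOfMs(command, args, runs = 5) {
  let best = Infinity;
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    const result = spawnSync(command, args, { stdio: 'ignore' });
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
    if (result.error) throw result.error;
    if (result.status !== 0) {
      throw new Error(`${command} exited with status ${result.status}`);
    }
    best = Math.min(best, elapsedMs);
  }
  return best;
}

// Stand-in target: the current Node binary. Swap in a packaged binary path.
console.log(`${bestOfMs(process.execPath, ['--version']).toFixed(1)} ms`);
```

Taking the minimum rather than the mean filters out scheduler noise, which matters when the differences between codecs are on the order of 10 ms.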

Implementation

  • lib/compress_type.ts — extend CompressType with Zstd = 3.
  • lib/index.ts — accept "zstd" / "zs" at --compress; refuse --compress for simple SEA mode (no walker, nothing to compress).
  • lib/producer.ts — wire createZstdCompress() into Standard-mode producer too, so the flag is consistent across modes.
  • lib/sea-assets.ts — compress each entry when writing the archive; store manifest.compression (numeric CompressType) at the manifest root; keep stats[key].size as the uncompressed length so fs.statSync() reports real file sizes. Absent compression field = uncompressed archive (backward compat with pre-#250 SEA binaries).
  • lib/sea.ts, lib/types.ts — thread doCompress through seaEnhanced().
  • prelude/bootstrap.js — Zstd branch in payloadFile / payloadFileSync for Standard mode.
  • prelude/sea-vfs-setup.js — pick a sync decompressor once at SEAProvider construction time; decompress on first read, cache the result in the existing _fileCache. Clear error if a Zstd-packaged binary runs on a Node without zlib.zstdDecompressSync.
  • Zstd requires Node.js >= 22.15 on both the build host and the packaged runtime. Guarded with an actionable error in both paths. GZip and Brotli work on every supported Node.
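
The build-time/runtime split described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in lib/sea-assets.ts or prelude/sea-vfs-setup.js: `buildArchive`, `LazyProvider`, the `offsets` manifest field, and the numeric ids for None/GZip/Brotli are all hypothetical (the PR only states `Zstd = 3`), and Zstd is left out so the sketch runs on any Node version.

```javascript
const zlib = require('node:zlib');

// Assumed numeric ids, for illustration only.
const CODEC = { None: 0, GZip: 1, Brotli: 2 };

// Build side: compress every entry independently and record where it lands.
function buildArchive(files, compression) {
  const compress =
    compression === CODEC.GZip ? zlib.gzipSync
    : compression === CODEC.Brotli ? zlib.brotliCompressSync
    : (buf) => buf;
  const manifest = { compression, offsets: {}, stats: {} };
  const chunks = [];
  let offset = 0;
  for (const [key, content] of Object.entries(files)) {
    const raw = Buffer.from(content);
    const stored = compress(raw);
    manifest.offsets[key] = [offset, stored.length];
    manifest.stats[key] = { size: raw.length }; // uncompressed length
    chunks.push(stored);
    offset += stored.length;
  }
  return { blob: Buffer.concat(chunks), manifest };
}

// Runtime side: choose the decompressor once, decompress on first read, cache.
class LazyProvider {
  constructor(blob, manifest) {
    this.blob = blob;
    this.manifest = manifest;
    this.cache = new Map();
    this.decompressCalls = 0;
    // An absent field is treated as an uncompressed archive.
    const codec = manifest.compression ?? CODEC.None;
    this.decompress =
      codec === CODEC.GZip ? zlib.gunzipSync
      : codec === CODEC.Brotli ? zlib.brotliDecompressSync
      : null;
  }

  statSize(key) {
    return this.manifest.stats[key].size;
  }

  readFileSync(key) {
    const [offset, length] = this.manifest.offsets[key];
    const stored = this.blob.subarray(offset, offset + length);
    if (!this.decompress) return stored; // nothing to decode or cache
    let buf = this.cache.get(key);
    if (!buf) {
      this.decompressCalls++;
      buf = this.decompress(stored);
      this.cache.set(key, buf);
    }
    return buf;
  }
}

const archive = buildArchive(
  { 'index.js': 'console.log("hi")', 'data.txt': 'x'.repeat(10000) },
  CODEC.GZip
);
const provider = new LazyProvider(archive.blob, archive.manifest);
provider.readFileSync('index.js'); // only this entry is decompressed
console.log(provider.decompressCalls, provider.statSize('data.txt'));
```

Because each entry is its own compressed stream, reading `index.js` never touches the bytes of `data.txt`, which is why cold-start cost tracks the files actually read.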

Test plan

  • New test/test-93-sea-compress builds the same fixture with None / GZip / Brotli / Zstd (skipping Zstd when the test runtime lacks zstdCompressSync) and asserts every packaged binary prints identical output.
  • Existing test-85-sea-enhanced and test-86-sea-assets still pass on Node 22.22.1.
  • yarn build and yarn lint both clean.
  • Real-world smoke test with @anthropic-ai/claude-code@1.0.100 — built and ran one binary per codec.
  • CI on full target matrix.

Docs

  • docs-site/guide/compression.md — remove "not supported in SEA mode" warning, add Zstd + SEA section.
  • docs-site/guide/sea-mode.md — stop claiming SEA "skips compression."
  • docs-site/guide/sea-vs-standard.md — compression row now ✅/✅.
  • docs-site/guide/vs-bun-deno.md — add --compress rows to the claude-code case study with the numbers above.
  • docs/ARCHITECTURE.md — update the performance-comparison row for SEA.

robertsLando and others added 5 commits April 18, 2026 11:22
…/GZip/Zstd)

Extends the existing --compress flag to enhanced SEA mode, matching what
Standard mode has had for years.  Each file in the SEA archive is compressed
independently with gzip / brotli / zstd and decompressed lazily at first
fs.readFileSync() / require(), so the cold-start cost is proportional to the
files actually read — not the full archive.

Measured on claude-code@1.0.100 (node22-linux-x64): 194 MB → 152 MB with
--compress Zstd (~42 MB saved, no measurable startup regression), and
194 MB → 147 MB with --compress Brotli (~3 min build).  Closes most of the
size gap between SEA-mode binaries and competitors like Bun.

- lib/compress_type.ts: add Zstd = 3
- lib/index.ts: accept "Zstd"/"zs" at --compress; refuse --compress for
  simple SEA mode (no walker → nothing to compress)
- lib/producer.ts: wire Zstd compressor into Standard-mode producer too,
  so the flag is consistent across modes
- lib/sea-assets.ts: compress each entry during archive write; record
  manifest.compression = numeric CompressType; keep stats[key].size as
  the uncompressed length so fs.statSync() reports the real file size
- lib/sea.ts, lib/types.ts: thread doCompress through seaEnhanced()
- prelude/bootstrap.js: add Zstd branch to payloadFile/payloadFileSync
- prelude/sea-vfs-setup.js: pick a decompressor once at SEAProvider
  construction; decompress on first read, cache the result in _fileCache
- test/test-93-sea-compress: build the same fixture with None/GZip/
  Brotli/Zstd (Zstd gated on zlib.zstdCompressSync availability) and
  assert every packaged binary prints identical output
- docs: update compression.md, sea-mode.md, sea-vs-standard.md,
  ARCHITECTURE.md, and vs-bun-deno.md with the new feature and the
  re-measured claude-code numbers

Closes #250
…k SEA binaries

Binary-size gap to Bun isn't all archive — ~30 MB of the remaining delta
is full-ICU in the stock Node binary pkg-fetch ships. Spell out that
./configure --without-intl --without-inspector --without-npm --without-corepack
--fully-static (a pkg-fetch concern, not a pkg one) would close most of
what's left.
…ents

Re-ran all four pkg --sea variants on the same host with consistent
methodology (first-run, cold ~/.cache/pkg, /usr/bin/time -f %e for
./binary --version).  Bun/Deno rows are unchanged from the morning run.

- None:   979 → 610 ms
- GZip:         590 ms (new)
- Zstd:         560 ms (new)
- Brotli:       590 ms (new)

Compression adds ≤0 ms vs uncompressed on this workload because
claude-code's --version path only touches a handful of files, so the
sync zlib/zstd decode cost is dwarfed by the startup savings from the
smaller archive being memory-mapped.
Ran pkg --sea (4 codecs), bun --compile, bun --compile --bytecode, and
deno compile on the same host with matching methodology (fresh fixture,
cold ~/.cache/pkg, /usr/bin/time -f %e for ./bin --version first run):

  Bun                          510 ms (108 MB)
  Bun --bytecode               530 ms (190 MB)
  pkg --sea                    560 ms (194 MB)
  pkg --sea --compress Zstd    570 ms (152 MB)
  pkg --sea --compress GZip    580 ms (154 MB)
  pkg --sea --compress Brotli  590 ms (147 MB)
  Deno                         740 ms (183 MB)

The previous numbers (797 Bun / 1256 Deno / 979 pkg) were measured on a
different run/method, not apples-to-apples.  These seven are.  Bun is still
fastest and smallest; pkg SEA with Zstd compression is within ~60 ms of Bun
while shipping stock Node.js; Deno is the slowest starter on this
workload.  Narrative paragraphs updated to match.
…streaming

Security / correctness:
- prelude/sea-vfs-setup.js: cap per-file decompression via maxOutputLength and
  assert decompressed length matches manifest stats.size; use Number.isInteger
  for offset/length/size bounds (rejects NaN and non-integer floats that the
  prior typeof-number guard let through).
- lib/sea-assets.ts: synthesize a stats entry for records that had STORE_CONTENT
  but no STORE_STAT, so every compressed stripe has an authoritative size for
  the runtime to cross-check against. Make resolveCompressor exhaustive — a new
  CompressType without a matching case now fails the build instead of shipping
  an archive that claims compression but contains raw bytes.
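
To illustrate why the guard moved from typeof-number to Number.isInteger, here is a small sketch. The helper names `looseGuard`, `isValidExtent`, and `sliceEntry` are hypothetical and do not mirror the actual code in prelude/sea-vfs-setup.js.

```javascript
// The older style of check: NaN and 1.5 are both typeof 'number'.
function looseGuard(value) {
  return typeof value === 'number';
}

// Stricter check: a non-negative integer that fits inside `max`.
function isValidExtent(value, max) {
  return Number.isInteger(value) && value >= 0 && value <= max;
}

// Hypothetical archive accessor that validates bounds before slicing.
function sliceEntry(blob, offset, length) {
  if (
    !isValidExtent(offset, blob.length) ||
    !isValidExtent(length, blob.length - offset)
  ) {
    throw new RangeError(`invalid archive extent: offset=${offset} length=${length}`);
  }
  return blob.subarray(offset, offset + length);
}

const blob = Buffer.from('0123456789');
console.log(looseGuard(NaN), isValidExtent(NaN, blob.length));
console.log(sliceEntry(blob, 2, 3).toString());
```

Without the integer check, a NaN offset would not throw: `Buffer.prototype.subarray` coerces it, so the read would silently return the wrong bytes.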

Performance:
- lib/sea-assets.ts: restore createReadStream path for unmodified disk-resident
  files; the prior always-readFileAsync forced peak RSS to grow with total
  asset size even when compression was disabled.
- Resolve the decompressor/compressor exactly once per path: at module load in
  prelude/bootstrap.js, at SEAProvider construction in sea-vfs-setup.js, before
  the stripe loop in sea-assets.ts, and before Multistream in producer.ts. Fails
  fast when the runtime is missing a Zstd API instead of mid-stripe. Skips
  _fileCache entirely for uncompressed archives so archive subarrays aren't
  pinned unnecessarily.
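
A rough sketch of the resolve-once, fail-fast idea follows. `pickDecompressorSync` is the name this commit uses, but the signature, the injectable `zlibModule` parameter, the error wording, and the numeric ids other than Zstd = 3 are assumptions made for illustration.

```javascript
const zlib = require('node:zlib');

// Ids for None/GZip/Brotli are assumed; the PR states Zstd = 3.
const COMPRESS_NONE = 0;
const COMPRESS_GZIP = 1;
const COMPRESS_BROTLI = 2;
const COMPRESS_ZSTD = 3;

// Resolve the decompressor for a whole archive up front. Returns null for
// uncompressed archives and throws immediately for unusable codecs.
function pickDecompressorSync(compression, zlibModule = zlib) {
  switch (compression) {
    case undefined: // absent field: treated as uncompressed
    case COMPRESS_NONE:
      return null;
    case COMPRESS_GZIP:
      return zlibModule.gunzipSync;
    case COMPRESS_BROTLI:
      return zlibModule.brotliDecompressSync;
    case COMPRESS_ZSTD:
      if (typeof zlibModule.zstdDecompressSync !== 'function') {
        throw new Error(
          `This binary was packaged with Zstd, which needs Node.js >= 22.15 ` +
            `(running ${process.version}).`
        );
      }
      return zlibModule.zstdDecompressSync;
    default:
      // Unknown id: refuse rather than hand back raw bytes.
      throw new Error(`Unknown compression id: ${compression}`);
  }
}

// Resolved once, then reused for every read.
const gunzip = pickDecompressorSync(COMPRESS_GZIP);
console.log(gunzip(zlib.gzipSync('hello')).toString());
```

Calling this once at construction time means a missing Zstd API surfaces before any file is served, not on whichever read happens to hit a compressed stripe first.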

DRY / surface:
- prelude/bootstrap-shared.js: single source of truth for COMPRESS_* constants,
  pickDecompressorSync/Async, and a context-aware zstdMissingError (build-host
  vs end-user remediation). Classical bootstrap and SEA VFS both consume it;
  the local zlib require in bootstrap.js is gone.
- lib/compress_type.ts: getZstdCompressSync / getZstdCompressStream replace the
  duplicated 'zlib as unknown as { ... }' casts in producer.ts and sea-assets.ts
  and emit a single build-error string (now also includes process.version).
- lib/help.ts: add Zstd to the --compress description and examples.
- lib/index.ts: the 'invalid compression algorithm' error now lists the real
  accepted tokens (None/none, Brotli/br, GZip/gz/gzip, Zstd/zs/zstd); the
  compression banner goes through log.info instead of console.log.

Tests:
- test/test-93-sea-compress: assert each compressed binary is at least 50 KB
  smaller than the None build so a silent fallback to uncompressed fails the
  test (the prior byte-equality check couldn't detect that regression).
- test/test-80-compression: cover --compress Zstd in the classical pipeline
  (lib/producer.ts and prelude/bootstrap.js zstd branches) when zlib.createZstdCompress
  is available on the build host.
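
The size assertion could look roughly like this. `assertCompressedSmaller` is a hypothetical helper, and reading "50 KB" as 50 * 1024 bytes is an assumption; the real test may use a different constant or shape.

```javascript
// Assumed interpretation of the "at least 50 KB smaller" threshold.
const MIN_SAVINGS_BYTES = 50 * 1024;

// Throws when a "compressed" build is not meaningfully smaller than the
// uncompressed one, which is what a silent fallback would look like.
function assertCompressedSmaller(label, noneSize, compressedSize) {
  const saved = noneSize - compressedSize;
  if (saved < MIN_SAVINGS_BYTES) {
    throw new Error(
      `${label}: expected at least ${MIN_SAVINGS_BYTES} bytes saved, got ${saved}`
    );
  }
  return saved;
}

// In a real test the sizes would come from fs.statSync(binaryPath).size.
console.log(assertCompressedSmaller('GZip', 1_000_000, 800_000));
```

An output-equality check alone passes whether or not compression ran, since both paths print the same bytes; the size delta is what distinguishes them.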

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread prelude/sea-vfs-setup.js Outdated
robertsLando and others added 3 commits April 18, 2026 15:45
Dead code:
- Drop SeaAssetsResult.entryIsESM — seaEnhanced destructures only
  { assets, manifestPath } and the value is read via manifest.entryIsESM at
  runtime, so the return-shape field was carrying a stale copy.
- Drop the 'syscall' parameter from SEAProvider._resolveSymlink: all five
  callers pass only the path, and ELOOP is rare enough that hardcoding
  err.syscall = 'stat' is fine.
- Drop the 'context' parameter from pickDecompressorSync/Async and merge
  zstdMissingError into a single runtime-wording string: only 'runtime' was
  ever passed (build-side Zstd errors go through lib/compress_type.ts's own
  zstdBuildError).
- Drop unused COMPRESS_GZIP/BROTLI/ZSTD exports from bootstrap-shared —
  callers now go through pickDecompressor and only COMPRESS_NONE is read
  directly by sea-vfs-setup.
- Remove the redundant process.argv[1] = entrypoint assignment in
  sea-bootstrap.js; sea-bootstrap-core.js already sets it to the same value.
- Inline the single-use ZSTD_MISSING_BUILD_REMEDIATION constant.

Hot paths (~30K lookups per startup on large projects):
- toManifestKey: skip the backslash→slash regex on POSIX hosts where paths
  already match the manifest shape; keep the replace on win32 where it's
  mandatory.
- _resolveSymlink: short-circuit before entering the MAX_SYMLINK_DEPTH loop
  when the path isn't a symlink key (the common case).
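
Both hot-path changes can be sketched as below. The factory `makeToManifestKey`, the `Map`-based symlink table, and the depth limit of 40 are assumptions for illustration; only the names toManifestKey, _resolveSymlink, and MAX_SYMLINK_DEPTH come from the commit.

```javascript
// Decide once per process whether separator normalization is needed.
function makeToManifestKey(platform = process.platform) {
  if (platform === 'win32') {
    return (p) => p.replace(/\\/g, '/'); // mandatory on Windows
  }
  return (p) => p; // POSIX paths already match the manifest shape
}

const MAX_SYMLINK_DEPTH = 40; // assumed value

// `symlinks` is a hypothetical Map from link path to target path.
function resolveSymlink(symlinks, path) {
  if (!symlinks.has(path)) return path; // common case: skip the loop entirely
  let current = path;
  for (let depth = 0; depth < MAX_SYMLINK_DEPTH; depth++) {
    const target = symlinks.get(current);
    if (target === undefined) return current;
    current = target;
  }
  const err = new Error(`ELOOP: too many symbolic links encountered, stat '${path}'`);
  err.code = 'ELOOP';
  err.syscall = 'stat';
  throw err;
}

const toKey = makeToManifestKey('win32');
console.log(toKey('C:\\snapshot\\app\\index.js'));
console.log(resolveSymlink(new Map([['/a', '/b']]), '/a'));
```

With tens of thousands of lookups per startup, avoiding a regex replace and a loop setup on the path that almost always applies adds up.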

Comments:
- sea-assets.ts: rename the Zstd-resolution rationale to point at
  zstdBuildError, which is where the wording now lives.
- bootstrap-shared.js: tighten the COMPRESS_NONE comment now that only it
  is exported.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause: payload.txt starts with 0x0a; on a Windows checkout git's
autocrlf converted it to 0x0d 0x0a, so PAYLOAD.slice(0, 32) contained a
leading \r\n that survived in `expected` but got stripped from `actual`
via the existing replace(/\r\n/g, '\n'), causing the equality assertion
to fail across every Windows job.

Fix:
- Add .gitattributes so payload.txt is checked out LF on every platform;
  the SEA archive bytes are now deterministic cross-platform, which also
  keeps the compressed-size assertion stable.
- Normalize CRLF in `expected` as defense-in-depth so an existing Windows
  clone (cloned before .gitattributes landed) still passes the test.
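
The failure mode and the defense-in-depth fix can be reproduced in a few lines. The payload content here is made up; only the slice(0, 32) and replace(/\r\n/g, '\n') steps come from the description above.

```javascript
const normalizeNewlines = (text) => text.replace(/\r\n/g, '\n');

// What a Windows checkout with autocrlf leaves on disk (made-up content).
const onDisk = '\r\nfirst line of payload\r\nsecond line of payload\r\n';

// The packaged binary prints the first 32 characters; the test normalizes them.
const actual = normalizeNewlines(onDisk.slice(0, 32));

// Before the fix: `expected` kept its \r\n, so the comparison failed.
const expectedBefore = onDisk.slice(0, 32);

// After the fix: `expected` is normalized the same way as `actual`.
const expectedAfter = normalizeNewlines(onDisk.slice(0, 32));

console.log(actual === expectedBefore, actual === expectedAfter);
```

The .gitattributes entry removes the CRLF at the source, while the normalization keeps older Windows clones passing.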

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per review feedback on PR #251: an attacker who can rewrite the SEA blob
can also rewrite `manifest.stats[p].size` to match the payload they ship,
so the post-decompression `buf.length === expected` check does not survive
a consistent tamper — it only fires on accidental corruption, which is a
narrow and unlikely case.

Keep `maxOutputLength`: it bounds the zlib allocation up front so a blob
with a plausible-but-inflated manifest can't request unbounded memory
before we discover the size mismatch. That bound is cheap and standard
Node zlib hygiene. Also keep the `stats.size` validation: `maxOutputLength`
requires a finite integer, so NaN / negative / missing values must still
be rejected before reaching zlib.
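
A minimal sketch of the bounded-allocation idea, using gzip so it runs on any supported Node. `decompressBounded` is a hypothetical helper, not the function in prelude/sea-vfs-setup.js.

```javascript
const zlib = require('node:zlib');

// Validate the declared size, then let zlib enforce it as an output cap.
function decompressBounded(compressed, declaredSize) {
  if (!Number.isInteger(declaredSize) || declaredSize < 0) {
    throw new RangeError(`invalid declared size: ${declaredSize}`);
  }
  // zlib requires maxOutputLength >= 1, so empty files are clamped up to 1.
  return zlib.gunzipSync(compressed, {
    maxOutputLength: Math.max(1, declaredSize),
  });
}

const payload = zlib.gzipSync(Buffer.alloc(1024 * 1024)); // 1 MiB of zeros
console.log(decompressBounded(payload, 1024 * 1024).length);

try {
  decompressBounded(payload, 1024); // manifest claims far less than the output
} catch (err) {
  console.log(err.code);
}
```

The cap stops zlib from growing its output buffer past the declared size, so memory use is bounded by the manifest value even though that value itself is not trusted as tamper evidence.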

Tightened the comment to reflect the actual threat model (bounded
allocation vs. tamper detection) instead of the earlier bomb-defense
framing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@robertsLando robertsLando merged commit fdf8046 into main Apr 18, 2026
23 of 27 checks passed
@robertsLando robertsLando deleted the feat/sea-compression branch April 18, 2026 13:58
Development

Successfully merging this pull request may close these issues.

SEA mode: add per-stripe compression (~75 MB win on typical apps)