
SEA mode: add per-stripe compression (~75 MB win on typical apps) #250

@robertsLando

Summary

SEA mode stores its archive uncompressed, which is the single biggest reason SEA binaries are 30-50% larger than equivalent Standard-mode builds (and larger than Bun / Deno competitors). Porting Standard mode's per-stripe compression primitive to SEA would close most of that gap at a negligible runtime cost, because each file is decompressed lazily on first access rather than all at once at startup.

Filing this to track the work and document the measured headroom.

Motivation — where the bytes go

Measured on @anthropic-ai/claude-code@1.0.100 (ESM CLI with yoga.wasm + vendored ripgrep, built with pkg . --sea -t node22-linux-x64):

stock Node 22.22.2 binary:   71 MB  (37% of total)
SEA archive appendage:      123 MB  (63% of total)  ← all the headroom is here
total pkg --sea binary:     194 MB

The archive is stored uncompressed per ARCHITECTURE.md ("Single archive blob is stored uncompressed. Executable size will be larger for the same project").

Compression headroom on the 123 MB archive

Algorithm   Output   Ratio of original   Compress time
gzip -9     50 MB    39%                 23 s
zstd -3     48 MB    38%                 0.7 s
zstd -19    37 MB    29%                 65 s

With per-stripe zstd-3, the claude-code binary would drop 194 MB → ~120 MB, roughly matching Bun's 108 MB for equivalent work. See the "pkg vs Bun vs Deno" docs PR for cross-tool numbers.
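A quick check of that projection, as a TypeScript sketch. It assumes the Node base stays at 71 MB and that per-stripe compression reaches the whole-archive ratios in the table; compressing each file separately loses cross-file context, so real per-stripe output will likely land somewhat above these figures.

```typescript
// Sizes in MB from the claude-code measurement above (node22-linux-x64).
const BASE_NODE_MB = 71;

// Compressed archive size per algorithm, taken from the table above.
// Assumption: per-stripe compression matches the whole-archive ratio
// (in practice it usually does a little worse).
const archiveMB: Record<string, number> = {
  uncompressed: 123,
  "gzip -9": 50,
  "zstd -3": 48,
  "zstd -19": 37,
};

// The Node binary is untouched; only the appended archive shrinks.
function projectedBinaryMB(baseMB: number, compressedArchiveMB: number): number {
  return baseMB + compressedArchiveMB;
}

for (const [algorithm, sizeMB] of Object.entries(archiveMB)) {
  const total = projectedBinaryMB(BASE_NODE_MB, sizeMB);
  console.log(`${algorithm.padEnd(13)} ${String(total).padStart(3)} MB`);
}
```

zstd -3 comes out at 119 MB (the "~120 MB" above), while zstd -19 would reach 108 MB, level with Bun, at the price of ~65 s of build time.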

Proposed approach

Port Standard mode's existing per-stripe compression primitive (lib/producer.ts — Brotli/GZip) to the SEA archive generator (lib/sea-assets.ts).
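A minimal sketch of what per-stripe packing could look like in the archive generator. StripeRecord, packStripes and the /snapshot paths are illustrative names, not pkg's actual code or archive format. The compressor is injected, so gzip (used here because every Node release has it and Standard mode already supports it) can be swapped for zstd without touching the packer. Entries that do not shrink are stored raw, which also lets compressed and uncompressed stripes coexist in one archive.

```typescript
import * as zlib from "node:zlib";

// Hypothetical index record for one archive entry (not pkg's real format).
interface StripeRecord {
  offset: number; // where the stored bytes start inside the archive blob
  storedSize: number; // bytes actually stored (compressed or raw)
  originalSize: number; // size after decompression, used as a sanity check
  compressed: boolean; // false when compression did not shrink the entry
}

type Compress = (data: Buffer) => Buffer;

// Compress each file independently and concatenate the results into one blob,
// recording where each stripe lives so it can later be decoded on its own.
function packStripes(
  files: Map<string, Buffer>,
  compress: Compress,
): { blob: Buffer; index: Record<string, StripeRecord> } {
  const chunks: Buffer[] = [];
  const index: Record<string, StripeRecord> = {};
  let offset = 0;
  for (const [path, data] of files) {
    const packed = compress(data);
    // Tiny or already-compressed files (e.g. images) can grow; keep them raw.
    const compressed = packed.length < data.length;
    const stored = compressed ? packed : data;
    index[path] = { offset, storedSize: stored.length, originalSize: data.length, compressed };
    chunks.push(stored);
    offset += stored.length;
  }
  return { blob: Buffer.concat(chunks), index };
}

// Example: gzip level 9 as the pluggable compressor.
const files = new Map<string, Buffer>([
  ["/snapshot/app/cli.js", Buffer.from("console.log('hello');\n".repeat(200))],
  ["/snapshot/app/empty.json", Buffer.from("{}")],
]);
const { blob, index } = packStripes(files, (data) => zlib.gzipSync(data, { level: 9 }));
console.log(blob.length, index);
```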

  • Compress each file entry in the SEA archive with zstd level 3: per the table above it edges out gzip -9 on ratio, costs ~700 ms of build time on a ~120 MB archive, and decompresses at roughly 1 GB/s or more per core.
  • Decompress lazily on first access. The bootstrap calls sea.getRawAsset('__pkg_archive__') once, which returns an ArrayBuffer over the embedded blob without copying; each file's bytes are then decompressed on its first fs.readFileSync / require and cached in the existing Map. For a typical CLI that touches 5-10 files at startup, the cost is a handful of per-file zstd calls (µs-scale setup each, plus decode time proportional to the bytes read), not a full-archive decode.
  • Ship it opt-in first via --compress zstd (mirroring the Standard-mode flag), then make it the default once proven.
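The lazy-read path described above could look roughly like this. LazyArchive and its record shape are hypothetical. In a real SEA bootstrap the blob would come from Buffer.from(sea.getRawAsset('__pkg_archive__')), which wraps the ArrayBuffer without copying; the runnable example builds a small blob in-process instead, because node:sea only works inside a SEA, and uses gzip as a stand-in decompressor so it runs on any Node version.

```typescript
import * as zlib from "node:zlib";

// Hypothetical per-entry record written by the archive generator.
interface StripeRecord {
  offset: number;
  storedSize: number;
  originalSize: number;
  compressed: boolean;
}

type Decompress = (data: Buffer) => Buffer;

class LazyArchive {
  private readonly blob: Buffer;
  private readonly index: Record<string, StripeRecord>;
  private readonly decompress: Decompress;
  // Decoded file contents, filled on first access (stands in for the existing Map).
  private readonly cache = new Map<string, Buffer>();
  decodeCalls = 0;

  constructor(blob: Buffer, index: Record<string, StripeRecord>, decompress: Decompress) {
    this.blob = blob;
    this.index = index;
    this.decompress = decompress;
  }

  read(path: string): Buffer {
    const cached = this.cache.get(path);
    if (cached) return cached;
    const rec = this.index[path];
    if (!rec) {
      throw Object.assign(new Error(`ENOENT: no such file in archive, '${path}'`), { code: "ENOENT" });
    }
    // subarray() is a view into the blob, so raw stripes are never copied.
    const stored = this.blob.subarray(rec.offset, rec.offset + rec.storedSize);
    let bytes: Buffer = stored;
    if (rec.compressed) {
      this.decodeCalls++;
      bytes = this.decompress(stored);
    }
    if (bytes.length !== rec.originalSize) {
      throw new Error(`corrupt stripe for ${path}: expected ${rec.originalSize} bytes, got ${bytes.length}`);
    }
    this.cache.set(path, bytes);
    return bytes;
  }
}

// Example: one compressed stripe that is only decoded when first read.
const source = Buffer.from("export const answer = 42;\n".repeat(100));
const packed = zlib.gzipSync(source);
const archive = new LazyArchive(
  packed,
  { "/snapshot/app/answer.mjs": { offset: 0, storedSize: packed.length, originalSize: source.length, compressed: true } },
  (data) => zlib.gunzipSync(data),
);
console.log(archive.decodeCalls); // 0: nothing decoded yet
archive.read("/snapshot/app/answer.mjs");
archive.read("/snapshot/app/answer.mjs");
console.log(archive.decodeCalls); // 1: the second read hits the cache
```

One design note: fs.readFileSync returns a fresh Buffer that callers may mutate, so a real fs shim would likely hand out a copy of the cached bytes rather than the shared cache entry.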

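Zstd is chosen over gzip (zstd -3 edges out gzip -9 on ratio while compressing ~30x faster, per the table above) and Brotli (similar ratio, slower decompression). node:zlib has shipped zstdCompressSync / zstdDecompressSync and createZstdCompress / createZstdDecompress since Node 23.8.0 and 22.15.0 (initially marked experimental), so on those versions no extra dependency is needed and this is a pure plumbing change.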
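A sketch of how the build step might probe node:zlib for zstd and compress a stripe at level 3. It assumes the zstdCompressSync / zstdDecompressSync functions and the ZSTD_c_compressionLevel constant that ship alongside createZstdCompress; the ZstdZlib interface is a hand-written description of that subset, since older @types/node releases may not declare it. Both the Node running pkg at build time and the target runtime need zstd support, so a runtime check is safer than assuming a version.

```typescript
import * as zlib from "node:zlib";

// Hand-written description of the zstd subset of node:zlib used here;
// older @types/node versions do not declare these functions.
interface ZstdZlib {
  zstdCompressSync(data: Buffer, options?: { params?: Record<number, number> }): Buffer;
  zstdDecompressSync(data: Buffer): Buffer;
}

// Returns the zstd functions if this Node build has them, otherwise undefined.
function getZstd(): ZstdZlib | undefined {
  const candidate = zlib as unknown as Partial<ZstdZlib>;
  if (typeof candidate.zstdCompressSync === "function" && typeof candidate.zstdDecompressSync === "function") {
    return candidate as ZstdZlib;
  }
  return undefined;
}

// Compress one stripe at zstd level 3 (the level proposed above).
function zstdLevel3(zstd: ZstdZlib, data: Buffer): Buffer {
  const constants = zlib.constants as unknown as Record<string, number>;
  const levelParam = constants["ZSTD_c_compressionLevel"];
  return zstd.zstdCompressSync(data, { params: { [levelParam]: 3 } });
}

const zstd = getZstd();
const input = Buffer.from("module.exports = require('./lib');\n".repeat(500));
if (zstd) {
  const packed = zstdLevel3(zstd, input);
  const ok = zstd.zstdDecompressSync(packed).equals(input);
  console.log(`zstd -3: ${input.length} -> ${packed.length} bytes, round-trip ok: ${ok}`);
} else {
  // pkg would have to reject --compress zstd (or fall back) on this Node version.
  console.log(`node:zlib in Node ${process.version} has no zstd support`);
}
```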

Rejected alternatives

  • Self-extracting binary (caxa-style) — first-run decompresses 120 MB to disk before anything runs; breaks on read-only filesystems; antivirus flags it more aggressively; doubles on-disk footprint; GC/invalidation is its own footgun. Kills SEA's one-file value prop.
  • Whole-archive compress + decompress at startup — adds ~300 ms of dead time at launch even for files you never read. Per-stripe is strictly better.
  • UPX-wrap the final binary — decompresses the whole executable into memory on launch, pushing cold start backwards. False economy.

Follow-ups (separate issues if anyone wants to pick them up)

  1. strip(1) the base Node binary in pkg-fetch — free 10-15 MB.
  2. Zstd dictionary trained on a JS corpus — extra 10-20% ratio on top of zstd-3.
  3. Walker tree-shaking / terser pass on bundled JS — workload-dependent, often 20-40% off user code size.
  4. V8 startup snapshot of the entrypoint, via Node's v8.startupSnapshot API (setDeserializeMainFunction) and the SEA config's useSnapshot option: the only lever that could put pkg ahead of Bun on cold start.

Acceptance criteria

  • pkg . --sea --compress zstd produces a functionally identical binary whose archive is ~30-40% of its uncompressed size (the binary as a whole shrinks less, since the Node base is unchanged)
  • Cold startup time regression is < 50 ms on a typical CLI (claude-code 1.0.100 as reference)
  • Compressed and uncompressed archives coexist (flag-gated, so we can A/B size vs launch time)
  • DEBUG_PKG=1 output shows compressed + uncompressed sizes per stripe
  • Docs updated in docs-site/guide/compression.md and docs-site/guide/sea-mode.md
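For the DEBUG_PKG criterion and the archive-size target, a small reporting helper could work from the per-stripe sizes a packer already records. StripeSizes, formatStripeReport and archiveRatio are hypothetical names, and the sample sizes are made up for illustration, not measurements.

```typescript
// Per-stripe sizes as recorded by a (hypothetical) packer index.
interface StripeSizes {
  path: string;
  originalSize: number;
  storedSize: number;
}

function percent(part: number, whole: number): string {
  return whole === 0 ? "n/a" : `${((100 * part) / whole).toFixed(1)}%`;
}

function totals(stripes: StripeSizes[]): { original: number; stored: number } {
  return stripes.reduce(
    (acc, s) => ({ original: acc.original + s.originalSize, stored: acc.stored + s.storedSize }),
    { original: 0, stored: 0 },
  );
}

// Stored bytes divided by original bytes across the whole archive.
function archiveRatio(stripes: StripeSizes[]): number {
  const { original, stored } = totals(stripes);
  return original === 0 ? 1 : stored / original;
}

// One line per stripe plus a total line, in the spirit of DEBUG_PKG=1 output.
function formatStripeReport(stripes: StripeSizes[]): string[] {
  const lines = stripes.map(
    (s) => `${s.path}: ${s.originalSize} -> ${s.storedSize} bytes (${percent(s.storedSize, s.originalSize)})`,
  );
  const { original, stored } = totals(stripes);
  lines.push(`total: ${original} -> ${stored} bytes (${percent(stored, original)})`);
  return lines;
}

// Illustrative sizes only.
const sample: StripeSizes[] = [
  { path: "/snapshot/app/cli.js", originalSize: 1000, storedSize: 300 },
  { path: "/snapshot/app/yoga.wasm", originalSize: 2000, storedSize: 700 },
];

if (process.env.DEBUG_PKG) {
  for (const line of formatStripeReport(sample)) console.log(line);
}
console.log(`archive at or below 40% of original: ${archiveRatio(sample) <= 0.4}`);
```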
