perf: speed up the dependency walker #245

Open

robertsLando wants to merge 1 commit into main from perf/walker-optimization

Conversation

@robertsLando
Member

Summary

  • Replace @babel/parser + @babel/generator with acorn in the dependency detector (~3-5x faster parsing)
  • Add a module resolution cache to skip redundant follow() calls for the same specifier from the same directory
  • Batch fs.stat() calls with Promise.all() in appendFilesFromConfig()
  • Add timing instrumentation to the walker (visible with --debug)
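
The fs.stat() batching in appendFilesFromConfig() can be sketched roughly as follows (statAll is an illustrative name, not the actual helper): independent stat() calls are mapped into promises and awaited together, so total latency approaches that of the slowest call rather than the sum.

```typescript
import { stat } from 'node:fs/promises';
import type { Stats } from 'node:fs';

// Illustrative sketch only: stat every configured path concurrently
// instead of awaiting each call inside a serial loop.
async function statAll(
  files: string[],
): Promise<{ file: string; stats: Stats }[]> {
  return Promise.all(
    files.map(async (file) => ({ file, stats: await stat(file) })),
  );
}
```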

The acorn parser tries module mode first, falling back to script mode for legacy packages using strict-mode-incompatible syntax (with statements, octal escapes). The resolution cache deep-clones markers on cache hit to prevent shared mutation.
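
A minimal sketch of that two-pass parse, with a generic `parse` callback standing in for acorn.parse (the helper name and shape are illustrative, not the PR's actual code):

```typescript
// Try strict ESM parsing first; fall back to sloppy script mode for
// legacy packages whose syntax is forbidden in strict mode.
type SourceType = 'module' | 'script';

function parseWithFallback<T>(
  code: string,
  parse: (code: string, sourceType: SourceType) => T,
): { ast: T; sourceType: SourceType } {
  try {
    // Most packages parse cleanly as modules.
    return { ast: parse(code, 'module'), sourceType: 'module' };
  } catch {
    // `with` statements, octal escapes, etc. only parse in script mode.
    return { ast: parse(code, 'script'), sourceType: 'script' };
  }
}
```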

@babel/* deps are retained — lib/esm-transformer.ts still uses them.

Benchmarks (zwave-js-ui)

| Method | Before | After | Improvement |
| --- | --- | --- | --- |
| SEA no-bundle (walker-heavy) | 38.5s | 27.2s | -29.5% |
| SEA + bundle | 10.1s | 9.0s | -11.6% |
| Standard PKG + bundle | 16.7s | 14.8s | -11.6% |

Binary sizes unchanged. All 94 test-50-* tests pass.

Test plan

  • yarn build passes
  • yarn lint clean
  • 94/94 test-50-* integration tests pass
  • SEA worker-threads test passes
  • AST parsing edge case test passes
  • Benchmarked on zwave-js-ui (3 runs averaged per method)

Closes #239

🤖 Generated with Claude Code

Replace Babel with acorn for AST parsing in the detector (~3-5x faster
parsing), add a module resolution cache to skip redundant follow() calls,
and batch independent fs.stat() calls with Promise.all().

The acorn parser tries module mode first, falling back to script mode for
legacy packages using strict-mode-incompatible syntax (with statements,
octal escapes). The resolution cache deep-clones markers on cache hit to
prevent shared mutation from stepActivate.

Benchmarked on zwave-js-ui:
- SEA no-bundle (walker-heavy): -29.5%  (38.5s → 27.2s)
- SEA + bundle:                 -11.6%  (10.1s → 9.0s)
- Standard PKG + bundle:        -11.6%  (16.7s → 14.8s)

Binary sizes unchanged. All 94 test-50-* tests pass.

Closes #239

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@robertsLando
Member Author

Research Notes: Approaches Investigated & Future Optimization Opportunities

Documenting findings from the research phase to avoid re-investigating in the future.


Worker Threads (investigated, NOT worth it)

Prototyped a batch-by-level worker pool approach using tinypool:

  • Split step_STORE_ANY into prepare/detect/postDetect phases
  • Batched AST detection across N worker threads in parallel
  • BFS traversal instead of DFS (final record set is identical)

Results on zwave-js-ui:

| Method | Acorn+Cache (this PR) | + Worker Threads |
| --- | --- | --- |
| SEA no-bundle | 27.2s | 29.5s (+8.6% slower) |
| SEA + bundle | 9.0s | 10.5s (+16.9% slower) |

Why it's slower: With acorn, per-file parse times are only ~1-2ms. The structured clone overhead per file (~0.1-0.5ms) + worker startup cost (~50-100ms total) outweighs the parallelism gain. Workers would have been worthwhile with Babel (~5-15ms per file) but not with acorn. Don't retry unless the detection step becomes significantly heavier.

Also evaluated: piscina (heavier, same conclusion), workerpool (higher serialization overhead), manual worker_threads pool. All suffer from the same fundamental issue: per-file work is too small to amortize inter-thread communication.


Bundler Replacement (esbuild/Rolldown — investigated, NOT viable as walker replacement)

The walker does far more than "find imports." Bundlers can't handle:

  • Dictionary patches (100+ package-specific configs in dictionary/)
  • Native .node addon detection
  • __dirname/path.join(__dirname, ...) pattern detection
  • Glob expansion in require paths
  • Public/private file classification
  • try/catch require context tracking (mayExclude flag)

esbuild's metafile only captures bundled deps (misses dynamic/optional requires). A hybrid approach (esbuild for initial graph hint, walker for gaps) was estimated at ~10-20% gain for high implementation complexity. Not worth it given the ~30% already achieved.


Acorn Edge Cases (addressed in this PR)

  1. sourceType: 'module' rejects legacy syntax: with statements and octal escapes are forbidden in strict mode. Fix applied: fall back to sourceType: 'script' on parse failure.
  2. ES2025 quoted import names (import { "foo-bar" as baz }) — acorn rejects these. Very rare; caught by existing try/catch.
  3. reconstruct() output differs — source slicing preserves whitespace/comments vs Babel's normalized output. Cosmetic only (used for warning messages).
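
The source-slicing behavior in point 3 follows from acorn attaching start/end character offsets to every node; a sketch of the idea (assuming a slice-based helper, not the PR's exact code):

```typescript
// Acorn nodes carry `start`/`end` offsets into the original source, so
// the exact original text of a node (whitespace and comments included)
// can be recovered by slicing, rather than regenerating code from the
// AST the way Babel's generator does.
function sliceNode(
  source: string,
  node: { start: number; end: number },
): string {
  return source.slice(node.start, node.end);
}
```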

Resolution Cache Edge Cases (addressed in this PR)

The cache stores Marker objects containing mutable config. stepActivate() modifies config via Object.assign(). Fix applied: deep-clone marker objects on cache hit to prevent shared mutation.
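
An illustrative sketch of that cache (the names here are hypothetical, not the PR's actual identifiers): the key combines the requiring directory and the specifier, and cached markers are cloned on every hit so later Object.assign() mutations cannot leak between callers.

```typescript
// Hypothetical shape of the resolution cache described above.
interface Marker {
  config: Record<string, unknown>;
  [key: string]: unknown;
}

const resolutionCache = new Map<string, Marker>();

function cachedResolve(
  dir: string,
  specifier: string,
  resolve: () => Marker,
): Marker {
  const key = `${dir}\0${specifier}`;
  const hit = resolutionCache.get(key);
  // Deep-clone on hit: callers mutate marker.config via Object.assign().
  if (hit) return structuredClone(hit);
  const marker = resolve();
  resolutionCache.set(key, structuredClone(marker));
  return marker;
}
```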


Future Optimization Opportunities (not in this PR)

Ranked by estimated impact on zwave-js-ui:

| # | Optimization | Est. Impact | Difficulty | Notes |
| --- | --- | --- | --- | --- |
| 1 | Parallelize bytecode compilation in producer | 50-70% of bytecode phase | Medium | fabricateTwice() spawns sequential Node.js subprocesses. Each is stateless — trivially parallelizable with a concurrency limiter. Likely the next biggest win after the walker. |
| 2 | Parallel Brotli/GZip compression in producer | 20-40% of compression phase | High | Currently inline in the Multistream pipeline. Would need pre-computing compressed buffers in parallel. |
| 3 | Async module resolution (follow.ts) | 10-20% | Medium | follow() has a `// TODO async version`. Currently wraps resolve.sync(). Batch derivatives could resolve in parallel. |
| 4 | Eliminate file double-reads in traditional mode | 10-30% on slow I/O | Low | Walker reads files, then producer re-reads from disk via fs.createReadStream(). SEA mode already avoids this by trusting record.body. |
| 5 | Pre-load dictionary as single bundled module | 50-150ms | Low | readDictionary() does 100+ sequential require() calls. Could be a single pre-built JSON import. |
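
For item 1, the concurrency limiter could be as small as the following sketch (mapWithLimit is a hypothetical helper; a wrapper around the Node.js compile subprocess would be passed as `fn`):

```typescript
// Run async jobs with at most `limit` in flight at once. Because the
// event loop is single-threaded, `next++` between awaits is race-free.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Spawn `limit` workers; each pulls the next index until the queue drains.
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    async () => {
      while (next < items.length) {
        const i = next++;
        results[i] = await fn(items[i]);
      }
    },
  );
  await Promise.all(workers);
  return results;
}
```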

@robertsLando
Member Author

Additional Investigation: Bytecode Parallelization & Async Resolution

Parallel Bytecode Compilation (investigated, no improvement)

Implemented a child process pool in fabricator.ts + pre-fabrication phase in producer.ts that compiles all STORE_BLOB stripes in parallel before the Multistream pipeline.

Results on zwave-js-ui (Standard PKG + bundle):

| Version | Build Time |
| --- | --- |
| Baseline (main) | 16.7s |
| Acorn+Cache (this PR) | 14.8s |
| + Parallel Bytecode | 15.0s |

Why no improvement: With esbuild pre-bundling, the build produces a single large JS file — there are very few STORE_BLOB stripes to parallelize. The bottleneck is compiling one big file, not many small ones. Parallelization would help on non-bundled builds with hundreds of JS files, but those typically fail on ESM projects anyway.

Async Module Resolution (investigated, 5 test failures)

Converted follow.ts from resolve.sync() to the async resolve() API with fs.promises-based hooks.

Result: 5 test failures (test-50-cannot-include-df, test-50-ignore-files, test-50-package-json-4/9/9p). The resolve package's async and sync APIs have subtle behavioral differences in callback ordering and error handling. The readFile/packageFilter hook interaction differs between sync and async modes, causing incorrect package.json processing for edge cases.

Estimated gain was only 10-20% on the derivatives phase (which is already heavily cached). Not worth the regression risk.

Conclusion

The three optimizations in this PR (acorn parser, resolution cache, batched I/O) represent the sweet spot — maximum gain with zero functional regressions. Further optimization would need to target the non-walker phases (compression, SEA blob assembly) or wait for a major architectural change (e.g., replacing the Multistream pattern with a pre-computed buffer approach).
