fix: unbreak CI — drop make -t-incompatible dist co-target + repair getServiceInterface call site#3393

Merged
dr-bonez merged 2 commits into master from fix/dist-content-stale-heal
Jul 3, 2026
Conversation

@helix-nine

Follow-up to #3391. @dr-bonez still hit Cannot find module 'zod-deep-partial' after a (non-clean) build with #3391 merged.

Why #3391 wasn't enough

#3391's grouped co-target catches a missing bundled node_modules, and the rm -rf dist recipes keep a triggered rebuild clean. But neither heals a dist that is present-but-stale: a dist/ carried across a dependency change (the reorg, a branch switch, an interrupted build) keeps a fresh dist/package.json mtime while its bundled node_modules predates a newly-added dep like zod-deep-partial. mtime gating fundamentally can't see stale content behind a fresh marker, so make declares the dist up-to-date, skips the rebuild, and the container-runtime ships without the dep — the same boot error, even after building.

I reproduced this exactly: seed shared-libs/ts-modules/start-core/dist + projects/start-sdk/dist to that state (zod removed, fresh markers), run the container-runtime image build, and the image boots to Cannot find module 'zod-deep-partial'.
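The failure mode can be sketched in plain shell, outside of make. This is only an illustration under assumed names: `src.lock` stands in for the source package-lock.json, `dist/marker` for dist/package.json, and `dist/lock` for the lock the dist was actually built from. The mtime check approximates make's "rebuild if a prerequisite is newer than the target" rule; it is not make itself.

```shell
#!/bin/sh
# Hypothetical stand-in files; none of these names come from the repo.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/dist"

printf 'deps: zod zod-deep-partial\n' > "$work/src.lock"   # current source lock
printf 'deps: zod\n' > "$work/dist/lock"                   # lock the dist was built from
: > "$work/dist/marker"                                    # the dist's mtime marker

# Pin the mtimes so the marker is newer than the source lock.
touch -t 202601010000 "$work/src.lock"
touch -t 202601020000 "$work/dist/marker"

# mtime_verdict TARGET PREREQ: rebuild only if PREREQ is newer than TARGET.
mtime_verdict() {
  if [ "$2" -nt "$1" ]; then echo rebuild; else echo up-to-date; fi
}

# content_verdict SRC_LOCK DIST_LOCK: rebuild if the two files differ.
content_verdict() {
  if cmp -s "$1" "$2"; then echo up-to-date; else echo rebuild; fi
}

echo "mtime:   $(mtime_verdict "$work/dist/marker" "$work/src.lock")"
echo "content: $(content_verdict "$work/src.lock" "$work/dist/lock")"
```

The mtime check reports `up-to-date` while the content check reports `rebuild`: a fresh marker hides the stale lock behind it.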

Fix — a content gate

Each dist now copies the lock it was built from into dist/package-lock.json, and the build.mk gate adds a FORCE prerequisite when that copy no longer matches the current source lock (portable cmp -s):

CORE_DIST_STALE := $(shell cmp -s .../start-core/package-lock.json .../start-core/dist/package-lock.json || echo FORCE)
 dist/package.json  &: $(CORE_DIST_STALE) $(call ls-files, …)
  • A dist whose bundled node_modules was built from a different lock → mismatch → rebuild.
  • A dist built before this change (no dist/package-lock.json) → cmp fails → rebuild. This is the case that heals existing trees like @dr-bonez's — no manual clean needed.
  • An in-sync dist → cmp matches → no FORCE, normal mtime gating, no wasted rebuild.
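The three cases above can be exercised in isolation with a small shell sketch. The directory names (`dist-old`, `dist-legacy`, `dist-fresh`) and the `stale_flag` helper are hypothetical; only the `cmp -s … || echo FORCE` decision mirrors the gate, and the `2>/dev/null` is an extra precaution that the gate itself does not have.

```shell
#!/bin/sh
# Sketch of the gate's decision only; all file names are illustrative.
# stale_flag SRC_LOCK DIST_LOCK prints FORCE when the dist's lock copy is
# missing or differs from the source lock, and prints nothing when they match.
stale_flag() {
  cmp -s "$1" "$2" 2>/dev/null || echo FORCE
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
printf 'lock v2\n' > "$work/package-lock.json"

# Case 1: dist built from a different lock.
mkdir "$work/dist-old"
printf 'lock v1\n' > "$work/dist-old/package-lock.json"

# Case 2: dist built before the gate existed, so it has no lock copy.
mkdir "$work/dist-legacy"

# Case 3: dist built from the current lock.
mkdir "$work/dist-fresh"
cp "$work/package-lock.json" "$work/dist-fresh/package-lock.json"

for d in dist-old dist-legacy dist-fresh; do
  flag=$(stale_flag "$work/package-lock.json" "$work/$d/package-lock.json")
  echo "$d: ${flag:-in sync}"
done
```

Only `dist-fresh` comes back without `FORCE`, so it alone falls through to normal mtime gating.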

Applied to both start-core and start-sdk. The start-core heal cascades into the SDK's vendored copy through the existing start-core/dist/package.json dep edge. dist/package-lock.json is auto-excluded from npm publish (verified via npm pack --dry-run), so the published SDK is unaffected.

Verification

  • Stale-state repro (both dists, zod removed, no lock copy, fresh markers) → make now force-rebuilds (make bundle + start-core + install-dist-deps all fire) → assembled image has zod (3 copies) and boots past module loading to the expected container-only /media/startos/rpc.
  • In-sync tree → second make is a clean no-op (no spurious rebuilds).
  • Full make DAG parses cleanly for startbox/cli/registry/tunnel/web.

@dr-bonez — with this you shouldn't need to clean; the next build will detect the stale dist and rebuild it. (If you want to confirm the mechanism first: cmp -s shared-libs/ts-modules/start-core/package-lock.json shared-libs/ts-modules/start-core/dist/package-lock.json; echo $? — non-zero means it'll force-rebuild.)

@helix-nine force-pushed the fix/dist-content-stale-heal branch from 0caeca4 to 51129c7 on July 3, 2026 16:35
@helix-nine

Rebased onto current master and added a commit fixing the CI failure (it was not from the dist changes):

Root cause: #3392 (refactor: remove serviceInterface get/getOwn helpers) removed util.getServiceInterface from start-core but left the legacy SystemForEmbassy/index.ts still calling utils.getServiceInterface(effects, …).once(), so the container-runtime tsc build breaks on master (TS2339: Property 'getServiceInterface' does not exist …). My branch predated #3392, which is why the build passed locally until the merge surfaced the breakage.

Fix: reconstruct the same filled-interface lookup inline from the retained effects.getServiceInterface + effects.getHostInfo + utils.filledAddress — exactly what the removed makeInterfaceFilled did. Verified locally: tsc --noEmit clean, prettier clean, jest 12/12 + 6 snapshots pass.

Since this is a master-wide breakage (every PR's CI is red against current master), happy to split it into its own fast PR if you'd rather land the master fix ahead of the dist discussion — just say the word.

#3391's grouped `dist/package.json dist/node_modules/.package-lock.json &:`
co-target (to force a rebuild when the bundled node_modules is missing) is
incompatible with the ISO build's "Prevent rebuild of compiled artifacts"
step, which runs `make -t compiled-<arch>.tar` to mark the downloaded
artifacts current. `make -t` reaching the group via `dist/package.json`
touches only that target, NOT the grouped sibling stamp — so the stamp stays
missing and `make startos-iso` re-runs `make -C start-core dist`, which
cascades into a web-UI rebuild that reads an empty config.json and fails
(`Unexpected end of file in JSON`). Restore the plain single-target rule,
which `make -t` suppresses cleanly. Keeps the projects/start-sdk ls-files
dead-edge fix.

#3392 removed util.getServiceInterface from start-core but left this legacy
call site using it, breaking the container-runtime tsc build on master.
Reconstruct the same filled-interface lookup inline from the retained
effects.getServiceInterface + effects.getHostInfo + utils.filledAddress.
@helix-nine force-pushed the fix/dist-content-stale-heal branch from 51129c7 to a185be2 on July 3, 2026 17:47
@helix-nine changed the title from "fix: force dist rebuild when the bundled node_modules is stale content" to "fix: unbreak CI — drop make -t-incompatible dist co-target + repair getServiceInterface call site" on Jul 3, 2026
@helix-nine

Reworked this — you were right that something was actually wrong, and it was my own change.

What was breaking Build Image: the ISO job downloads the compiled artifacts and runs make -t compiled-<arch>.tar ("Prevent rebuild of compiled artifacts") to mark them current, then make startos-iso. My dist gates fight that:

Force/co-target rebuild triggers are fundamentally incompatible with a make -t-based "trust the artifacts" flow. So I dropped both: the dist gates are back to the plain single-target rule, which make -t suppresses cleanly (verified locally — make -t <dist> && make <dist> is now a no-op instead of rebuilding).

This PR now: (1) reverts that co-target, and (2) fixes the getServiceInterface call site (#3392 breakage). Kept: the container-runtime rm -rf dist→re-vendor ordering fix (the actual root cause of the original zod-deep-partial error) and the projects/start-sdk ls-files dead-edge fix — both make -t-safe.

Net for the original stale-dist problem: the container-runtime ordering fix covers the real mechanism; a dist that's already stale-with-fresh-mtime just needs a one-time rm -rf …/start-core/dist projects/start-sdk/dist. Trying to auto-heal that in-make is what kept breaking things.

@dr-bonez merged commit 6943935 into master on Jul 3, 2026
20 checks passed
@dr-bonez deleted the fix/dist-content-stale-heal branch on July 3, 2026 18:27