perf(agents): shrink pre-built base image + make ensure_image_present opt-in/background #223

@jaylfc

Context

Issue #220 landed pre-built LXC base images (matrix arm64 + x64, published to the `rolling-images` Release tag). Deploy time after the image is imported drops from 60–90s to ≤15s.

But on first run, fetching the ~300–500 MB image and running the ~10s import takes about as long as the current per-deploy build (~60–90s) on a typical home WiFi connection. For users who only ever deploy one agent, the image path can actually be slower than the build path.

Two improvements to make this a clear win in all cases

A. Shrink the published image

The current image likely includes:

  • Full apt cache (`/var/cache/apt/archives`): 20–50 MB recoverable
  • Locale data (`/usr/share/locale/*` minus C/POSIX): 30–80 MB recoverable
  • npm install with devDependencies: 50–150 MB recoverable via `npm prune --omit=dev`
  • Doc directories (`/usr/share/doc`, `/usr/share/man`): 20–40 MB recoverable
  • Possibly redundant Node.js binaries

In the build workflow, before `incus publish`:

```bash
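incus exec build-base -- bash -c '
# Drop downloaded .deb files and package index lists.
apt-get clean
rm -rf /var/cache/apt/archives /var/lib/apt/lists/*
# Drop docs and man pages.
rm -rf /usr/share/doc /usr/share/man
# Drop every locale except C, POSIX and en_US.utf8. find is used because the
# extglob pattern !(...) is a syntax error under bash -c unless extglob is on.
find /usr/share/locale -mindepth 1 -maxdepth 1 \
  ! -name C ! -name POSIX ! -name en_US.utf8 -exec rm -rf {} +
# Drop dev and optional npm dependencies; a prune failure does not fail the build.
cd /opt/openclaw && npm prune --omit=dev --omit=optional || true
'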
```

Target: ≤200 MB compressed per arch (down from ~300–500 MB). At 200 MB, even a 50 Mbps home connection finishes the download in ~30s, and the import step takes the same time regardless of size.
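
The download-time estimate above is plain arithmetic, so a small sketch makes it easy to rerun for other sizes or link speeds. The function name `download_seconds` is made up for illustration; sizes are treated as decimal megabytes, and protocol overhead and WiFi variance are ignored.

```python
def download_seconds(size_mb: float, link_mbps: float) -> float:
    """Ideal transfer time: megabytes converted to megabits, divided by link speed."""
    return size_mb * 8 / link_mbps


# Compare the target size with the current size range on a 50 Mbps link.
for size_mb in (200, 300, 500):
    print(f"{size_mb} MB at 50 Mbps: {download_seconds(size_mb, 50):.0f}s")
```

This gives 32s for the 200 MB target against 48s and 80s for the current 300 MB and 500 MB sizes, before adding the ~10s import.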

B. Make `ensure_image_present` non-blocking on first boot

Current behaviour: the startup hook in `tinyagentos/app.py` calls `ensure_image_present`, which downloads the image synchronously. A user on a fresh install who just wants to see the UI is held up by a 300+ MB download before `/api` even responds healthy.

Options (pick one):

  1. Background: kick off the download as a background asyncio task and return startup-healthy immediately. The deployer's existing fallback to per-deploy build already covers the window before the image lands. The first deploy ever may be slow (build); subsequent deploys are fast.
  2. Opt-in: only run `ensure_image_present` when the user explicitly clicks "Download fast-deploy image" in the providers / agents settings UI. Until they opt in, every deploy uses the build path.
  3. Lazy on first deploy: defer the download to the first `POST /api/agents/deploy` call. That deploy uses the build path while the image downloads in the background; the second deploy onwards uses the image.

Option 3 is probably the best UX: the user pays the cost while they're already waiting for something, not during "taOS startup".
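
A minimal sketch of how option 3 could look, assuming the download can be wrapped in an async callable. `LazyImageFetcher`, `deploy`, and `fake_download` are hypothetical names for illustration; the real signature of `ensure_image_present` is not shown in this issue, so a stand-in coroutine is used instead.

```python
import asyncio


class LazyImageFetcher:
    """Starts the image download at most once, on the first deploy request."""

    def __init__(self, download):
        self._download = download  # async callable; stand-in for ensure_image_present
        self._task = None          # kept so the running task is not garbage-collected
        self.image_ready = False

    def trigger(self):
        # Must be called from inside a running event loop (e.g. a request handler).
        if self._task is None:
            self._task = asyncio.create_task(self._run())
        return self._task

    async def _run(self):
        try:
            await self._download()
            self.image_ready = True
        except Exception:
            # Leave image_ready False so deploys keep using the build path,
            # and clear the task so a later deploy can retry the download.
            self._task = None


async def deploy(fetcher):
    """Hypothetical deploy handler: returns which path this deploy used."""
    if fetcher.image_ready:
        return "image"
    fetcher.trigger()  # fire and forget; the deploy does not wait for it
    return "build"


async def main():
    async def fake_download():
        await asyncio.sleep(0.01)  # simulates the image download

    fetcher = LazyImageFetcher(fake_download)
    first = await deploy(fetcher)
    await fetcher.trigger()        # wait for the download only for this demo
    second = await deploy(fetcher)
    return first, second


print(asyncio.run(main()))  # ('build', 'image')
```

Because the task is created inside the request handler rather than the startup hook, app startup never waits on the network, and a failed download simply leaves later deploys on the build path.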

Acceptance

  • Image build pipeline (`build-agent-images.yml`) trims apt cache, locale data, doc dirs, dev npm modules before `incus publish`. Compressed image ≤200 MB per arch.
  • `ensure_image_present` no longer blocks app startup. Pick one of the three options above and document the choice in code.
  • First taOS startup on a fresh install responds healthy in ≤10s regardless of whether the image is downloaded.
  • Existing deployer fallback (per-deploy build when no image present) continues to work — verified with a unit test asserting the launch image alias is the fallback when `is_image_present` returns False.
  • Documentation note in the relevant README or docs explaining the trade-off (one-time download vs per-deploy build) and how to opt in/out if applicable.
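
The fallback test from the acceptance list could be sketched as follows. This is an illustration under assumptions, not the project's actual deployer API: `choose_launch_image`, both alias constants, and the idea of passing the presence check in as a callable are hypothetical; only the name `is_image_present` comes from this issue.

```python
import unittest
from unittest import mock

PREBUILT_ALIAS = "prebuilt-agent-base"  # hypothetical alias of the imported image
FALLBACK_ALIAS = "fallback-build-base"  # hypothetical alias used by the build path


def choose_launch_image(is_image_present) -> str:
    """Return the pre-built alias when the image is imported, else the fallback."""
    if is_image_present(PREBUILT_ALIAS):
        return PREBUILT_ALIAS
    return FALLBACK_ALIAS


class DeployerFallbackTest(unittest.TestCase):
    def test_fallback_when_image_missing(self):
        probe = mock.Mock(return_value=False)
        self.assertEqual(choose_launch_image(probe), FALLBACK_ALIAS)
        probe.assert_called_once_with(PREBUILT_ALIAS)

    def test_prebuilt_when_image_present(self):
        probe = mock.Mock(return_value=True)
        self.assertEqual(choose_launch_image(probe), PREBUILT_ALIAS)
```

Injecting the check as a parameter keeps the test free of module patching; if the real deployer imports `is_image_present` directly, `mock.patch` on that import path would serve the same purpose.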

Out of scope

  • Switching the base distro (Alpine, Wolfi) — separate experiment if size is still a concern after the trims above.
  • Image-streaming / simplestreams server — only worth it if we hit GitHub's bandwidth costs, which we won't on the Free tier.
  • Peer-share via the cluster (a host that already has the image serves it to other workers) — depends on the broader cluster work.

Related

Labels: agents, enhancement, infrastructure, kilo-duplicate, kilo-triaged
