perf(install): use btrfs/ZFS storage pool for incus to unlock CoW container clones (deploy ≤5s) #224

@jaylfc

Context

#220 landed pre-built openclaw base images. Deploy time dropped from 90s to 33s, which is meaningful but short of the ≤15s target. The bottleneck is now isolated: `incus launch taos-openclaw-base probe-raw` alone takes ~19s because incus's default storage pool uses the `dir` driver on ext4, so every container launch is a full 510 MB filesystem copy.

On btrfs or ZFS pools, incus uses copy-on-write (CoW) clones — `incus launch` becomes a sub-second metadata operation, and the per-deploy time drops to roughly 5–10s total (just init + network-ready + install.sh config writes).

Symptoms (measured on Pi 5 Plus during #220 verification)

```
incus launch taos-openclaw-base probe-raw → 19s
idmap restart cycle (containers/lxc.py) → 3s
network-ready loop → 3s
install.sh config writes + systemd reload → 4s
client polling slop → 4s
─────────────────────────────────────────────
TOTAL → 33s
```
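As a quick sanity check on the breakdown above, the sketch below sums the measured phases and projects the total when only the launch phase changes. It assumes the other four phases stay exactly as measured, which may not hold in practice.

```python
# Phase timings in seconds, copied from the measured breakdown.
PHASES = {
    "incus launch": 19,
    "idmap restart cycle": 3,
    "network-ready loop": 3,
    "install.sh config writes + systemd reload": 4,
    "client polling slop": 4,
}


def total_seconds(phases):
    """Sum all phase timings."""
    return sum(phases.values())


def projected_total(phases, launch_seconds):
    """Total if only the launch phase changes (assumption: the rest stay fixed)."""
    adjusted = dict(phases)
    adjusted["incus launch"] = launch_seconds
    return total_seconds(adjusted)


print(total_seconds(PHASES))       # 33
print(projected_total(PHASES, 1))  # 15
print(projected_total(PHASES, 2))  # 16
```

On these numbers a 1s launch lands exactly on the ≤15s target and a 2s launch lands one second over it, so getting below 15s would likely also need savings in the idmap restart or polling phases.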

Fix

Update the taOS install/setup flow to create a btrfs (or ZFS) storage pool for incus instead of accepting the `dir` default.

Approach A — btrfs (preferred for Pi)

Cleanest on Debian/Armbian where btrfs is in-tree. Either:

  1. Loopback file: `incus storage create default btrfs size=50GB`. With no `source` given, incus creates and manages the loop image itself. Simplest, no host-disk reformatting.
  2. Existing btrfs partition: if the host already has btrfs mounted somewhere (e.g., `/var`), point incus at a subvolume. Cheaper IO, no double-journaling.
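A minimal sketch of how a setup script could build the pool-creation command for either option. The helper name `btrfs_pool_create_cmd` and the subvolume path `/var/lib/incus-btrfs` are hypothetical. It assumes incus creates a loop-backed pool when `source` is omitted, and uses an existing btrfs path when `source` is given.

```python
import shlex


def btrfs_pool_create_cmd(name, size=None, source=None):
    """Build argv for `incus storage create <name> btrfs [key=value ...]`."""
    cmd = ["incus", "storage", "create", name, "btrfs"]
    if source is not None:
        # Option 2: point incus at an existing btrfs path or subvolume.
        cmd.append(f"source={source}")
    if size is not None:
        # Option 1: size of the loop-backed image.
        cmd.append(f"size={size}")
    return cmd


# Option 1: loop-backed pool.
loop_cmd = btrfs_pool_create_cmd("default", size="50GB")
print(shlex.join(loop_cmd))

# Option 2: existing btrfs subvolume (hypothetical path).
subvol_cmd = btrfs_pool_create_cmd("default", source="/var/lib/incus-btrfs")
print(shlex.join(subvol_cmd))
```

Returning an argv list rather than a shell string lets the caller pass it straight to `subprocess.run` without quoting concerns.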

The host firewall and DOCKER-USER work already in this repo, plus the `tinyagentos-host-firewall` units, show that the systemd unit + setup-script pattern is established. The same pattern fits here.

Approach B — ZFS

Smaller adoption on ARM/Debian (license + module-build friction). Use only if btrfs is unavailable on the platform.

Migration story

For existing installs (the Pi currently in use has the `dir` pool already populated with mary/naira/stanley):

  1. Detect current pool driver in setup
  2. If `dir` and no agents present → just recreate as btrfs
  3. If `dir` with existing containers → offer a migration path (`incus storage create new-default btrfs ...`, then `incus move <container> --storage new-default` for each stopped container) but treat it as opt-in. Don't auto-migrate live containers.
  4. New installs: btrfs from the start.
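The decision steps above could be sketched as a pure function over the output of `incus storage list --format json`. The function names and the returned action labels are hypothetical, and the sketch assumes each pool entry carries `name`, `driver`, and `used_by` fields, with instances listed under `/1.0/instances/`.

```python
import json

INSTANCE_PREFIX = "/1.0/instances/"


def instances_on_pool(pool):
    """Return sorted instance names referenced in a pool's `used_by` list."""
    names = set()
    for url in pool.get("used_by") or []:
        if url.startswith(INSTANCE_PREFIX):
            rest = url[len(INSTANCE_PREFIX):]
            # Drop any query string and any snapshot suffix.
            names.add(rest.split("?")[0].split("/")[0])
    return sorted(names)


def plan_pool_action(pools, name="default"):
    """Map the current pool state to one of the setup actions."""
    pool = next((p for p in pools if p.get("name") == name), None)
    if pool is None:
        return "create-btrfs"            # new install
    if pool.get("driver") != "dir":
        return "leave-as-is"             # already btrfs/ZFS, or something else
    if instances_on_pool(pool):
        return "offer-opt-in-migration"  # never auto-migrate
    return "recreate-as-btrfs"           # dir pool with no agents present


sample = json.loads("""
[{"name": "default", "driver": "dir",
  "used_by": ["/1.0/profiles/default", "/1.0/instances/mary",
              "/1.0/instances/naira", "/1.0/instances/stanley"]}]
""")
print(plan_pool_action(sample))  # offer-opt-in-migration
```

Even in the `recreate-as-btrfs` case, profile or image references to the old pool would probably need to be handled before the pool can be deleted.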

mary/naira/stanley on the current Pi must NOT be touched without explicit opt-in.
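One way to make the opt-in guarantee hard to bypass is to have the migration helper return no commands at all unless confirmation was given. This is a sketch only: `migration_commands` is a hypothetical helper, and it assumes each container is stopped before `incus move <instance> --storage <pool>` and restarted afterwards.

```python
def migration_commands(instances, target_pool, confirmed=False):
    """Return the per-container command list, or nothing without explicit opt-in."""
    if not confirmed:
        return []
    commands = []
    for name in instances:
        commands.append(["incus", "stop", name])
        commands.append(["incus", "move", name, "--storage", target_pool])
        commands.append(["incus", "start", name])
    return commands


# Without confirmation nothing is planned, so existing containers stay untouched.
print(migration_commands(["mary", "naira", "stanley"], "new-default"))

# With explicit opt-in, each container gets a stop/move/start sequence.
for cmd in migration_commands(["mary"], "new-default", confirmed=True):
    print(" ".join(cmd))
```

Keeping the planning step separate from execution also lets the confirmation prompt show the exact commands before anything runs.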

Acceptance

  • Fresh taOS install creates a btrfs (or ZFS) incus pool by default.
  • Documented in setup docs: how the pool is created, where it lives, how to migrate from `dir`.
  • On a fresh install with the base image cached, `incus launch taos-openclaw-base` completes in ≤2s.
  • End-to-end `POST /api/agents/deploy` → status=success in ≤15s (the original #220 target).
  • Migration script for existing dir-pool installs (opt-in, with a confirmation prompt).
  • Untouched-container guarantee preserved (mary/naira/stanley).
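The two timing criteria could be checked with a small harness like the one below. It is a sketch under assumptions: `timed`, `within_budget`, and `launch_probe` are hypothetical names, the probe container name `probe-raw` is borrowed from the measurement above, and `launch_probe` is defined but not called here because it needs a host with incus and the cached base image.

```python
import subprocess
import time

LAUNCH_BUDGET_S = 2.0   # `incus launch taos-openclaw-base` acceptance limit
DEPLOY_BUDGET_S = 15.0  # end-to-end deploy acceptance limit


def timed(fn):
    """Run fn() and return (result, elapsed wall-clock seconds)."""
    start = time.monotonic()
    result = fn()
    return result, time.monotonic() - start


def within_budget(elapsed, budget):
    """True when the measured time meets the acceptance limit."""
    return elapsed <= budget


def launch_probe(name="probe-raw"):
    """Launch a throwaway container from the base image (requires incus)."""
    subprocess.run(["incus", "launch", "taos-openclaw-base", name], check=True)


# The 19s launch measured on the dir pool fails the launch budget.
print(within_budget(19.0, LAUNCH_BUDGET_S))   # False
# The 33s end-to-end deploy fails the deploy budget.
print(within_budget(33.0, DEPLOY_BUDGET_S))   # False
```

On a real host the launch check would be `_, elapsed = timed(launch_probe)` followed by `within_budget(elapsed, LAUNCH_BUDGET_S)`, with the probe container deleted afterwards.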

Out of scope

Related
