Skip to content

Worktree feat+xdg paths#22

Merged
AvivK5498 merged 15 commits into
mainfrom
worktree-feat+xdg-paths
May 15, 2026
Merged

Worktree feat+xdg paths#22
AvivK5498 merged 15 commits into
mainfrom
worktree-feat+xdg-paths

Conversation

@AvivK5498

Copy link
Copy Markdown
Owner

No description provided.

AvivK5498 added 15 commits May 15, 2026 17:01
getDataDir() now resolves in order: GOLEM_DATA_DIR env var, ./data/ in cwd
(dev workflow preserved), or OS-native default (~/Library/Application Support/golem
on macOS, $XDG_DATA_HOME/golem on Linux, %APPDATA%/golem on Windows). Data dir
is chmod 700 on first creation.

Prerequisite for shipping as 'npm i -g golem-agent' — global daemon can't read
data/ from cwd. All persistent state already routes through dataPath()/getDataDir()
so no per-store-ctor changes were needed.

Adds describeDataDirResolution() so the platform startup log (and forthcoming
golem doctor + first-run banner) can show which branch resolved. Refs: bd h6t.
…doctor

Reworks the previously monolithic src/cli.ts into a dispatcher that routes
to per-subcommand modules under src/cli/. Running `golem` with no args still
starts the platform (backwards-compatible default).

- start: refuses to launch a second instance via pid file at $DATA/golem.pid;
  pid + start time are written for status/stop to read.
- stop: SIGTERM with 10s grace, escalates to SIGKILL.
- status: reports resolved data dir and (pid, uptime) or 'not running'
  with LSB-conventional exit code 3.
- logs: detects systemd user unit / launchd plist, falls back to tailing
  $DATA/logs/*.log most-recent.
- version: reads package.json by walking up from the entry path so it works
  in both dev and installed layouts.
- doctor: scaffolded with Node-version / data-dir-writable / daemon-running
  checks; extended by bd 49x to cover OpenRouter + Telegram + disk.
- update: stubbed pending publishing pipeline (bd vx9).

bin/golem.js now forwards argv, prefers compiled dist/cli.js when present
(npm-installed layout), and only re-spawns on exit code 75 for start.
Drops the chdir(root) — paths.ts handles install-vs-dev resolution. Refs: bd dxn.
Replaces the inline 'no OPENROUTER_API_KEY' nag in start.ts with a proper
startup banner that adapts to the user's situation:

- Always shows resolved data dir + source (env / cwd / OS default)
- Onboarding mode: prints localhost:3015, plus a copy-paste-ready
  'ssh -L 3015:localhost:3015 user@host' when $SSH_CONNECTION is set
- Already-configured + SSH: still shows the tunnel command so the user
  knows how to re-reach the UI
- Already-configured + local: one-liner with the UI URL

$SSH_CONNECTION parsing handles IPv4, IPv6 (bracketed for shell safety),
and missing USER (falls back to LOGNAME). Drops the redundant data-dir
console.log added in the previous commit — the banner covers it.

This is the highest-leverage UX writing in the project — see
docs/brain/pages/decision/vps-deployment-and-cli.md. Refs: bd i30.
…pace

Replaces the doctor scaffold with the full check matrix:
- Node version (>= 20)
- Data dir writable + resolution source
- Disk space (warn <1 GiB, fail <100 MiB) via fs.statfs
- OpenRouter key valid via /api/v1/key (5s timeout)
- Each agent's Telegram bot token via getMe — opens agents.db readonly
  so a running daemon is undisturbed; resolves ${ENV_VAR} refs in the
  bot token field. Reports @username when valid.
- Daemon running (from PID file)
- Logs dir present

Extracts the validators to src/utils/{telegram,openrouter}-validate.ts so
the onboarding wizard (bd v0n.23) can call the same getMe code path —
'valid in the wizard' and 'valid in doctor' use the exact same definition.

12 new tests for the validators (with injected fetch — no live network).
ffmpeg deliberately not checked: Whisper API accepts Telegram OGG directly.
Refs: bd 49x.
Adds POST /api/telegram/verify-token (backend) consuming the shared
validateTelegramToken util — same getMe round-trip 'golem doctor' uses.

Wizard StepTelegram now calls it on Next: shows 'Verifying...', then
either 'Connected as @username' before advancing or the error inline.
Replaces the prior 'token contains a colon and is >30 chars' heuristic
that let typos pass and produced a silently-dead bot.

StepDone gains a failed state: after the 60s health-poll budget runs
out, renders an AlertCircle, points to `golem logs` / journalctl, and
offers a Try again button. Previously the UI stayed on the shimmer
forever with no recourse. Audit findings C1.5/C1.6. Refs: bd v0n.23.
Adds platform-specific daemon installation under src/cli/install-daemon/:

- linux.ts: writes ~/.config/systemd/user/golem.service, runs
  daemon-reload + enable --now. Detects 'no linger' on VPSes and prints
  the (sudo) loginctl enable-linger reminder rather than running it.
- macos.ts: writes ~/Library/LaunchAgents/com.golem.agent.plist,
  bootstraps + kickstarts via launchctl. Detects an existing repo-local
  plist (the dev install) and refuses without --force, printing the
  exact migration steps (bootout, mv data, --force) rather than
  silently destroying state.
- unit-renderers.ts: pure functions for systemd unit + launchd plist
  text; inferPlistOrigin() classifies an existing plist as matches /
  repo-local / unknown.

ExecStart resolution: only accepts argv[1] if it ends in /bin/golem(.js),
so running install-daemon in dev (via tsx) still writes a unit pointing
at bin/golem.js — not src/cli.ts.

Both subcommands support --dry-run (print without writing) and --force
(overwrite + restart). 12 new tests for renderers + origin detection.
Refs: bd 3mm, bd 8if.
Bumps the version to 0.2.0 (first XDG-aware release / new CLI surface)
and wires the package up to be publishable with `npm publish`:

- build: rm -rf dist && tsc — emits to dist/ (tsconfig already targets it)
- prepublishOnly: typecheck + test + build, so we can't ship something
  broken
- files: now includes dist/, bin/, skills/, ui/ (minus .next + node_modules),
  and src/ as a fallback for tsx-based dev installs; excludes *.test.ts
  and the test harness

bin/golem.js already prefers dist/cli.js when present, so an installed
copy runs the compiled JS directly via plain node — no tsx dependency at
runtime.

Not auto-publishing: that's an irreversible outward-facing action and
needs your explicit approval. To ship: `npm publish` (or
`npm publish --tag beta` for a pre-release channel). Refs: bd vx9.
Rewrites README around the install-first story for hesitating VPS
users. Two-command install up front, dev clone path demoted to a later
'Development' section. New 'Managing your install' section documents
the CLI surface (status/logs/doctor/stop/start/update). New FAQ covers
the questions a self-hoster actually asks: SSH tunnels vs public UI
with auth, ffmpeg, Docker, backup/restore, uninstall, model selection.

Adds tsconfig.build.json that extends the root config and excludes
*.test.ts + test-harness from the emit. Build script now uses it so
dist/ no longer contains compiled tests (which were getting picked up
by `bun test` and double-running with mismatched module identity).
Typecheck still covers tests because it uses the unconstrained root
config. Refs: bd 88h.
Two regressions surfaced by a real VPS install:

- doctor crashed with 'ReferenceError: require is not defined' on the
  dynamic require('better-sqlite3'). ESM has no global require —
  use createRequire(import.meta.url) bound to this module.
- OpenRouter returns limit_remaining=null for keys with no cap; the
  check called .toFixed on it and crashed. Guard with typeof === 'number'.

Caught during the 2026-05-15 Vultr smoke test.
Without this, an installed `npm i -g golem-agent` only ran the platform
API on :3847 — nothing listened on :3015 and the SSH tunnel got
'connection reset' when the user tried to open the wizard. The earlier
README claim that `golem install-daemon` was enough was wrong.

Fix:
- ui/next.config.ts: output: 'standalone' plus outputFileTracingRoot /
  turbopack.root pointed at the workspace, so Next emits a self-contained
  ui/.next/standalone/ui/server.js bundle (with its own node_modules).
- src/cli/ui-server.ts: spawn that server.js as a child of `golem start`,
  binding HOSTNAME=127.0.0.1 (loopback only) + PORT=3015. Handle clean
  SIGTERM on shutdown; warn cleanly if the bundle isn't built (dev clone).
- src/cli/start.ts: wire startUi() into the lifecycle.
- package.json: build script now also runs the UI build and copies
  ui/public + ui/.next/static into the standalone tree (Next standalone's
  required layout). files: includes ui/.next/standalone/.
- src/cli.ts: load .env from BOTH cwd and the data dir (override:false
  so cwd wins on conflict). Onboarding wizard writes .env to the
  daemon's cwd which is the data dir; `golem doctor` from an
  interactive shell has cwd=$HOME and was finding nothing. With this,
  doctor and the daemon see the same env.

Verified end-to-end on a fresh Vultr Ubuntu 24.04: npm i -g, install-daemon,
SSH tunnel, walked the wizard, agent responded on Telegram, doctor all green.
…click

README goes from a 180-line everything-doc to a 79-line landing page:
quickstart (VPS + local), Deploy on Vultr badge, feature list, philosophy.
Everything else moves to docs/ where it's deeper and discoverable.

- docs/INSTALL.md: three install paths (Vultr one-click, manual VPS,
  local dev) with detailed steps, including the 'use any free local
  port' SSH tunnel guidance and backup/restore/update/uninstall.
- docs/CLI.md: full subcommand reference with flags, exit codes,
  environment vars.
- docs/FAQ.md: the FAQ from the old README, expanded.

scripts/vultr-startup.sh: first-boot bootstrap for the Deploy on Vultr
button. Idempotent — installs Node 24 if missing, npm-installs golem,
runs install-daemon, enables linger, writes a helpful MOTD telling
the SSH'ing user what to do next.

The deploy URL in the badge currently has __VULTR_SCRIPT_ID__ and
__VULTR_AFFILIATE_REF__ placeholders — owner needs to register the
script in Vultr's dashboard, get the ID, and substitute both in the
README before publishing.
Aviv's call — nobody reads them.
The previous commit message described this change but the file wasn't
staged. Without it, the doctor-from-shell flow doesn't see the wizard-
written .env at $DATA/.env (it's only read by the daemon, which has
cwd=$DATA). Adds dotenv.config() for cwd .env, then a second call with
override:false for the data-dir .env so cwd wins on conflict (dev).
The release workflow ran 'npm publish' which triggers prepublishOnly,
which runs 'bun test'. Bun wasn't installed in the runner — exit 127.
Adds oven-sh/setup-bun@v2 before the install step.
@AvivK5498 AvivK5498 merged commit 64aab28 into main May 15, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant