Env validation + structured error logs#15

Open
akshitkrnagpal wants to merge 2 commits into main from env-validation

Conversation

akshitkrnagpal commented Apr 29, 2026

Owner

Summary

Stops the cryptic `atob() called with invalid base64-encoded data` errors that show up deep in the dispatch consumer when an env var is missing or malformed. New validation surfaces the variable name + a non-secret reason at three entry points + via `/health/deep`.

Why

Today the path is:

  1. `ENCRYPTION_KEY` is undefined on `edgepush-api`
  2. Every queued message tries to decrypt → `atob(undefined)` → `InvalidCharacterError`
  3. Operator sees one log line per job, none of which name the env var
  4. Operator has to read crypto.ts and trace the call chain to figure it out

After this PR:

  1. Queue consumer validates env at batch start
  2. Logs `[dispatch] env validation failed: ENCRYPTION_KEY `
  3. Whole batch retries (drains automatically once the secret is fixed)

Same for HTTP routes (`onError` returns 503 with structured detail) and cron (validation at scheduled-event entry).
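
The batch-start check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `QueueBatch`, `QueueMessage`, `handleBatch`, the stand-in `checkRequiredEnv`, and the exact log wording after the variable name are all assumptions made for the example.

```typescript
type Env = Record<string, string | undefined>;

interface EnvIssue {
  variable: string;
  reason: string;
}

// Hypothetical minimal shapes for a queue batch; the real runtime types may differ.
interface QueueMessage<T> {
  body: T;
  ack(): void;
  retry(): void;
}

interface QueueBatch<T> {
  messages: QueueMessage<T>[];
  retryAll(): void;
}

// Stand-in validator (assumption): only checks that the variable is present.
function checkRequiredEnv(env: Env): EnvIssue[] {
  return ["ENCRYPTION_KEY"]
    .filter((variable) => !env[variable])
    .map((variable) => ({ variable, reason: "missing or empty" }));
}

async function handleBatch<T>(
  batch: QueueBatch<T>,
  env: Env,
  dispatch: (job: T, env: Env) => Promise<void>,
  log: (line: string) => void = console.error,
): Promise<void> {
  // Validate once per batch, before touching any message.
  const issues = checkRequiredEnv(env);
  if (issues.length > 0) {
    const detail = issues.map((i) => `${i.variable} ${i.reason}`).join("; ");
    // One log line per batch that names the variable, never its value.
    log(`[dispatch] env validation failed: ${detail}`);
    // Retry everything so the queue drains once the secret is fixed.
    batch.retryAll();
    return;
  }

  for (const message of batch.messages) {
    try {
      await dispatch(message.body, env);
      message.ack();
    } catch {
      // Per-message failures are retried individually (assumption).
      message.retry();
    }
  }
}
```

The point of checking before the loop is that a misconfigured deploy produces a single actionable line per batch rather than one decrypt failure per job.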

Changes

  • `packages/server/src/lib/env-validation.ts` — new module. `EnvValidationError` class with `variable` + `reason` fields. `validateBase64`, `validateRequired`, `validateRequiredEnv`, `checkRequiredEnv` (collects all errors instead of throwing on the first).
  • `packages/server/src/lib/crypto.ts` — `importKey` delegates base64 length check to `validateBase64` so the error always names `ENCRYPTION_KEY`.
  • `packages/server/src/index.ts` — Hono `onError` handler converts `EnvValidationError` → 503 `{ error: "server_misconfigured", variable, reason }`. Adds `env` component to `/health/deep` results so operators can probe configuration explicitly.
  • `packages/server/src/dispatch.ts` — validates env at batch start. Wraps per-app errors with `EnvValidationError` detection.
  • `packages/server/src/cron.ts` — validates env at scheduled-event entry.
  • `packages/server/src/lib/env-validation.test.ts` — 19 unit tests including coverage for the "variable name appears, value never appears" log-safety property.
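
For readers who want a feel for the validator surface listed above, here is a minimal sketch. The function names come from the list, but the signatures and bodies are assumptions, as are the 32-byte key length and the choice of `BETTER_AUTH_SECRET` as a second required variable. The sketch also folds every `atob` decode failure into the single "not valid base64" reason.

```typescript
type Env = Record<string, string | undefined>;

class EnvValidationError extends Error {
  readonly variable: string;
  readonly reason: string;

  constructor(variable: string, reason: string) {
    // The message holds the variable name and the reason only, never the value.
    super(`${variable}: ${reason}`);
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "EnvValidationError";
    this.variable = variable;
    this.reason = reason;
  }
}

function validateRequired(env: Env, variable: string): string {
  const value = env[variable];
  if (value === undefined || value.trim() === "") {
    throw new EnvValidationError(variable, "missing or empty");
  }
  return value;
}

// Returns the decoded byte count, or null when atob rejects the input.
function decodedLength(value: string): number | null {
  try {
    return atob(value).length;
  } catch {
    return null;
  }
}

function validateBase64(env: Env, variable: string, expectedBytes: number): void {
  const value = validateRequired(env, variable);
  const bytes = decodedLength(value);
  if (bytes === null) {
    // Only the length of the value is exposed.
    throw new EnvValidationError(
      variable,
      `not valid base64 (length ${value.length}, contains characters outside [A-Za-z0-9+/=])`,
    );
  }
  if (bytes !== expectedBytes) {
    throw new EnvValidationError(variable, `decoded to ${bytes} bytes, expected ${expectedBytes}`);
  }
}

// Collects every failure instead of stopping at the first one.
function checkRequiredEnv(env: Env): EnvValidationError[] {
  const checks: Array<() => unknown> = [
    () => validateBase64(env, "ENCRYPTION_KEY", 32), // 32 bytes is an assumption
    () => validateRequired(env, "BETTER_AUTH_SECRET"),
  ];
  const errors: EnvValidationError[] = [];
  for (const check of checks) {
    try {
      check();
    } catch (err) {
      if (err instanceof EnvValidationError) errors.push(err);
      else throw err;
    }
  }
  return errors;
}
```

Because the error message is built only from the variable name and a bounded reason string, logging the error directly is safe even when the value itself is a secret.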

Operational notes

  • Reasons are bounded to: `"missing or empty"`, `"not valid base64 (length N, contains characters outside [A-Za-z0-9+/=])"`, `"decoded to N bytes, expected M"`. Length is fine to expose; the value is not. Tests assert the value never leaks into the message.
  • The server package didn't have vitest set up, so `*.test.ts` is excluded from the server tsconfig. Tests run via `pnpm exec vitest run` from `packages/server`. A proper vitest config and test target will come as a follow-up.

Test plan

  • `pnpm typecheck` clean across the monorepo
  • `pnpm lint` clean
  • `pnpm exec vitest run src/lib/env-validation.test.ts` — 19/19 pass
  • After merge, hit `/health/deep` on edgepush-api with `OPERATOR_PROBE_TOKEN` to verify the new `env` component returns `status: ok`
  • Optional: temporarily set a malformed `ENCRYPTION_KEY` on a preview deploy and confirm `/health/deep` returns `status: down, detail: ENCRYPTION_KEY: ...`

🤖 Generated with Claude Code


akshitkrnagpal and others added 2 commits April 29, 2026 22:03
The earlier cryptic "atob() called with invalid base64-encoded data"
deep in the dispatch consumer was an undefined ENCRYPTION_KEY on
edgepush-api. The error itself never named the variable; the operator
had to trace it back through decryptCredential to figure out which
secret was wrong.

This adds runtime env validation that surfaces the variable name + a
non-secret reason at three entry points:

- HTTP routes: a Hono onError handler catches EnvValidationError and
  returns a structured 503 with the variable + reason instead of a
  generic 500.
- Queue consumer (dispatch): validates required env at batch start.
  Misconfiguration retries the whole batch (so jobs drain once the
  operator fixes things) and logs a single clear line per batch
  instead of one cryptic atob error per job.
- Scheduled handler (cron): validates env at scheduled-event entry.
  The probe cycle would otherwise fail mid-loop with a confusing
  log line per app.

Also exposes the validation through /health/deep — the new "env"
component reports any misconfigured required secrets by name.

Validation messages never include the secret value. Reasons are
limited to: "missing or empty", "not valid base64 (length N,
contains characters outside [A-Za-z0-9+/=])", "decoded to N bytes,
expected M". Length is fine to expose; the value is not.

19 unit tests in env-validation.test.ts cover the validators and
the log-safety property (the variable name appears, the value never
does). Test files are excluded from the server tsconfig so they don't
pollute the build types; the package will get a proper vitest setup
as a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the hand-rolled validators with @t3-oss/env-core's createEnv
+ a zod schema. Same surface (parseEnv / checkEnv / EnvValidationError)
but the schema is declarative and the error path goes through zod's
issue list instead of throw chains.

- Add @t3-oss/env-core dep
- Move env.ts; delete env-validation.ts/.test.ts
- ENCRYPTION_KEY validator: zod.string with a superRefine that base64-
  decodes and asserts byte length. Other required vars use the
  standard zod.string().min(1) and zod.url() validators
- Expose validateEncryptionKey for crypto.ts so a bad key still
  surfaces with the variable name (instead of having to call full
  parseEnv with synthetic values for the others)
- 13 tests covering parseEnv (throw-on-first), checkEnv (collect-all),
  validateEncryptionKey, and the "value never leaks into the error"
  property
- Add vitest as a dev dep on the server package so tests run from
  there directly (not pulled from the workspace root)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
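
The difference between `parseEnv` (throw on the first issue) and `checkEnv` (collect all issues) over a declarative schema can be illustrated with a dependency-free sketch. It deliberately does not use `@t3-oss/env-core` or zod, so that it runs on its own: the `Validator` type, the `schema` object, and the 32-byte key length are assumptions, and the real implementation in this commit goes through zod's issue list instead.

```typescript
type Env = Record<string, string | undefined>;

// A validator returns a non-secret reason, or null when the value is acceptable.
type Validator = (value: string | undefined) => string | null;

class EnvValidationError extends Error {
  readonly variable: string;
  readonly reason: string;

  constructor(variable: string, reason: string) {
    super(`${variable}: ${reason}`);
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "EnvValidationError";
    this.variable = variable;
    this.reason = reason;
  }
}

const nonEmpty: Validator = (v) =>
  v === undefined || v === "" ? "missing or empty" : null;

const validUrl: Validator = (v) => {
  if (v === undefined || v === "") return "missing or empty";
  try {
    new URL(v);
    return null;
  } catch {
    return "not a valid URL";
  }
};

function base64Bytes(expected: number): Validator {
  return (v) => {
    if (v === undefined || v === "") return "missing or empty";
    let bytes: number | null = null;
    try {
      bytes = atob(v).length;
    } catch {
      bytes = null;
    }
    if (bytes === null) {
      return `not valid base64 (length ${v.length}, contains characters outside [A-Za-z0-9+/=])`;
    }
    return bytes === expected ? null : `decoded to ${bytes} bytes, expected ${expected}`;
  };
}

// Declarative schema: one validator per required variable (list is an assumption).
const schema: Record<string, Validator> = {
  ENCRYPTION_KEY: base64Bytes(32),
  BETTER_AUTH_SECRET: nonEmpty,
  BETTER_AUTH_URL: validUrl,
};

// Collect-all: useful for health checks that should report every problem.
function checkEnv(env: Env): EnvValidationError[] {
  const issues: EnvValidationError[] = [];
  for (const [variable, validate] of Object.entries(schema)) {
    const reason = validate(env[variable]);
    if (reason !== null) issues.push(new EnvValidationError(variable, reason));
  }
  return issues;
}

// Throw-on-first: useful at entry points that cannot proceed at all.
function parseEnv(env: Env): Env {
  const [first] = checkEnv(env);
  if (first) throw first;
  return env;
}
```

Keeping both entry points on top of one schema means the health probe and the request path cannot drift apart on what counts as a valid configuration.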
@akshitkrnagpal

Owner Author

Code Review Summary

Verdict: Changes requested

Blocking

  • packages/server/src/index.ts:48 — HTTP requests never call parseEnv(c.env). The new app.onError only formats EnvValidationErrors thrown later, but BETTER_AUTH_SECRET / BETTER_AUTH_URL are read by createAuth() without going through the new validator. So /api/auth/** can still fail with library/default behavior instead of the promised structured 503, and a missing auth secret may not be caught at all. Add request-surface validation before protected/API handlers, or validate inside createAuth() with the new schema.
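
One possible shape for the request-surface validation suggested above is a wrapper that checks the environment before the handler runs. This is a framework-free sketch, not Hono code: `withEnvValidation`, `Handler`, `JsonResponse`, and the stand-in `checkEnv` are hypothetical, and only the 503 body shape is taken from the PR description.

```typescript
type Env = Record<string, string | undefined>;

interface EnvIssue {
  variable: string;
  reason: string;
}

interface JsonResponse {
  status: number;
  body: unknown;
}

type Handler = (env: Env) => JsonResponse | Promise<JsonResponse>;

// Stand-in for the PR's checkEnv (assumption): presence checks only.
function checkEnv(env: Env): EnvIssue[] {
  return ["ENCRYPTION_KEY", "BETTER_AUTH_SECRET", "BETTER_AUTH_URL"]
    .filter((variable) => !env[variable])
    .map((variable) => ({ variable, reason: "missing or empty" }));
}

// Wraps a handler so misconfiguration is reported before any library reads the env.
function withEnvValidation(handler: Handler): Handler {
  return (env) => {
    const [issue] = checkEnv(env);
    if (issue) {
      return {
        status: 503,
        body: {
          error: "server_misconfigured",
          variable: issue.variable,
          reason: issue.reason,
        },
      };
    }
    return handler(env);
  };
}
```

With a guard like this in front of the auth routes, a missing `BETTER_AUTH_SECRET` would yield the structured 503 instead of whatever the auth library does by default.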

Verified

  • CI check ci is completed/success on 2a64c5c.
  • Local: corepack pnpm --filter @edgepush/server typecheck
  • Local: corepack pnpm --filter @edgepush/server exec vitest run src/lib/env.test.ts (13 passed)
