
gbrain embed ignores openai_api_key from config.json — only reads OPENAI_API_KEY env var #752

@Jmeg8r


Summary

The embedding service in `src/core/embedding.ts` constructs an OpenAI client via `new OpenAI()` with no arguments, which only reads the `OPENAI_API_KEY` process env var. The `openai_api_key` field that `loadConfig()` reads from `~/.gbrain/config.json` is never consulted by the embed code path.

This is a silent failure: `gbrain embed` exits 0 and prints its normal "Embedded N chunks across N pages" summary even when every page errored, so the only sign of trouble is the per-page error text.

Reproduce (gbrain 0.18.2)

```bash
# Set up with config-file key only, no env var
cat > ~/.gbrain/config.json <<JSON
{
  "engine": "pglite",
  "database_path": "/Users/me/.gbrain/brain.pglite",
  "openai_api_key": "sk-..."
}
JSON
chmod 600 ~/.gbrain/config.json
unset OPENAI_API_KEY

# Add some content
echo -e "---\ntype: note\n---\n\nhello world" | gbrain put test/page

# Try to embed
gbrain embed --all
```

Observed:
```
Error embedding test/page: The OPENAI_API_KEY environment variable is missing or empty; either provide it, or instantiate the OpenAI client with an apiKey option, like new OpenAI({ apiKey: 'My API Key' }).
[embed.pages] 1/1 (100%)
Embedded 0 chunks across 1 pages
[embed.pages] 1/1 (100%) done
```

Exit code 0. `gbrain stats` shows `Embedded: 0`.

Expected: embed reads `openai_api_key` from `~/.gbrain/config.json` (or surfaces a hard error before starting if neither source is configured).
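To illustrate the second half of that expectation, here is a minimal sketch of how a run summary could turn per-page errors into a non-zero exit code. The `PageResult` type and `summarizeRun` function are hypothetical names, not part of gbrain's code; only the summary wording mirrors the observed output.

```typescript
// Hypothetical per-page outcome; gbrain's real internal types may differ.
interface PageResult {
  slug: string;
  chunks: number; // chunks embedded for this page (0 on failure)
  error?: string; // set when embedding the page failed
}

interface RunSummary {
  message: string;
  exitCode: number;
}

// Build the final summary line and pick an exit code that reflects failures.
function summarizeRun(results: PageResult[]): RunSummary {
  const failed = results.filter((r) => r.error !== undefined);
  const succeeded = results.length - failed.length;
  const chunks = results
    .filter((r) => r.error === undefined)
    .reduce((sum, r) => sum + r.chunks, 0);

  let message = `Embedded ${chunks} chunks across ${succeeded} pages`;
  if (failed.length > 0) {
    message += ` (${failed.length} failed)`;
  }
  return { message, exitCode: failed.length > 0 ? 1 : 0 };
}
```

Applied to the failing run from the reproduction, a summary like this would report one failed page and exit 1 instead of 0.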

Root cause

`src/core/embedding.ts`:
```typescript
let client: OpenAI | null = null;

function getClient(): OpenAI {
  if (!client) {
    client = new OpenAI(); // ← reads only process.env.OPENAI_API_KEY
  }
  return client;
}
```

Meanwhile `src/core/config.ts:62-67` does merge the env var INTO the config object, but the inverse path (config-file value → `process.env` or → OpenAI client `apiKey` arg) doesn't exist.

Workaround

Export the key from config at invocation time:
```bash
OPENAI_API_KEY=$(python3 -c "import json; print(json.load(open('/Users/me/.gbrain/config.json'))['openai_api_key'])") gbrain embed --all
```

Suggested fix

In `getClient()`, pass `apiKey` explicitly from `loadConfig()`:
```typescript
import { loadConfig } from './config.ts';

function getClient(): OpenAI {
  if (!client) {
    const cfg = loadConfig();
    client = new OpenAI({
      apiKey: cfg?.openai_api_key ?? process.env.OPENAI_API_KEY,
    });
  }
  return client;
}
```

This preserves env-var precedence (since `loadConfig()` already merges env over file) while making the documented config-file path actually work.
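The intended precedence can be sketched as a small pure function. This is an illustration only: `resolveOpenAIKey` and the `FileConfig` shape are hypothetical, and the environment is passed in as a parameter so the logic is easy to test. Treating empty or whitespace-only values as unset is an assumption that goes beyond what this issue states.

```typescript
// Hypothetical subset of the config file; the real gbrain config has more fields.
interface FileConfig {
  openai_api_key?: string;
}

// Hypothetical helper: the env var wins, the config file is the fallback.
// Empty or whitespace-only values are treated as unset (an assumption).
function resolveOpenAIKey(
  fileConfig: FileConfig | null,
  env: Record<string, string | undefined>,
): string | undefined {
  const fromEnv = env.OPENAI_API_KEY?.trim();
  if (fromEnv) {
    return fromEnv;
  }
  const fromFile = fileConfig?.openai_api_key?.trim();
  return fromFile ? fromFile : undefined;
}
```

In practice the second argument would be `process.env`, and the result would be handed to the client as its `apiKey` option.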

Bonus: detect and fail fast

Could also add an upfront check at the start of `embed --all` / `embed --stale` / `put` (when chunking) so users get an immediate error instead of a silently-zero "completed" run with per-page errors buried in stderr.
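A preflight check along these lines might look like the following sketch. `MissingApiKeyError` and `requireApiKey` are hypothetical names, and the error message wording is only a suggestion; where the check would be called from in gbrain's command code is not shown here.

```typescript
// Hypothetical error type so the CLI can map this case to a non-zero exit code.
class MissingApiKeyError extends Error {
  constructor() {
    super(
      "No OpenAI API key found: set OPENAI_API_KEY or add openai_api_key to ~/.gbrain/config.json",
    );
    this.name = "MissingApiKeyError";
  }
}

// Hypothetical preflight check: call once, before any page is processed,
// so a missing key aborts the run instead of failing page by page.
function requireApiKey(key: string | undefined): string {
  if (key === undefined || key.trim() === "") {
    throw new MissingApiKeyError();
  }
  return key;
}
```

Throwing before the first page is processed means the user sees a single clear error rather than a completed-looking run with zero embeddings.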

Environment

  • gbrain 0.18.2
  • Bun-installed via `bun install -g`
  • macOS Darwin 25.4.0 (arm64)
  • Engine: PGLite
