Problem
gbrain v0.18.2 hard-codes OpenAI's text-embedding-3-large in src/core/embedding.ts. Builders who don't use OpenAI as a vendor are forced to either add OpenAI just for embeddings or run lexical-search-only mode (semantic similarity ranking and hybrid query/ask degrade to keyword matching).
This pushes builders toward vendor sprawl. Many of us run on Gemini (Google), Anthropic (Claude), or local-first stacks deliberately.
Proposal
Abstract the embedding provider behind an interface and add provider switching:
- Default: OpenAI text-embedding-3-large (1536 dims) — current behavior, no breaking change
- Add: Gemini text-embedding-004 (768 dims, free tier on Google AI Studio, generous limits)
- Add: Voyage voyage-3-lite (512 dims, Anthropic's recommended embedding partner, free tier)
- Add: Ollama nomic-embed-text (768 dims, fully local, $0 cost, privacy-preserving)
Selected via:
- gbrain init --embedder gemini
- gbrain config set embedder gemini
- Or env: GBRAIN_EMBEDDER=gemini
Schema change: store embedding model + dimensions per chunk. Allows mixed-dimension corpora during migration. Migration helper: gbrain reindex --embedder gemini --all.
Why this matters
- Removes vendor lock to OpenAI for builders who chose other AI stacks deliberately
- Lowers the barrier to entry for users who don't have OpenAI billing
- Aligns with the broader gstack ethos of "use what you have"
- Local Ollama option preserves privacy for sensitive corpora
Willing to contribute
Happy to discuss design before submitting a PR. The lift is roughly:
- Refactor src/core/embedding.ts to an EmbeddingProvider interface
- Add 3 implementations (Gemini, Voyage, Ollama)
- Schema migration to track model + dims per chunk
- Update gbrain init flow to ask which embedder
- Tests against fixture corpus per provider
Estimated ~150-200 LOC plus migration. Open to alternative designs.
Problem
gbrain v0.18.2 hard-codes OpenAI's text-embedding-3-large in src/core/embedding.ts. Builders who don't use OpenAI as a vendor are forced to either add OpenAI just for embeddings or run lexical-search-only mode (semantic similarity ranking and hybrid query/ask degrade to keyword matching).
This pushes builders toward vendor sprawl. Many of us run on Gemini (Google), Anthropic (Claude), or local-first stacks deliberately.
Proposal
Abstract the embedding provider behind an interface and add provider switching:
Selected via:
Schema change: store embedding model + dimensions per chunk. Allows mixed-dimension corpora during migration. Migration helper: gbrain reindex --embedder gemini --all.
Why this matters
Willing to contribute
Happy to discuss design before submitting a PR. The lift is roughly:
Estimated ~150-200 LOC plus migration. Open to alternative designs.