
Feature request: Pluggable embedding providers (Gemini, Voyage, Ollama) #771

@thephilipjohnson

Problem

gbrain v0.18.2 hard-codes OpenAI's text-embedding-3-large in src/core/embedding.ts. Builders who don't use OpenAI must either add it as a vendor just for embeddings or run in lexical-search-only mode, where semantic similarity ranking and hybrid query/ask degrade to keyword matching.

This pushes builders toward vendor sprawl. Many of us deliberately run on Gemini (Google), Claude (Anthropic), or local-first stacks.

Proposal

Abstract the embedding provider behind an interface and add provider switching:

  • Default: OpenAI text-embedding-3-large (1536 dims) — current behavior, no breaking change
  • Add: Gemini text-embedding-004 (768 dims, free tier on Google AI Studio, generous limits)
  • Add: Voyage voyage-3-lite (512 dims, Anthropic's recommended embedding partner, free tier)
  • Add: Ollama nomic-embed-text (768 dims, fully local, $0 cost, privacy-preserving)
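
To make the shape of the abstraction concrete, here is a rough sketch of what the interface and a provider lookup could look like. Only the EmbeddingProvider name and the model/dimension pairs come from this proposal; PROVIDER_SPECS, StubProvider, and getProvider are hypothetical names for illustration, and the stub returns zero vectors instead of calling any vendor API.

```typescript
// Hypothetical sketch: apart from EmbeddingProvider, these names are
// illustrative and do not come from gbrain's codebase.
interface EmbeddingProvider {
  readonly name: string;
  readonly model: string;
  readonly dimensions: number;
  // Returns one vector of length `dimensions` per input text.
  embed(texts: string[]): Promise<number[][]>;
}

type ProviderSpec = { model: string; dimensions: number };

// Model and dimension pairs as listed in this proposal.
const PROVIDER_SPECS: Record<string, ProviderSpec> = {
  openai: { model: "text-embedding-3-large", dimensions: 1536 },
  gemini: { model: "text-embedding-004", dimensions: 768 },
  voyage: { model: "voyage-3-lite", dimensions: 512 },
  ollama: { model: "nomic-embed-text", dimensions: 768 },
};

function zeros(n: number): number[] {
  const v: number[] = [];
  for (let i = 0; i < n; i++) v.push(0);
  return v;
}

// Stand-in implementation: a real provider would call the vendor's API here.
class StubProvider implements EmbeddingProvider {
  readonly model: string;
  readonly dimensions: number;

  constructor(readonly name: string, spec: ProviderSpec) {
    this.model = spec.model;
    this.dimensions = spec.dimensions;
  }

  embed(texts: string[]): Promise<number[][]> {
    const dims = this.dimensions;
    return Promise.resolve(texts.map(() => zeros(dims)));
  }
}

// Looks up a provider by name and fails loudly on unknown names.
function getProvider(name: string): EmbeddingProvider {
  if (!Object.prototype.hasOwnProperty.call(PROVIDER_SPECS, name)) {
    const known = Object.keys(PROVIDER_SPECS).join(", ");
    throw new Error(`Unknown embedder "${name}". Known embedders: ${known}`);
  }
  return new StubProvider(name, PROVIDER_SPECS[name]);
}
```

Keeping model and dimensions on the provider object means callers never have to hard-code a vector size, which is what would let the rest of the code stay provider-agnostic.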

Selected via:

  • gbrain init --embedder gemini
  • gbrain config set embedder gemini
  • Or env: GBRAIN_EMBEDDER=gemini
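
One open design question is what happens when more than one of these is set. A possible resolution order, sketched below, is CLI flag first, then the GBRAIN_EMBEDDER env var, then stored config, then the OpenAI default. That precedence is my assumption and not something gbrain defines today; resolveEmbedder and EmbedderSources are hypothetical names.

```typescript
// Hypothetical sketch: the precedence (flag > env > config > default) is an
// assumption for discussion, not existing gbrain behavior.
type EmbedderName = "openai" | "gemini" | "voyage" | "ollama";

const KNOWN_EMBEDDERS: readonly EmbedderName[] = ["openai", "gemini", "voyage", "ollama"];

interface EmbedderSources {
  flag?: string;                                // value of --embedder, if passed
  env?: Record<string, string | undefined>;     // e.g. process.env
  config?: { embedder?: string };               // value stored by `config set`
}

function resolveEmbedder(sources: EmbedderSources): EmbedderName {
  const candidates: Array<string | undefined> = [
    sources.flag,
    sources.env ? sources.env["GBRAIN_EMBEDDER"] : undefined,
    sources.config ? sources.config.embedder : undefined,
    "openai", // default keeps current behavior
  ];
  for (const raw of candidates) {
    if (raw === undefined) continue;
    const value = raw.trim().toLowerCase();
    if (value === "") continue; // treat empty strings as unset
    const match = KNOWN_EMBEDDERS.filter((k) => k === value)[0];
    if (match === undefined) {
      throw new Error(`Unknown embedder "${raw}". Expected one of: ${KNOWN_EMBEDDERS.join(", ")}`);
    }
    return match;
  }
  // Not reached: the last candidate is always "openai". Kept for the compiler.
  return "openai";
}
```

Passing the environment in as a plain object, instead of reading process.env inside the function, keeps the resolver easy to test.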

Schema change: store the embedding model and dimensions per chunk, so a corpus can hold mixed-dimension embeddings while a migration is in progress. Migration helper: gbrain reindex --embedder gemini --all.
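
To illustrate why per-chunk metadata matters, here is a minimal sketch of how search and reindex could use it. Vectors from different models are not comparable, so ranking only scores chunks whose model and dimensions match the query vector, and reindex picks up everything else. The field names (embeddingModel, embeddingDims) and function names are hypothetical, and the in-memory array stands in for whatever storage gbrain actually uses.

```typescript
// Hypothetical sketch: field and function names are illustrative only.
interface StoredChunk {
  id: string;
  text: string;
  embedding: number[];
  embeddingModel: string; // model that produced `embedding`
  embeddingDims: number;  // length of `embedding`
}

function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores only chunks embedded with the same model and dimensions as the query.
function rankComparable(
  chunks: StoredChunk[],
  queryVector: number[],
  queryModel: string
): Array<{ id: string; score: number }> {
  return chunks
    .filter((c) => c.embeddingModel === queryModel && c.embeddingDims === queryVector.length)
    .map((c) => ({ id: c.id, score: cosine(c.embedding, queryVector) }))
    .sort((x, y) => y.score - x.score);
}

// Ids of chunks that still carry embeddings from a different model.
function chunksNeedingReindex(chunks: StoredChunk[], targetModel: string): string[] {
  return chunks.filter((c) => c.embeddingModel !== targetModel).map((c) => c.id);
}
```

During a partial migration, chunks not yet reindexed would simply drop out of semantic ranking instead of producing errors or meaningless scores, and a reindex run could work through the chunksNeedingReindex list in batches.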

Why this matters

  • Removes vendor lock to OpenAI for builders who chose other AI stacks deliberately
  • Lowers the barrier to entry for users who don't have OpenAI billing
  • Aligns with the broader gstack ethos of "use what you have"
  • Local Ollama option preserves privacy for sensitive corpora

Willing to contribute

Happy to discuss design before submitting a PR. The lift is roughly:

  • Refactor src/core/embedding.ts to an EmbeddingProvider interface
  • Add 3 implementations (Gemini, Voyage, Ollama)
  • Schema migration to track model + dims per chunk
  • Update gbrain init flow to ask which embedder
  • Tests against fixture corpus per provider

Estimated ~150-200 LOC plus migration. Open to alternative designs.
