Feature Request: Support configurable local embedding endpoints (e.g., OpenAI-compatible API) to unlock non-English language potential #16

@unfall103-debug

Description

Is your feature request related to a problem? Please describe.
First of all, thank you for creating such an elegant and mathematically rigorous memory system. The Mode B architecture aligns perfectly with the "data sovereignty" philosophy many of us value.

However, for non-English users (e.g., Chinese, Japanese, Arabic), the hardcoded local embedding model (bge-small-en-v1.5 via Transformers.js) becomes a significant bottleneck. While the mathematical layer (Fisher-Rao, Sheaf, Langevin) is language-agnostic, its effectiveness is entirely dependent on the quality of the input vectors. The English-optimized model struggles with non-English semantics, leading to a substantial degradation in retrieval quality, which undermines the otherwise brilliant mathematical core of the system.

Describe the solution you'd like
I propose an elegant and maintainable solution: Open the embedding layer to user configuration.

Instead of hardcoding the Transformers.js model, SuperLocalMemory could respect the embedding configuration in config.json when mode: "b" is active.

Specifically, if a user configures an OpenAI-compatible endpoint (like many of us do for the LLM), the system should use that for embeddings as well. For example:

```json
"embedding": {
  "provider": "openai",
  "model_name": "Qwen3-Embedding",
  "dimension": 1024,
  "api_endpoint": "http://192.168.50.140:8045/v1/embeddings",
  "api_key": "not-needed"
}
```
This approach requires zero additional per-language maintenance from you: you don't need to research, package, or optimize models for every language. You simply provide the interface, and the global community can plug in the best-in-class local models suited to their language and hardware (e.g., BAAI/bge-m3 or intfloat/multilingual-e5).
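
For concreteness, here is a minimal sketch of what the remote path could look like, assuming the endpoint follows the standard OpenAI `/v1/embeddings` request/response shape. The names (`EmbeddingConfig`, `embedRemote`) are hypothetical illustrations, not existing SuperLocalMemory APIs:

```typescript
// Hypothetical sketch: call an OpenAI-compatible /v1/embeddings endpoint.
// Assumes the standard shape: { model, input } in,
// { data: [{ embedding: number[] }, ...] } out.
interface EmbeddingConfig {
  provider: string;     // e.g. "openai"
  model_name: string;   // e.g. "Qwen3-Embedding"
  dimension: number;    // expected vector size, for validation
  api_endpoint: string; // e.g. "http://192.168.50.140:8045/v1/embeddings"
  api_key: string;      // "not-needed" for unauthenticated local servers
}

async function embedRemote(cfg: EmbeddingConfig, texts: string[]): Promise<number[][]> {
  const res = await fetch(cfg.api_endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${cfg.api_key}`,
    },
    body: JSON.stringify({ model: cfg.model_name, input: texts }),
  });
  if (!res.ok) throw new Error(`Embedding request failed: ${res.status}`);
  const json = await res.json();
  const vectors: number[][] = json.data.map((d: { embedding: number[] }) => d.embedding);
  // Guard against dimension mismatches with the configured vector store.
  if (vectors.some((v) => v.length !== cfg.dimension)) {
    throw new Error(`Expected ${cfg.dimension}-dim vectors from ${cfg.model_name}`);
  }
  return vectors;
}
```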

Describe alternatives you've considered
Modifying config.json (Current Behavior): We've tested this extensively. In Mode B the system ignores the custom `embedding.provider` and `embedding.api_endpoint` settings, falls back to the internal Transformers.js model, and reports `Ollama embedder not available (model=nomic-embed-text). Falling back.` This confirms the configuration is not being honored; a sketch of the selection order we would expect is included after this list.

Switching to Mode C: This leverages a cloud LLM for memory synthesis, which improves fact extraction but does not solve the core embedding problem. Moreover, it requires sending data to the cloud, which conflicts with the "local-first" and "data sovereignty" principles that drew many of us to this project.

Forking and Patching: This is a high-maintenance, unsustainable solution for the end-user.
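
To make the expected behavior unambiguous, here is a hedged sketch of the selection order, building on the `EmbeddingConfig`/`embedRemote` sketch above. `createEmbedder` and `loadLocalTransformersEmbedder` are invented names for illustration, not existing SuperLocalMemory functions:

```typescript
// Hypothetical Mode B selection logic: honor config.embedding when an
// OpenAI-compatible endpoint is configured, and only fall back to the
// built-in model otherwise.
type Embedder = (texts: string[]) => Promise<number[][]>;

// Stand-in for the existing Transformers.js path (bge-small-en-v1.5).
function loadLocalTransformersEmbedder(): Embedder {
  throw new Error("stub: the existing built-in embedder would be returned here");
}

function createEmbedder(cfg?: EmbeddingConfig): Embedder {
  if (cfg?.provider === "openai" && cfg.api_endpoint) {
    return (texts) => embedRemote(cfg, texts);
  }
  return loadLocalTransformersEmbedder();
}
```

The key point is only the ordering: the user's configured endpoint is tried first, and the hardcoded model becomes the fallback rather than the silent default.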

Additional context
This feature request is about unlocking the true potential of your mathematical engine. By making the embedding layer configurable, SuperLocalMemory transforms from a predominantly English-focused tool into a universal, language-agnostic memory operating system. This is a massive value proposition for the global open-source community. It allows users worldwide to fully leverage your brilliant mathematical architecture without compromising on their native language support or data privacy.
