[FEAT]: Multilingual Native Embedder #658
Comments
Also, we don't want to pre-pack the Docker image with models people may not use, so we will not be doing that; keeping the image at a reasonable size keeps it portable.
I would also like the option to add another local embedding model, for example BGE-M3. I tried adding it to the models folder myself, but couldn't get it to work yet, unfortunately. Hopefully this feature can be added in the short term, so that we don't need to rely on OpenAI's models for multilingual documents. Thanks in advance! :-)
bge-m3 would be nice
This would be a very welcome feature. I see in this thread (#645) that someone tried multilingual-e5-large and it seems to help a lot.
Is there a manual way to use a multilingual embedding model in the meantime, or one from sentence-transformers?
@Herz3h, yes, we support Ollama or LMStudio for embedder endpoints, where you can use any embedder you like to fit your use case.
Thanks, however neither Ollama nor LMStudio seems to support sentence-transformers models. Is there still a way to use them? Some of them rank very high on https://huggingface.co/spaces/mteb/leaderboard
Ollama and LMStudio both support embedders - you need to use the GGUF version, since they cannot run TensorFlow or PyTorch models. Ollama's selection is certainly more limited since it relies on their registry, but you can import HF models directly into Ollama. Both support /embedding - so it works.
Oh, I didn't know there was an option to pull any model from Hugging Face into Ollama. Thank you very much!
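For anyone following along, here is a minimal sketch of that workflow once a GGUF embedder has been pulled into Ollama. The model name `bge-m3` and the default local port are assumptions - substitute whatever embedder and host you actually set up:

```python
import json
import urllib.request

# Default local Ollama endpoint; adjust if your instance runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/embeddings"

def build_payload(model: str, prompt: str) -> bytes:
    """Build the JSON body Ollama's /api/embeddings endpoint expects."""
    return json.dumps({"model": model, "prompt": prompt}).encode("utf-8")

def embed(model: str, prompt: str) -> list:
    """POST the prompt to a locally running Ollama and return the embedding vector."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]

# Usage (requires a running Ollama with an embedder pulled, e.g. via
# `ollama pull <your-gguf-embedder>`; "bge-m3" here is a placeholder):
#   vec = embed("bge-m3", "multilingual test sentence")
```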
What would you like to see?
Currently, the built-in embedder uses the ONNX all-MiniLM-L6-v2 model, which does okay for most use cases and is much smaller to download.
There should be support for the larger multilingual-e5-large model (ONNX HERE) for multi-lingual support.
This should not be the default, but it should be something the user can opt into. They may have to wait for the model to finish downloading before the embedder change is saved, since we cannot afford the latency of downloading the model at runtime.
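One practical note if multilingual-e5-large is added: the e5 family expects role prefixes ("query: " / "passage: ") on its input text, which all-MiniLM-L6-v2 does not, so the embedder integration would need to account for that. A small model-agnostic sketch of the prefixing plus the cosine scoring used to rank embedded passages (the toy usage below is illustrative only):

```python
import math

def e5_format(text: str, role: str) -> str:
    """Prefix text the way e5-style embedders expect ("query" or "passage")."""
    assert role in ("query", "passage")
    return f"{role}: {text}"

def cosine_similarity(a, b):
    """Standard similarity measure used to rank embedded passages against a query."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# e5_format("¿Dónde está la documentación?", "query")
# -> "query: ¿Dónde está la documentación?"
# Real embeddings are 384-dim for all-MiniLM-L6-v2 and 1024-dim for
# multilingual-e5-large; cosine_similarity works the same on either.
```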