# Add Text Embeddings Inference (TEI) Support

## Overview
Add support for Hugging Face's Text Embeddings Inference (TEI) server as an embedding provider in the contextframe embeddings module. TEI provides high-performance, production-ready embedding inference for open-source models with optimizations like Flash Attention, dynamic batching, and specialized hardware support.
## Background

Text Embeddings Inference (TEI) is a toolkit for deploying and serving text embedding models at scale. It offers significant advantages:

- **Performance**: Optimized inference with Flash Attention, ONNX, and hardware acceleration
- **Flexibility**: Supports 100+ open-source models (BERT, Sentence Transformers, etc.)
- **Production-ready**: Built-in metrics, monitoring, and serverless deployment options
- **Self-hosted**: Full control over data and infrastructure
Documentation: https://huggingface.co/docs/text-embeddings-inference/en/index
## Requirements

### 1. TEI Provider Implementation

Create a new `TEIProvider` class that:

- Inherits from the `EmbeddingProvider` base class
- Supports both local (`http://localhost:8080`) and remote TEI instances
- Handles authentication via bearer tokens
- Implements proper error handling and retries
- Returns standardized `EmbeddingResult` objects
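For reference, the standardized result could be shaped roughly like the dataclass below. This is only a sketch: contextframe already defines `EmbeddingResult`, and its actual fields may differ from the assumed ones here.

```python
from dataclasses import dataclass


# Hypothetical sketch of EmbeddingResult -- the field names are
# assumptions for illustration, not the actual contextframe definition.
@dataclass
class EmbeddingResult:
    embeddings: list[list[float]]  # one vector per input text
    model: str                     # model id, kept for metadata
    dimension: int                 # vector dimensionality
```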
### 2. Minimal Dependencies

- Use a lightweight HTTP client (httpx or requests)
- Add as an optional dependency in `pyproject.toml`
- No heavy ML libraries required (the model runs on the TEI server)
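In `pyproject.toml`, this could be declared as an extra along these lines; the extra name and version pin below are assumptions, not settled choices:

```toml
[project.optional-dependencies]
tei = ["httpx>=0.27"]
```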
### 3. Configuration Support

- Environment variables: `TEI_API_BASE`, `TEI_API_KEY`
- Constructor parameters: `api_base`, `api_key`, `timeout`
- Model identification (for metadata; the actual model is on the server)
### 4. API Integration

Support the TEI endpoints:

- `/embed` - Standard embeddings
- `/info` - Model information (optional)
- `/health` - Health checks (optional)
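The `/embed` request body takes an `inputs` field plus optional flags; a minimal payload builder might look like the following. The helper name is illustrative, and the field names follow the TEI API docs:

```python
import json


def build_embed_payload(
    texts: list[str],
    truncate: bool = True,
    normalize: bool = True,
) -> str:
    """Serialize the JSON body for a POST to TEI's /embed endpoint."""
    return json.dumps(
        {"inputs": texts, "truncate": truncate, "normalize": normalize}
    )
```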
### 5. Factory Integration

Update `create_embedder()` to support `provider_type="tei"`.
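The factory change could be a simple registry lookup. The `_PROVIDERS` dict and the stub class below are placeholders standing in for contextframe's real implementation:

```python
class TEIProvider:
    """Stub standing in for the real provider class."""

    def __init__(self, model: str, **kwargs):
        self.model = model
        self.options = kwargs


# Registry mapping provider_type strings to provider classes.
_PROVIDERS = {"tei": TEIProvider}


def create_embedder(model: str, provider_type: str = "tei", **kwargs):
    """Look up the provider class by name and instantiate it."""
    try:
        cls = _PROVIDERS[provider_type]
    except KeyError:
        raise ValueError(f"Unknown provider_type: {provider_type!r}") from None
    return cls(model, **kwargs)
```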
## Implementation Plan

### Phase 1: Core Implementation

- Create `contextframe/embed/tei_provider.py`
- Implement the `TEIProvider` class with basic embedding support
- Add error handling for connection issues and timeouts
- Update the factory function in `batch.py`
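For the error-handling item above, a small exponential-backoff retry wrapper is one option; the attempt count and delay values below are assumptions, not contextframe defaults:

```python
import time


def with_retries(fn, attempts: int = 3, base_delay: float = 0.5,
                 retry_on: tuple = (ConnectionError, TimeoutError)):
    """Call fn(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: propagate the last error
            time.sleep(base_delay * (2 ** attempt))
```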
### Phase 2: Configuration & Testing

- Add optional dependencies to `pyproject.toml`
- Create unit tests for the TEI provider
- Add integration tests (with a mock TEI server)
- Document configuration options
### Phase 3: Documentation & Examples

- Add a TEI section to the embedding providers documentation
- Create an example showing TEI deployment and usage
- Add a Docker Compose example for local TEI setup
- Document supported models and performance tips
## Technical Details

### Provider Interface

```python
class TEIProvider(EmbeddingProvider):
    def __init__(
        self,
        model: str,
        api_key: str | None = None,
        api_base: str | None = None,
        timeout: float = 30.0,
        truncate: bool = True,
        normalize: bool = True,
    ):
        # Implementation
        ...
```
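TEI applies dynamic batching server-side, but the client may still want to cap request sizes; `embed_batch` could be backed by a chunking helper along these lines (the helper name and default batch size are assumptions):

```python
def chunk_texts(texts: list[str], max_batch: int = 32) -> list[list[str]]:
    """Split inputs into request-sized chunks for successive /embed calls."""
    return [texts[i:i + max_batch] for i in range(0, len(texts), max_batch)]
```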
### Usage Example

```python
# Local TEI server
embedder = create_embedder(
    model="BAAI/bge-large-en-v1.5",
    provider_type="tei",
)

# Remote TEI server with auth
embedder = create_embedder(
    model="BAAI/bge-large-en-v1.5",
    provider_type="tei",
    api_base="https://my-tei-server.com",
    api_key="my-bearer-token",
)

# Embed documents
results = embedder.embed_batch(["Text 1", "Text 2"])
```
### Docker Deployment Example

```yaml
# docker-compose.yml for local development
version: '3.8'
services:
  tei:
    image: ghcr.io/huggingface/text-embeddings-inference:1.7
    ports:
      - "8080:80"
    volumes:
      - ./models:/data
    command: --model-id BAAI/bge-base-en-v1.5
```
## Success Criteria

- Implements the `EmbeddingProvider` interface
## Future Enhancements

- Support for sparse embeddings (`/embed_sparse` endpoint)
- Re-ranking support (`/rerank` endpoint)
- Streaming for large batches
- Connection pooling for high-throughput scenarios
- Auto-discovery of TEI server capabilities