Add Text Embeddings Inference (TEI) support to embeddings module #59

@jayscambler

Description

Add Text Embeddings Inference (TEI) Support

Overview

Add support for Hugging Face's Text Embeddings Inference (TEI) server as an embedding provider in the contextframe embeddings module. TEI provides high-performance, production-ready embedding inference for open-source models with optimizations like Flash Attention, dynamic batching, and specialized hardware support.

Background

Text Embeddings Inference (TEI) is a toolkit for deploying and serving text embedding models at scale. It offers significant advantages:

  • Performance: Optimized inference with Flash Attention, ONNX, and hardware acceleration
  • Flexibility: Supports 100+ open-source models (BERT, Sentence Transformers, etc.)
  • Production-ready: Built-in metrics, monitoring, and serverless deployment options
  • Self-hosted: Full control over data and infrastructure

Documentation: https://huggingface.co/docs/text-embeddings-inference/en/index

Requirements

1. TEI Provider Implementation

Create a new TEIProvider class that:

  • Inherits from EmbeddingProvider base class
  • Supports both local (http://localhost:8080) and remote TEI instances
  • Handles authentication via bearer tokens
  • Implements proper error handling and retries
  • Returns standardized EmbeddingResult objects
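The "error handling and retries" requirement could be satisfied by a small backoff wrapper along these lines (a sketch only; the helper name `with_retries` and its defaults are hypothetical, not part of the existing codebase):

```python
import time

def with_retries(func, max_retries=3, backoff=0.5,
                 retriable=(ConnectionError, TimeoutError)):
    """Call func(), retrying transient failures with exponential backoff.

    Re-raises the last error once max_retries attempts are exhausted.
    """
    for attempt in range(max_retries):
        try:
            return func()
        except retriable:
            if attempt == max_retries - 1:
                raise
            time.sleep(backoff * 2 ** attempt)
```

The provider's HTTP calls would be passed in as `func`, so retry policy stays separate from request-building logic.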

2. Minimal Dependencies

  • Use lightweight HTTP client (httpx or requests)
  • Add as optional dependency in pyproject.toml
  • No heavy ML libraries required (model runs on TEI server)
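One possible shape for the optional dependency (the extra's name `tei` and the version pin are assumptions; they should follow whatever extras convention pyproject.toml already uses):

```toml
[project.optional-dependencies]
tei = ["httpx>=0.27"]
```

Users would then install it with `pip install contextframe[tei]`, keeping the base package free of HTTP-client requirements.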

3. Configuration Support

  • Environment variables: TEI_API_BASE, TEI_API_KEY
  • Constructor parameters: api_base, api_key, timeout
  • Model identification (for metadata only; the actual model runs on the TEI server)
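The precedence rule above (constructor parameters over environment variables, with a localhost default) might look like this; the helper name `resolve_tei_config` is illustrative, not existing code:

```python
import os

def resolve_tei_config(api_base=None, api_key=None):
    """Resolve TEI connection settings.

    Explicit constructor arguments win; otherwise fall back to the
    TEI_API_BASE / TEI_API_KEY environment variables, then to a
    local default server.
    """
    return {
        "api_base": api_base or os.environ.get("TEI_API_BASE",
                                               "http://localhost:8080"),
        "api_key": api_key or os.environ.get("TEI_API_KEY"),
    }
```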

4. API Integration

Support TEI endpoints:

  • /embed - Standard embeddings
  • /info - Model information (optional)
  • /health - Health checks (optional)
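A sketch of the `/embed` call, per the TEI HTTP API (request body `{"inputs": [...], "truncate": ..., "normalize": ...}`, response of one float vector per input). Function names here are illustrative; httpx is imported lazily so the module loads without the optional dependency:

```python
def auth_headers(api_key=None):
    """Build request headers; attach a bearer token only when a key is set."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers

def build_embed_payload(texts, truncate=True, normalize=True):
    """TEI's /embed endpoint expects a JSON body of this shape."""
    return {"inputs": list(texts), "truncate": truncate, "normalize": normalize}

def embed(texts, api_base="http://localhost:8080", api_key=None, timeout=30.0):
    """POST a batch of texts to /embed; returns a list of embedding vectors."""
    import httpx  # optional dependency, imported lazily

    resp = httpx.post(
        f"{api_base.rstrip('/')}/embed",
        json=build_embed_payload(texts),
        headers=auth_headers(api_key),
        timeout=timeout,
    )
    resp.raise_for_status()
    return resp.json()
```

`/info` and `/health` would use plain GETs against the same base URL.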

5. Factory Integration

Update create_embedder() to accept provider_type="tei".
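The factory change could be a simple registry lookup, sketched below with a stand-in stub for TEIProvider (the real factory in contextframe.embed.batch may dispatch differently):

```python
_PROVIDERS = {}

def register_provider(name, cls):
    """Map a provider_type string to its provider class."""
    _PROVIDERS[name] = cls

def create_embedder(model, provider_type, **kwargs):
    """Instantiate the provider registered under provider_type."""
    try:
        cls = _PROVIDERS[provider_type]
    except KeyError:
        raise ValueError(f"Unknown provider_type: {provider_type!r}")
    return cls(model=model, **kwargs)

class TEIProvider:  # stand-in stub for the real class described in this issue
    def __init__(self, model, **kwargs):
        self.model = model
        self.options = kwargs

register_provider("tei", TEIProvider)
```

A registry keeps the factory open to new providers without editing a growing if/elif chain.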

Implementation Plan

Phase 1: Core Implementation

  1. Create contextframe/embed/tei_provider.py
  2. Implement TEIProvider class with basic embedding support
  3. Add error handling for connection issues, timeouts
  4. Update factory function in batch.py

Phase 2: Configuration & Testing

  1. Add optional dependencies to pyproject.toml
  2. Create unit tests for TEI provider
  3. Add integration tests (with mock TEI server)
  4. Document configuration options

Phase 3: Documentation & Examples

  1. Add TEI section to embedding providers documentation
  2. Create example showing TEI deployment and usage
  3. Add Docker Compose example for local TEI setup
  4. Document supported models and performance tips

Technical Details

Provider Interface

```python
class TEIProvider(EmbeddingProvider):
    def __init__(
        self,
        model: str,
        api_key: str | None = None,
        api_base: str | None = None,
        timeout: float = 30.0,
        truncate: bool = True,
        normalize: bool = True,
    ):
        # Implementation
```

Usage Example

```python
# Local TEI server
embedder = create_embedder(
    model="BAAI/bge-large-en-v1.5",
    provider_type="tei"
)

# Remote TEI server with auth
embedder = create_embedder(
    model="BAAI/bge-large-en-v1.5",
    provider_type="tei",
    api_base="https://my-tei-server.com",
    api_key="my-bearer-token"
)

# Embed documents
results = embedder.embed_batch(["Text 1", "Text 2"])
```

Docker Deployment Example

```yaml
# docker-compose.yml for local development
version: '3.8'
services:
  tei:
    image: ghcr.io/huggingface/text-embeddings-inference:1.7
    ports:
      - "8080:80"
    volumes:
      - ./models:/data
    command: --model-id BAAI/bge-base-en-v1.5
```

Success Criteria

  • TEI provider fully implements EmbeddingProvider interface
  • Minimal package weight increase (< 100KB)
  • Works with both local and remote TEI instances
  • Proper error handling with helpful messages
  • Documentation includes deployment examples
  • Integration tests pass with mock server
  • Performance comparable to direct model usage

Future Enhancements

  • Support for sparse embeddings (/embed_sparse endpoint)
  • Re-ranking support (/rerank endpoint)
  • Streaming for large batches
  • Connection pooling for high-throughput scenarios
  • Auto-discovery of TEI server capabilities
