A unified Elixir client for Large Language Models with integrated cost tracking, providing a consistent interface across multiple LLM providers.
⚠️ Alpha Quality Software: This library is in early development. APIs may change without notice until version 1.0.0 is released. Use in production at your own risk.
- Unified API: Single interface for multiple LLM providers
- Streaming Support: Real-time streaming responses with error recovery
- Cost Tracking: Automatic cost calculation for all API calls
- Token Estimation: Heuristic-based token counting for cost prediction
- Context Management: Automatic message truncation to fit model context windows
- Session Management: Built-in conversation state tracking and persistence
- Structured Outputs: Schema validation and retries via Instructor integration
- Function Calling: Unified interface for tool use across providers
- Model Discovery: Query and compare model capabilities across providers
- Error Recovery: Automatic retry with exponential backoff and stream resumption
- Mock Testing: Built-in mock adapter for testing without API calls
- Type Safety: Comprehensive typespecs and structured data
- Configurable: Flexible configuration system with multiple providers
- Extensible: Easy to add new LLM providers via adapter pattern
- Anthropic Claude - Full support for all Claude models
  - claude-opus-4-20250514 (Claude 4 Opus - most capable)
  - claude-sonnet-4-20250514 (Claude 4 Sonnet - balanced)
  - claude-3-7-sonnet-20250219 (Claude 3.7 Sonnet)
  - claude-3-5-sonnet-20241022 (Claude 3.5 Sonnet)
  - claude-3-5-haiku-20241022 (Claude 3.5 Haiku - fastest)
  - claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307
- OpenAI - Latest GPT models, including reasoning models
  - gpt-4.1 series (gpt-4.1, gpt-4.1-mini, gpt-4.1-nano - default)
  - o1 reasoning models (o1-pro, o1, o1-mini, o1-preview)
  - gpt-4o series (gpt-4o, gpt-4o-mini, gpt-4o-latest)
  - gpt-4-turbo series
  - gpt-3.5-turbo models
  - Specialized models for audio, search, and extended output
- Ollama - Local model runner
  - Any model available in your Ollama installation
  - Automatic model discovery
  - No API costs
- AWS Bedrock - Multi-provider access with comprehensive model support
  - Anthropic Claude: all Claude 4, 3.7, 3.5, 3, and 2.x models
  - Amazon Nova: Micro, Lite (default), Pro, Premier
  - Amazon Titan: Lite and Express text models
  - Meta Llama: Llama 4 (Maverick, Scout), Llama 3.3, 3.2, and 2 series
  - Cohere: Command, Command Light, Command R, Command R+
  - AI21 Labs: Jamba 1.5 (Large, Mini), Jamba Instruct, Jurassic 2
  - Mistral: Pixtral Large 2025-02, Mistral 7B, Mixtral 8x7B
  - Writer: Palmyra X4, Palmyra X5
  - DeepSeek: DeepSeek R1
- Google Gemini - Gemini models with multimodal support
  - gemini-2.5-pro series (experimental advanced models)
  - gemini-2.0-flash series (fast multimodal)
  - gemini-1.5-pro and gemini-1.5-flash (1M+ context)
  - gemini-pro and gemini-pro-vision
  - Specialized models for image generation and TTS
- OpenRouter - Access to 300+ models from multiple providers
  - Claude, GPT-4, Llama, PaLM, and many more
  - Unified API for different model architectures
  - Cost-effective access to premium models
  - Automatic model discovery
- Groq - Fast inference platform
  - Llama 4 Scout (17B), Llama 3.3 (70B), Llama 3.1 and 3 series
  - DeepSeek R1 Distill (default - 70B reasoning model)
  - QwQ-32B (reasoning model)
  - Mixtral 8x7B, Gemma series
  - Mistral Saba and specialized models
  - Optimized for ultra-low-latency inference
- X.AI - Grok models with advanced capabilities
  - grok-beta (131K context)
  - grok-2 and grok-2-vision models
  - grok-3 models with reasoning support
  - Web search and tool use capabilities
  - Vision support on select models
- Local Models via Bumblebee/EXLA
  - microsoft/phi-2 (default)
  - meta-llama/Llama-2-7b-hf
  - mistralai/Mistral-7B-v0.1
  - EleutherAI/gpt-neo-1.3B
  - google/flan-t5-base
- Mock Adapter - For testing and development
  - Configurable responses
  - Error simulation
  - Request capture
  - No API calls needed
Add ex_llm to your list of dependencies in mix.exs:
def deps do
[
{:ex_llm, "~> 0.2.0"},
# Optional: For local model support
{:bumblebee, "~> 0.5", optional: true},
{:nx, "~> 0.7", optional: true},
{:exla, "~> 0.7", optional: true}
]
end
Configure your LLM providers in config/config.exs:
config :ex_llm,
anthropic: [
api_key: System.get_env("ANTHROPIC_API_KEY"),
base_url: "https://api.anthropic.com"
],
openai: [
api_key: System.get_env("OPENAI_API_KEY"),
base_url: "https://api.openai.com"
],
xai: [
api_key: System.get_env("XAI_API_KEY"),
base_url: "https://api.x.ai"
],
ollama: [
base_url: "http://localhost:11434"
],
bedrock: [
# AWS credentials (optional - uses credential chain by default)
access_key_id: System.get_env("AWS_ACCESS_KEY_ID"),
secret_access_key: System.get_env("AWS_SECRET_ACCESS_KEY"),
region: System.get_env("AWS_REGION") || "us-east-1",
model: "nova-lite" # Default model (cost-effective)
],
gemini: [
api_key: System.get_env("GEMINI_API_KEY"),
base_url: "https://generativelanguage.googleapis.com"
],
openrouter: [
api_key: System.get_env("OPENROUTER_API_KEY"),
base_url: "https://openrouter.ai/api/v1"
]
# Simple chat completion with automatic cost tracking
messages = [
%{role: "user", content: "Hello, how are you?"}
]
{:ok, response} = ExLLM.chat(:anthropic, messages)
IO.puts(response.content)
IO.puts("Cost: #{ExLLM.format_cost(response.cost.total_cost)}")
# Using local models (no API costs!)
{:ok, response} = ExLLM.chat(:local, messages, model: "microsoft/phi-2")
IO.puts(response.content)
# Using OpenRouter for access to many models
{:ok, response} = ExLLM.chat(:openrouter, messages, model: "openai/gpt-4o-mini")
IO.puts(response.content)
# Streaming chat with error recovery
ExLLM.stream_chat(:anthropic, messages,
  [stream_recovery: true],
  fn chunk ->
    IO.write(chunk.content)
  end
)
# Using mock adapter for testing
{:ok, response} = ExLLM.chat(:mock, messages,
mock_response: "This is a test response"
)
# Estimate tokens before making a request
tokens = ExLLM.estimate_tokens(messages)
IO.puts("Estimated tokens: #{tokens}")
# Calculate cost for specific usage
usage = %{input_tokens: 1000, output_tokens: 500}
cost = ExLLM.calculate_cost(:openai, "gpt-4", usage)
IO.puts("Total cost: #{ExLLM.format_cost(cost.total_cost)}")
# With custom options
options = [
model: "claude-3-5-sonnet-20241022",
max_tokens: 1000,
temperature: 0.7,
retry_count: 3, # Automatic retry with exponential backoff
retry_delay: 1000 # Initial retry delay in ms
]
{:ok, response} = ExLLM.chat(:anthropic, messages, options)
# Function calling
functions = [
%{
name: "get_weather",
description: "Get the current weather for a location",
parameters: %{
type: "object",
properties: %{
location: %{type: "string", description: "City, State or Country"},
unit: %{type: "string", enum: ["celsius", "fahrenheit"], description: "Temperature unit"}
},
required: ["location"]
}
}
]
{:ok, response} = ExLLM.chat(:anthropic,
[%{role: "user", content: "What's the weather in Paris, France?"}],
functions: functions
)
# Parse and execute function calls
case ExLLM.parse_function_calls(response) do
{:ok, [call | _]} ->
# Execute the function
result = get_weather(call.arguments.location, call.arguments[:unit] || "celsius")
# Format the result for the conversation
function_message = ExLLM.format_function_result(call.name, result)
  :none ->
    # No function calls in response
    :ok
end
# Model discovery and recommendations
{:ok, models} = ExLLM.list_models(:anthropic)
Enum.each(models, &IO.puts(&1.name))
# Find models with specific capabilities
vision_models = ExLLM.find_models_with_features([:vision])
function_models = ExLLM.find_models_with_features([:function_calling, :streaming])
# Get model recommendations
recommended = ExLLM.recommend_models(%{
provider: :anthropic,
min_context_window: 100_000,
required_features: [:function_calling],
preferred_features: [:vision],
max_cost_per_million_tokens: 15.0
})
# Compare models
comparison = ExLLM.compare_models([
{:anthropic, "claude-3-5-sonnet-20241022"},
{:openai, "gpt-4-turbo"},
{:gemini, "gemini-pro"}
])
# Provider capabilities - find providers by features
{:ok, caps} = ExLLM.get_provider_capabilities(:openai)
IO.puts("Endpoints: #{Enum.join(caps.endpoints, ", ")}")
# => "Endpoints: chat, embeddings, images, audio, completions, fine_tuning, files"
# Find providers with specific features
providers = ExLLM.find_providers_with_features([:embeddings, :streaming])
# => [:openai, :ollama]
# Get provider recommendations
recommendations = ExLLM.recommend_providers(%{
required_features: [:vision, :streaming],
preferred_features: [:audio_input, :function_calling],
prefer_local: false
})
# => [
# %{provider: :openai, score: 0.95, matched_features: [...], missing_features: []},
# %{provider: :anthropic, score: 0.80, matched_features: [...], missing_features: [:audio_input]}
# ]
# Context management - automatically truncate long conversations
long_conversation = [
%{role: "system", content: "You are a helpful assistant."},
# ... many messages ...
%{role: "user", content: "What's the weather?"}
]
# Automatically truncates to fit model's context window
{:ok, response} = ExLLM.chat(:anthropic, long_conversation,
max_tokens: 4000, # Max tokens for context
strategy: :smart # Preserve system messages and recent context
)
# Create a new conversation session
session = ExLLM.new_session(:anthropic, name: "Customer Support")
# Chat with automatic session tracking
{:ok, {response, session}} = ExLLM.chat_with_session(session, "Hello!")
IO.puts(response.content)
# Continue the conversation
{:ok, {response, session}} = ExLLM.chat_with_session(session, "What can you help me with?")
# Session automatically tracks:
# - Message history
# - Token usage
# - Conversation context
# Review session details
messages = ExLLM.get_session_messages(session)
total_tokens = ExLLM.session_token_usage(session)
IO.puts("Total tokens used: #{total_tokens}")
# Save session for later
{:ok, json} = ExLLM.save_session(session)
File.write!("session.json", json)
# Load session later
{:ok, session} = ExLLM.load_session(File.read!("session.json"))
- chat/3 - Send messages and get a complete response
- stream_chat/3 - Send messages and stream the response
- configured?/2 - Check if a provider is properly configured
- list_models/2 - Get available models for a provider
- prepare_messages/2 - Prepare messages for context window
- validate_context/2 - Validate messages fit within context window
- context_window_size/2 - Get context window size for a model
- context_stats/1 - Get statistics about message context usage
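A quick sketch of how these fit together (the prepare_messages/2 options shown are assumptions, mirrored from the context-management options documented later in this README; the other calls appear in the examples below):
# Validate, inspect, and pre-truncate a conversation for a given model
{:ok, token_count} = ExLLM.validate_context(messages, model: "claude-3-5-sonnet-20241022")
window = ExLLM.context_window_size(:anthropic, "claude-3-5-sonnet-20241022")
stats = ExLLM.context_stats(messages)
# Assumed options for prepare_messages/2, based on the :smart strategy shown below
prepared = ExLLM.prepare_messages(messages, max_tokens: 4000, strategy: :smart)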
- new_session/2 - Create a new conversation session
- chat_with_session/3 - Chat with automatic session tracking
- add_session_message/4 - Add a message to a session
- get_session_messages/2 - Retrieve messages from a session
- session_token_usage/1 - Get total token usage for a session
- clear_session/1 - Clear messages while preserving metadata
- save_session/1 - Serialize session to JSON
- load_session/1 - Load session from JSON
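A minimal sketch of managing a session by hand rather than through chat_with_session/3 (the argument order and return value of add_session_message/4 are assumptions; the other calls appear in the session examples below):
session = ExLLM.new_session(:anthropic, name: "Manual Session")
# Assumed signature: (session, role, content, opts), returning the updated session
session = ExLLM.add_session_message(session, "user", "Hello!", [])
messages = ExLLM.get_session_messages(session)
tokens = ExLLM.session_token_usage(session)
session = ExLLM.clear_session(session)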
- parse_function_calls/2 - Parse function calls from LLM response
- execute_function/2 - Execute a function call with validation
- format_function_result/2 - Format function result for conversation
- get_model_info/2 - Get complete capability information for a model
- model_supports?/3 - Check if a model supports a specific feature
- find_models_with_features/1 - Find models that support specific features
- compare_models/1 - Compare capabilities across multiple models
- recommend_models/1 - Get model recommendations based on requirements
- models_by_capability/1 - Get models grouped by capability support
- list_model_features/0 - List all trackable model features
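For example (a sketch; the argument order for model_supports?/3 and the return shape of get_model_info/2 are assumptions inferred from the arities above and the provider/model pairs used elsewhere in this README):
{:ok, info} = ExLLM.get_model_info(:anthropic, "claude-3-5-sonnet-20241022")
supports_vision? = ExLLM.model_supports?(:anthropic, "claude-3-5-sonnet-20241022", :vision)
all_features = ExLLM.list_model_features()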
- get_provider_capabilities/1 - Get API-level capabilities for a provider
- provider_supports?/2 - Check if a provider supports a feature/endpoint
- find_providers_with_features/1 - Find providers that support specific features
- compare_providers/1 - Compare capabilities across multiple providers
- recommend_providers/1 - Get provider recommendations based on requirements
- list_providers/0 - List all available providers
- is_local_provider?/1 - Check if a provider runs locally
- provider_requires_auth?/1 - Check if a provider requires authentication
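And at the provider level (a sketch; argument orders are assumptions inferred from the arities above):
providers = ExLLM.list_providers()
has_embeddings? = ExLLM.provider_supports?(:openai, :embeddings)
local? = ExLLM.is_local_provider?(:ollama)
needs_auth? = ExLLM.provider_requires_auth?(:ollama)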
- resume_stream/2 - Resume a previously interrupted stream
- list_recoverable_streams/0 - List all recoverable streams
%ExLLM.Types.LLMResponse{
content: "Hello! I'm doing well, thank you for asking.",
usage: %{input_tokens: 12, output_tokens: 15},
model: "claude-3-5-sonnet-20241022",
finish_reason: "end_turn",
cost: %{
total_cost: 0.000261,
input_cost: 0.000036,
output_cost: 0.000225,
currency: "USD"
}
}
%ExLLM.Types.StreamChunk{
content: "Hello",
delta: true,
finish_reason: nil
}
%ExLLM.Types.Model{
name: "claude-3-5-sonnet-20241022",
provider: :anthropic,
context_length: 200000,
supports_streaming: true
}
ExLLM uses external YAML configuration files for model metadata, pricing, and capabilities. This allows easy updates without code changes:
# config/models/anthropic.yml
provider: anthropic
default_model: "claude-sonnet-4-20250514"
models:
claude-3-5-sonnet-20241022:
context_window: 200000
pricing:
input: 3.00 # per 1M tokens
output: 15.00
capabilities:
- streaming
- function_calling
- vision
# Get model pricing
pricing = ExLLM.ModelConfig.get_pricing(:anthropic, "claude-3-5-sonnet-20241022")
# Get context window
context = ExLLM.ModelConfig.get_context_window(:openai, "gpt-4o")
# Get default model for provider
default = ExLLM.ModelConfig.get_default_model(:openrouter)
# Configuration is cached for performance
# Updates require restart or cache refresh
ExLLM automatically tracks costs for all API calls using the external pricing configuration:
{:ok, response} = ExLLM.chat(:anthropic, messages)
# Access cost information
if response.cost do
IO.puts("Input tokens: #{response.cost.input_tokens}")
IO.puts("Output tokens: #{response.cost.output_tokens}")
IO.puts("Total cost: #{ExLLM.format_cost(response.cost.total_cost)}")
end
# Estimate tokens before making a request
messages = [
%{role: "system", content: "You are a helpful assistant."},
%{role: "user", content: "Explain quantum computing in simple terms."}
]
estimated_tokens = ExLLM.estimate_tokens(messages)
# Use this to predict costs before making the actual API call
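# Combine the estimate with calculate_cost/3 for an up-front projection
# (the output token count below is just an illustrative guess, not something ExLLM predicts)
projected = ExLLM.calculate_cost(:anthropic, "claude-3-5-sonnet-20241022",
  %{input_tokens: estimated_tokens, output_tokens: 500}
)
IO.puts("Projected cost: #{ExLLM.format_cost(projected.total_cost)}")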
# Compare costs across different providers
usage = %{input_tokens: 1000, output_tokens: 2000}
providers = [
{:openai, "gpt-4"},
{:openai, "gpt-3.5-turbo"},
{:anthropic, "claude-3-5-sonnet-20241022"},
{:anthropic, "claude-3-haiku-20240307"}
]
Enum.each(providers, fn {provider, model} ->
cost = ExLLM.calculate_cost(provider, model, usage)
unless cost[:error] do
IO.puts("#{provider}/#{model}: #{ExLLM.format_cost(cost.total_cost)}")
end
end)
ExLLM includes pricing data (as of January 2025) in external YAML files for all supported providers:
- Anthropic: Claude 3 series (Opus, Sonnet, Haiku), Claude 3.5, Claude 4
- OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5 Turbo, GPT-4o series
- OpenRouter: 300+ models with dynamic pricing
- Google Gemini: Pro, Ultra, Nano
- AWS Bedrock: Various models including Claude, Titan, Llama 2
- Ollama: Local models (free - $0.00)
- Local Models: Free ($0.00) - no API costs
Pricing data is stored in config/models/*.yml files and can be updated independently of code changes.
ExLLM automatically manages context windows to ensure your messages fit within model limits:
# Long conversation that might exceed context window
messages = [
%{role: "system", content: "You are a helpful assistant."},
# ... hundreds of messages ...
%{role: "user", content: "What's my current task?"}
]
# ExLLM automatically truncates to fit the model's context window
{:ok, response} = ExLLM.chat(:anthropic, messages)
# Check if messages fit within context window
case ExLLM.validate_context(messages, model: "gpt-3.5-turbo") do
{:ok, token_count} ->
IO.puts("Messages use #{token_count} tokens")
{:error, {:context_too_large, %{tokens: tokens, max_tokens: max}}} ->
IO.puts("Messages too large: #{tokens} tokens (max: #{max})")
end
# Sliding window (default) - keeps most recent messages
{:ok, response} = ExLLM.chat(:anthropic, messages,
max_tokens: 4000,
strategy: :sliding_window
)
# Smart strategy - preserves system messages and recent context
{:ok, response} = ExLLM.chat(:anthropic, messages,
max_tokens: 4000,
strategy: :smart,
preserve_messages: 10 # Always keep last 10 messages
)
# Get detailed statistics about your messages
stats = ExLLM.context_stats(messages)
IO.inspect(stats)
# %{
# message_count: 150,
# total_tokens: 45000,
# by_role: %{"system" => 1, "user" => 75, "assistant" => 74},
# avg_tokens_per_message: 300
# }
# Check context window sizes
IO.puts(ExLLM.context_window_size(:anthropic, "claude-3-5-sonnet-20241022"))
# => 200000
ExLLM includes built-in session management for maintaining conversation state:
# Create a new session
session = ExLLM.new_session(:anthropic, name: "My Chat")
# Chat with automatic session tracking
{:ok, {response, updated_session}} = ExLLM.chat_with_session(session, "Hello!")
# Continue the conversation
{:ok, {response2, session2}} = ExLLM.chat_with_session(updated_session, "What's 2+2?")
# Access session messages
messages = ExLLM.get_session_messages(session2)
# => [%{role: "user", content: "Hello!"}, %{role: "assistant", content: "..."}, ...]
# Save session to disk
{:ok, path} = ExLLM.save_session(session, "/path/to/sessions")
# Load session from disk
{:ok, loaded_session} = ExLLM.load_session("/path/to/sessions/session_id.json")
# Export session as markdown
{:ok, markdown} = ExLLM.export_session_markdown(session)
File.write!("conversation.md", markdown)
# Get session metadata
info = ExLLM.session_info(session)
# => %{
# id: "123...",
# name: "My Chat",
# created_at: ~U[2025-01-24 10:00:00Z],
# message_count: 10,
# total_tokens: 1500
# }
# Get token usage for session
tokens = ExLLM.session_token_usage(session)
# => 1500
# Clear session messages
clean_session = ExLLM.clear_session(session)
ExLLM integrates with instructor_ex to provide structured output validation. This allows you to define expected response structures using Ecto schemas and automatically validate LLM responses.
Instructor is included as a dependency of ExLLM, so no additional installation is needed.
# Define your schema
defmodule EmailClassification do
use Ecto.Schema
use Instructor.Validator
@llm_doc "Classification of an email as spam or not spam"
@primary_key false
embedded_schema do
field :classification, Ecto.Enum, values: [:spam, :not_spam]
field :confidence, :float
field :reason, :string
end
@impl true
def validate_changeset(changeset) do
changeset
|> Ecto.Changeset.validate_required([:classification, :confidence, :reason])
|> Ecto.Changeset.validate_number(:confidence,
greater_than_or_equal_to: 0.0,
less_than_or_equal_to: 1.0
)
end
end
# Use with ExLLM
messages = [%{role: "user", content: "Is this spam? 'You won a million dollars!'"}]
{:ok, result} = ExLLM.chat(:anthropic, messages,
response_model: EmailClassification,
max_retries: 3 # Automatically retry on validation errors
)
IO.inspect(result)
# %EmailClassification{
# classification: :spam,
# confidence: 0.95,
# reason: "Classic lottery scam pattern"
# }
# Define expected structure without Ecto
response_model = %{
name: :string,
age: :integer,
email: :string,
tags: {:array, :string}
}
messages = [%{role: "user", content: "Extract: John Doe, 30 years old, john@example.com, likes elixir and coding"}]
{:ok, result} = ExLLM.chat(:anthropic, messages,
response_model: response_model
)
IO.inspect(result)
# %{
# name: "John Doe",
# age: 30,
#   email: "john@example.com",
# tags: ["elixir", "coding"]
# }
defmodule UserProfile do
use Ecto.Schema
use Instructor.Validator
@llm_doc """
User profile extraction from text.
Extract all available information about the user.
"""
embedded_schema do
field :name, :string
field :email, :string
field :age, :integer
field :location, :string
embeds_many :interests, Interest do
field :name, :string
field :level, Ecto.Enum, values: [:beginner, :intermediate, :expert]
end
end
@impl true
def validate_changeset(changeset) do
changeset
|> Ecto.Changeset.validate_required([:name])
|> Ecto.Changeset.validate_format(:email, ~r/@/)
|> Ecto.Changeset.validate_number(:age, greater_than: 0, less_than: 150)
end
end
# Complex extraction with nested structures
text = """
Hi, I'm Jane Smith, a 28-year-old software engineer from Seattle.
You can reach me at jane.smith@example.com. I'm an expert in Elixir,
intermediate in Python, and just starting to learn Rust.
"""
{:ok, profile} = ExLLM.chat(:anthropic,
[%{role: "user", content: "Extract user profile: #{text}"}],
response_model: UserProfile,
max_retries: 3
)
# Direct usage of ExLLM.Instructor
{:ok, result} = ExLLM.Instructor.chat(:anthropic, messages,
response_model: EmailClassification,
max_retries: 3,
temperature: 0.1 # Lower temperature for more consistent structure
)
# Parse an existing response
{:ok, response} = ExLLM.chat(:anthropic, messages)
{:ok, structured} = ExLLM.Instructor.parse_response(response, UserProfile)
# Check if instructor is available
if ExLLM.Instructor.available?() do
# Use structured outputs
else
# Fall back to regular parsing
end
Structured outputs work with providers that have instructor adapters:
- :anthropic - Anthropic Claude
- :openai - OpenAI GPT models
- :ollama - Local Ollama models
- :gemini - Google Gemini
- :bedrock - AWS Bedrock models
- :local - Local Bumblebee models
case ExLLM.chat(:anthropic, messages, response_model: UserProfile) do
{:ok, profile} ->
# Successfully validated structure
IO.inspect(profile)
{:error, {:validation_failed, errors}} ->
# Validation failed after retries
IO.inspect(errors)
{:error, reason} ->
# Other error
IO.inspect(reason)
end
ExLLM supports multiple configuration providers:
# Uses ExLLM.ConfigProvider.Default
# Reads from application config and environment variables
config = %{
anthropic: [
api_key: "your-api-key",
base_url: "https://api.anthropic.com"
]
}
ExLLM.set_config_provider({ExLLM.ConfigProvider.Static, config})
defmodule MyConfigProvider do
@behaviour ExLLM.ConfigProvider
@impl true
def get_config(provider, key) do
# Your custom logic here
end
@impl true
def has_config?(provider) do
# Your custom logic here
end
end
ExLLM.set_config_provider(MyConfigProvider)
ExLLM uses consistent error patterns:
case ExLLM.chat(:anthropic, messages) do
{:ok, response} ->
# Success
IO.puts(response.content)
{:error, {:config_error, reason}} ->
# Configuration issue
IO.puts("Config error: #{reason}")
{:error, {:api_error, %{status: status, body: body}}} ->
# API error
IO.puts("API error #{status}: #{body}")
{:error, {:network_error, reason}} ->
# Network issue
IO.puts("Network error: #{reason}")
{:error, {:parse_error, reason}} ->
# Response parsing issue
IO.puts("Parse error: #{reason}")
end
ExLLM includes automatic error recovery and retry mechanisms:
# Configure retry behavior
options = [
retry_count: 3, # Number of retry attempts
retry_delay: 1000, # Initial delay in milliseconds
retry_backoff: :exponential, # Backoff strategy
retry_jitter: true # Add jitter to prevent thundering herd
]
{:ok, response} = ExLLM.chat(:anthropic, messages, options)
# Provider-specific retry policies
ExLLM.Retry.with_retry(fn ->
ExLLM.chat(:anthropic, messages)
end,
max_attempts: 5,
initial_delay: 500,
max_delay: 30_000,
should_retry: fn error ->
# Custom retry logic
case error do
{:api_error, %{status: 429}} -> true # Rate limit
{:api_error, %{status: 503}} -> true # Service unavailable
{:network_error, _} -> true # Network issues
_ -> false
end
end
)
# Enable automatic stream recovery
{:ok, stream_id} = ExLLM.stream_chat(:anthropic, messages,
  [stream_recovery: true,
   recovery_strategy: :paragraph],  # :exact, :paragraph, or :summarize
  fn chunk ->
    IO.write(chunk.content)
  end
)
# If stream is interrupted, resume from where it left off
case ExLLM.resume_stream(stream_id) do
{:ok, resumed_stream} ->
for chunk <- resumed_stream do
IO.write(chunk.content)
end
  {:error, :not_found} ->
    # Stream not recoverable
    :ok
end
# List recoverable streams
recoverable = ExLLM.list_recoverable_streams()
The mock adapter allows you to test your LLM interactions without making real API calls:
# Configure static mock response
{:ok, response} = ExLLM.chat(:mock, messages,
mock_response: "This is a mock response"
)
# Configure mock with usage data
{:ok, response} = ExLLM.chat(:mock, messages,
mock_response: %{
content: "Mock response with usage",
usage: %{input_tokens: 10, output_tokens: 20},
model: "mock-model"
}
)
# Mock streaming responses
ExLLM.stream_chat(:mock, messages,
  [mock_chunks: ["Hello", " from", " mock", " adapter!"],
   chunk_delay: 100],  # Delay between chunks in ms
  fn chunk ->
    IO.write(chunk.content)
  end
)
# Dynamic mock responses based on input
mock_handler = fn messages ->
last_message = List.last(messages)
cond do
String.contains?(last_message.content, "weather") ->
"It's sunny and 72°F"
String.contains?(last_message.content, "hello") ->
"Hello! How can I help you?"
true ->
"I don't understand"
end
end
{:ok, response} = ExLLM.chat(:mock, messages,
mock_handler: mock_handler
)
# Simulate errors
{:error, {:api_error, %{status: 429, body: "Rate limit exceeded"}}} =
ExLLM.chat(:mock, messages,
mock_error: {:api_error, %{status: 429, body: "Rate limit exceeded"}}
)
# Capture requests for assertions
{:ok, response} = ExLLM.chat(:mock, messages,
capture_requests: true,
mock_response: "Test response"
)
# Access captured requests
captured = ExLLM.Adapters.Mock.get_captured_requests()
assert length(captured) == 1
assert List.first(captured).messages == messages
defmodule MyApp.LLMClientTest do
use ExUnit.Case
setup do
# Clear any previous captures
ExLLM.Adapters.Mock.clear_captured_requests()
:ok
end
test "handles weather queries" do
messages = [%{role: "user", content: "What's the weather?"}]
{:ok, response} = ExLLM.chat(:mock, messages,
mock_response: "It's sunny today!",
capture_requests: true
)
assert response.content == "It's sunny today!"
# Verify the request
[request] = ExLLM.Adapters.Mock.get_captured_requests()
assert request.provider == :mock
assert request.messages == messages
end
test "simulates API errors" do
messages = [%{role: "user", content: "Hello"}]
{:error, error} = ExLLM.chat(:mock, messages,
mock_error: {:network_error, :timeout}
)
assert error == {:network_error, :timeout}
end
end
ExLLM supports running models locally using Bumblebee and EXLA/EMLX backends. This enables on-device inference without API calls or costs.
- Add optional dependencies to your mix.exs:
def deps do
[
{:ex_llm, "~> 0.2.0"},
{:bumblebee, "~> 0.5"},
{:nx, "~> 0.7"},
{:exla, "~> 0.7"} # or {:emlx, "~> 0.1"} for Apple Silicon
]
end
- Configure EXLA backend (optional - auto-detected by default):
# For CUDA GPUs
config :nx, :default_backend, {EXLA.Backend, client: :cuda}
# For Apple Silicon
config :nx, :default_backend, EMLX.Backend
- microsoft/phi-2 - Phi-2 (2.7B parameters) - Default
- meta-llama/Llama-2-7b-hf - Llama 2 (7B)
- mistralai/Mistral-7B-v0.1 - Mistral (7B)
- EleutherAI/gpt-neo-1.3B - GPT-Neo (1.3B)
- google/flan-t5-base - Flan-T5 Base
# Start the model loader (happens automatically on first use)
{:ok, _} = ExLLM.Local.ModelLoader.start_link()
# Use a local model
messages = [
%{role: "user", content: "Explain quantum computing in simple terms"}
]
{:ok, response} = ExLLM.chat(:local, messages, model: "microsoft/phi-2")
IO.puts(response.content)
# Stream responses
{:ok, stream} = ExLLM.stream_chat(:local, messages)
for chunk <- stream do
IO.write(chunk.content)
end
# List available models
{:ok, models} = ExLLM.list_models(:local)
Enum.each(models, fn model ->
IO.puts("#{model.name} - Context: #{model.context_window} tokens")
end)
# Check acceleration info
info = ExLLM.Local.EXLAConfig.acceleration_info()
IO.puts("Running on: #{info.name}")
ExLLM automatically detects and uses available hardware acceleration:
- Apple Silicon - Uses Metal via EMLX
- NVIDIA GPUs - Uses CUDA via EXLA
- AMD GPUs - Uses ROCm via EXLA
- CPUs - Optimized multi-threaded inference
- First Load: Models are downloaded from HuggingFace on first use and cached locally
- Memory: Ensure you have enough RAM/VRAM for your chosen model
- Batch Size: Automatically optimized based on available memory
- Mixed Precision: Enabled by default for better performance
# Pre-load a model
{:ok, _} = ExLLM.Local.ModelLoader.load_model("microsoft/phi-2")
# Load from local path
{:ok, _} = ExLLM.Local.ModelLoader.load_model("/path/to/model")
# Unload to free memory
:ok = ExLLM.Local.ModelLoader.unload_model("microsoft/phi-2")
# List loaded models
loaded = ExLLM.Local.ModelLoader.list_loaded_models()
To add a new LLM provider, implement the ExLLM.Adapter behaviour:
defmodule ExLLM.Adapters.MyProvider do
@behaviour ExLLM.Adapter
@impl true
def chat(messages, options) do
# Implement chat completion
end
@impl true
def stream_chat(messages, options, callback) do
# Implement streaming chat
end
@impl true
def configured?() do
# Check if provider is configured
end
@impl true
def list_models() do
# Return available models
end
end
Then register it in the main ExLLM module.
- Elixir ~> 1.14
- Erlang/OTP ~> 25.0
- For local models (optional):
- Bumblebee ~> 0.5
- Nx ~> 0.7
- EXLA ~> 0.7 (for GPU acceleration)
- EMLX ~> 0.1 (for Apple Silicon)
# Clone the repository
git clone https://github.com/azmaveth/ex_llm.git
cd ex_llm
# Install dependencies
mix deps.get
mix deps.compile
# Run tests
mix test
# Run quality checks
mix format --check-formatted
mix credo
mix dialyzer
# Run all tests
mix test
# Run specific test files
mix test test/ex_llm_test.exs
# Run only integration tests
mix test test/*_integration_test.exs
# Run tests with coverage
mix test --cover
# Generate docs
mix docs
# Open in browser
open doc/index.html
Visit the GitHub repository to see the detailed roadmap and progress tracking.
- OpenAI adapter implementation
- Ollama adapter implementation
- AWS Bedrock adapter with multi-provider support
- Google Gemini adapter
- Structured outputs via Instructor integration
- Comprehensive cost tracking across all providers
- Function calling support for compatible models
- Request retry logic with exponential backoff
- Enhanced streaming error recovery
- Mock adapter for testing
- Model capability discovery and recommendations
- Vision/multimodal support for compatible models
- Embeddings API support
- Enhanced streaming with token-level callbacks
- Response caching with configurable TTL
- Fine-tuning management
- Batch API support
- Prompt template management
- Usage analytics and reporting
- Become the go-to LLM client library for Elixir
- Support all major LLM providers
- Provide best-in-class developer experience
- Maintain comprehensive documentation
- Build a thriving ecosystem of extensions
We welcome contributions! Please see our contributing guidelines:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests for new functionality
- Ensure all tests pass (mix test)
- Format your code (mix format)
- Run linter (mix credo)
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
We use Conventional Commits:
- feat: for new features
- fix: for bug fixes
- docs: for documentation changes
- chore: for maintenance tasks
- test: for test additions/changes
ExLLM includes pre-configured model data for 49 additional providers, ready for implementation:
Major Cloud Providers: Azure, Vertex AI, Databricks, Sagemaker, Watsonx, Snowflake
AI Companies: Mistral AI, Cohere, Together AI, Replicate, Perplexity, DeepSeek, XAI
Inference Platforms: Fireworks AI, DeepInfra, Anyscale, Cloudflare, NScale, SambaNova
Specialized: AI21, NLP Cloud, Aleph Alpha, Voyage (embeddings), Assembly AI (audio)
All model configurations including pricing, context windows, and capabilities are already available in config/models/.
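Since this metadata uses the same YAML layout, it should be readable through the ExLLM.ModelConfig helpers shown earlier (a sketch; the :mistral provider atom and model name below are illustrative assumptions, not confirmed identifiers):
# Hypothetical lookup against the bundled config/models/ data
context = ExLLM.ModelConfig.get_context_window(:mistral, "mistral-large-latest")
pricing = ExLLM.ModelConfig.get_pricing(:mistral, "mistral-large-latest")
default = ExLLM.ModelConfig.get_default_model(:mistral)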
- Built with Req for HTTP client functionality
- Local model support via Bumblebee
- Structured outputs via Instructor
- Model configuration data synced from LiteLLM
- Inspired by the need for a unified LLM interface in Elixir
MIT License - see LICENSE for details.