Expressive, opinionated, and intuitive 'fluent interface' Python library for working with LLMs.
Express every LLM interaction in your app prototypes in a single statement, without having to reach for documentation, look up model capabilities, or write boilerplate code.
- Expressive: Write natural, readable, and chainable LLM interactions.
- Opinionated: Focuses on best practices and sensible defaults for LLM workflows.
- Fluent API: Compose prompts, context, and expectations in a single chain.
- Supports multimodal (text, image, audio) inputs and outputs: Automatically picks a model based on the modalities required.
- Automatic coroutines: Can be used in both async and sync contexts.
- Modern Python: Type hints, async/await, and dataclasses throughout.
# Simple request
$ uvx fluent-llm "llm.request('What's the capital of France?').prompt()"
Paris is the capital of France.
# Audio input
$ uvx fluent-llm "llm.audio('animal.mp3').request('What animal is this?').prompt()"
That sounds like a cat meowing.
# Image generation
$ uvx fluent-llm "llm.request('A cat wearing a top hat').prompt_for_image().show()"
# Opens image viewer with generated image
# Tool calling
$ uv run --with fluent-llm python
>>> from fluent_llm import *
>>> def get_weather(location: str) -> str:
... """Get the current weather for a location."""
... import urllib.request
... return urllib.request.urlopen(f"https://wttr.in/{location}?format=3").read().decode()
>>> llm.tool(get_weather).request("Weather in Tokyo?").prompt()
'The current weather in Tokyo is partly cloudy at 18°C.'

The `llm` global instance can be used to build prompts, using the following mutators:
- `.agent(str)`: Sets the agent description, defines system behavior.
- `.context(str)`: Passes textual context to the LLM.
- `.request(str)`: Passes the main request to the LLM. (Identical to `.context()`, just used to clarify the intent.)
- `.image(PIL.Image | Path | filename str)`: Passes an image to the LLM.
- `.audio(soundfile.SoundFile | Path | filename str)`: Passes an audio file to the LLM.
- `.tool(tool_func)` or `.tools(tool_func1, tool_func2, ...)`: Registers functions as potential tool calls to offer to the LLM.
- `.assistant(str)`: Injects an assistant message into the conversation (useful for priming or few-shot examples).
Other mutators change the behavior of the system, e.g. `.provider()`, `.model()` and `.call_limit()`. We'll discuss these later.
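For example, several mutators can be combined in a single chain before sending the prompt (a sketch that reuses the painting-evaluation example shown later in this README):

```python
from fluent_llm import *

# Build up a prompt with chained mutators, then send it and get a plain str back.
evaluation = llm \
    .agent("You are an art evaluator.") \
    .context("You received this painting and were tasked to evaluate whether it's museum-worthy.") \
    .image("painting.png") \
    .request("Is this painting museum-worthy?") \
    .prompt()
print(evaluation)
```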
This library supports two related, but distinct prompt building paradigms:
- One-shot prompts: you construct a prompt, send it, and get a direct response in an immediately usable format (no Response-type class).
- Multi-turn conversations: you construct a prompt and use it to start a conversation, then request multiple responses from the LLM (potentially including tool calls), send a follow-up prompt, and so on.
To get a one-shot response, use one of the following methods:
- `.prompt(): str`: Sends the prompt to the LLM and returns a text response.
- `.prompt_for_image(): PIL.Image`: Sends the prompt to the LLM and returns an image response.
- [to be implemented] `.prompt_for_audio(): soundfile.SoundFile`: Sends the prompt to the LLM and returns an audio response.
- `.prompt_for_type(pydantic_model): BaseModel`: Sends the prompt to the LLM and returns a Python object instance.
They will either return the desired response if processing was successful, or raise an exception otherwise. If tools have been registered, these methods will automatically execute tool calls as needed before returning the final response.
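For example, in the simplest case the return value is directly usable (a minimal sketch):

```python
from fluent_llm import *

# .prompt() returns the model's answer as a plain str, not a response wrapper.
answer = llm.request("What's the capital of France?").prompt()
assert isinstance(answer, str)
print(answer)
```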
Alternatively, begin a conversation:
- `.begin_conversation()`: Starts a conversation with the LLM and returns an `LLMConversation` instance.
This instance implements the async generator protocol, and can be used to iterate over the responses from the LLM. If tools have been registered, tool calls will be executed automatically during iteration.
conversation = llm \
.agent("You are a helpful assistant.") \
.request("What is Python?") \
.begin_conversation()
async for message in conversation:
print(f"Assistant: {message.text}")Afterwards, you can retrieve a new builder from conversation.continuation, which you can use to follow-up with more prompts and keep the conversation going.
# Continue the conversation
follow_up = conversation.continuation \
.request("Tell me more about Python functions") \
.begin_conversation()
async for message in follow_up:
print(f"Assistant: {message.text}")The continuation system allows you to seamlessly continue conversations:
# Start a conversation
conversation = llm \
.agent("You are a math tutor.") \
.request("What is 2 + 2?") \
.begin_conversation()
async for message in conversation:
    print(message.text)  # "2 + 2 equals 4."
# Continue with follow-up questions
continuation = conversation.continuation \
.request("What about 3 + 3?") \
.begin_conversation()
async for message in continuation:
    print(message.text)  # "3 + 3 equals 6."
# Access continuation at any time during iteration
conversation = llm.request("Count to 5").begin_conversation()
count = 0
async for message in conversation:
    count += 1
    if count == 2:
        # Stop early and continue with a different request
        break
# The conversation has the partial response
follow_up = conversation.continuation \
.request("Now count backwards from 5") \
    .begin_conversation()

Set your API keys as environment variables:
# On Unix/macOS
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-...
# On Windows (cmd)
set OPENAI_API_KEY=sk-...
set ANTHROPIC_API_KEY=sk-...
# On Windows (PowerShell)
$env:OPENAI_API_KEY="sk-..."
$env:ANTHROPIC_API_KEY="sk-..."

You can use this library as a callable module to experiment with LLMs. No installation necessary.
$ uvx fluent-llm "llm.request('1+2=?').prompt()"
1 + 2 = 3.

Start an interactive Python session with the library available:
$ uv run --with=fluent-llm ipython

llm \
.agent("You are an art evaluator.") \
.context("You received this painting and were tasked to evaluate whether it's museum-worthy.") \
.image("painting.png") \
    .prompt()

To add the library to your project:

uv add fluent-llm

For the best experience, use a star import to get everything you need:
from fluent_llm import *

This gives you access to `llm`, the main builder instance you'll use for all interactions. For most use cases, this is all you need.
For exception handling, import from the exceptions module:
from fluent_llm.exceptions import *
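For instance, a call can be guarded like this (a sketch; the specific exception classes exported by `fluent_llm.exceptions` are not listed here, so the generic handler below is illustrative):

```python
from fluent_llm import *
from fluent_llm.exceptions import *

try:
    answer = llm.request("What's the capital of France?").prompt()
except Exception as exc:  # substitute the specific fluent_llm exception you want to handle
    print(f"LLM call failed: {exc}")
```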
For structured outputs, you'll need Pydantic's BaseModel:

from pydantic import BaseModel

All examples below assume you've imported `llm` using one of these methods.
Just works. See if you can spot the difference from the example above.
response = await llm \
.agent("You are an art evaluator.") \
.context("You received this painting and were tasked to evaluate whether it's museum-worthy.") \
.image("painting.png") \
    .prompt()

response = llm \
.agent("You are a 17th century classic painter.") \
.context("You were paid 10 francs for creating a portrait.") \
.request('Create a portrait of Louis XIV.') \
.prompt_for_image()
assert isinstance(response, PIL.Image.Image)
response.show()

transcription = llm \
.audio("meeting_recording.mp3") \
.request("Transcribe this audio and summarize the key points.") \
    .prompt()

from pydantic import BaseModel
from datetime import datetime
class Event(BaseModel):
    name: str
    date: datetime
    duration_hours: float
    is_recurring: bool
    attendees: list[str]
event = llm \
.request("Team standup every Monday at 9am for 30 minutes with Alice and Bob") \
.prompt_for_type(Event)
assert event.name == "Team standup"
assert event.attendees == ['Alice', 'Bob']
assert event.duration_hours == 0.5

Register Python functions as tools using `.tool()` or `.tools()`. Functions must have type annotations and a docstring:
def get_weather(location: str) -> str:
"""Get the current weather for a location."""
return f"The weather in {location} is sunny and 22°C"
response = llm \
.agent("You are a helpful assistant.") \
.tool(get_weather) \
.request("What's the weather like in Paris?") \
    .prompt()

Tools are executed automatically when the LLM decides to call them. The library handles the full tool call loop: invoking your function, passing results back to the LLM, and returning the final response.
For multiple tools:
response = llm \
.tools(get_weather, search_web, calculate) \
.request("...") \
    .prompt()

The `.assistant()` method allows you to inject assistant messages into your conversation. This is useful for:
- Few-shot learning: Provide example responses to guide the model's behavior
- Conversation priming: Start with a specific assistant response
- Conversation restoration: Continue from a saved conversation state
# Few-shot example
response = llm \
.agent("You are a helpful translator.") \
.request("Translate 'hello' to French") \
.assistant("Bonjour") \
.request("Translate 'goodbye' to French") \
.prompt()
# Expected: "Au revoir"
# Priming a conversation
conversation = llm \
.agent("You are a creative writer.") \
.assistant("I'm ready to help you craft amazing stories!") \
.request("Write a short story about a robot") \
    .begin_conversation()

Usage tracking and price estimations for the last call are built-in.
>>> llm.request('How are you?').prompt()
"I'm doing well, thank you! How about you?"
>>> print(llm.usage)
=== Last API Call Usage ===
Model: gpt-4o-mini-2024-07-18
input_tokens: 11 tokens
output_tokens: 12 tokens
💰 Cost Breakdown:
input_tokens: 11 tokens → $0.000002
output_tokens: 12 tokens → $0.000007
💵 Total Call Cost: $0.000009
==============================
>>> llm.usage.cost.total_call_cost_usd
0.000009
>>> llm.usage.cost.breakdown['input_tokens'].count
11

Fluent LLM supports model-agnostic conversation serialization, allowing you to save and restore conversations across sessions.
Use the .save(filename | Path object | stream) method on a conversation to persist it.
Use the .load_conversation(filename | Path object | stream) method on a builder to restore a conversation:
Once loaded, you can continue conversations with any configuration.
Model selections, tool call definitions, etc. are not persisted.
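For example, persisting a conversation to a file might look like this (a minimal sketch; the path is illustrative):

```python
from pathlib import Path

# Saves the message history; model selection and tool definitions are not persisted.
conversation.save(Path("conversations/my_conversation.json"))
```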
# Load a conversation (e.g. from an HTTP stream using the requests library)
import requests
conversation = llm.load_conversation(requests.get(url, stream=True).raw)
# Continue with a different provider or model
continuation = conversation.continuation \
.request("Tell me more") \
.begin_conversation()
async for message in continuation:
print(message.text)
# Save the updated conversation
continuation.save(Path("conversations/my_conversation_continued.json"))

You can specify preferred providers and models using the fluent chain API with the `Providers` and `Models` enums:
from fluent_llm import llm, Providers, Models
# Use a specific provider (will select best available model)
response = await llm \
.provider(Providers.ANTHROPIC) \
.request("Hello, how are you?") \
.prompt()
# Use a specific model
response = await llm \
.model(Models.HAIKU_3_5) \
.request("Write a poem about coding") \
.prompt()
# Combine provider and model preferences
response = await llm \
.provider(Providers.OPENAI) \
.model(Models.GPT_4O_MINI) \
.request("Explain quantum computing") \
    .prompt()

If choosing a provider or model per invocation is not sufficient, you can define a custom `ModelSelectionStrategy` and pass it to the `LLMPromptBuilder` constructor to select provider and model based on your own criteria.
Instead of using the convenience methods `.prompt_*()`, you can use the `.call()` method to execute the prompt and return a response. The `.call()` method also supports a `verbose=True` parameter that prints a conversation trace to stderr, which is useful for debugging tool calls and multi-turn interactions.
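A minimal sketch, reusing the `get_weather` tool from the earlier example (only `.call()` and its `verbose=True` parameter are taken from the description above; how to read the final result from the returned response is not shown here):

```python
# Execute the prompt via .call(); verbose=True prints a conversation trace to stderr.
response = llm \
    .tool(get_weather) \
    .request("What's the weather like in Paris?") \
    .call(verbose=True)
```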
You can also pass a custom client to the `.call()` method to have it used for the API call.
If the defaults are not sufficient, you can customize the behavior of the builder by creating your own `LLMPromptBuilder` instead of using the `llm` global instance provided for convenience.
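The exact `ModelSelectionStrategy` interface and `LLMPromptBuilder` constructor signature are not documented here, so the following is only a rough, hypothetical sketch of the idea; the method name `select_model` and the keyword argument `model_selection_strategy` are assumptions, not confirmed API:

```python
from fluent_llm import LLMPromptBuilder, Models

class AlwaysCheapStrategy:
    """Hypothetical strategy that always picks a small, inexpensive model."""
    def select_model(self, *args, **kwargs):  # assumed method name
        return Models.GPT_4O_MINI

# The keyword argument name is an assumption; check the library source for the real signature.
my_llm = LLMPromptBuilder(model_selection_strategy=AlwaysCheapStrategy())
response = my_llm.request("Hello!").prompt()
```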
However, note that if you reach this point, you should ask yourself whether you wouldn't be better off using the official OpenAI Python client library directly. This library is designed to be a simple and opinionated wrapper around the OpenAI API; it is not intended to be a full-featured LLM client.
uv sync --dev

Installs all runtime and development dependencies (including pytest).
See TESTING.md for complete testing documentation.
Licensed under the MIT License.
Almost all code written by Claude, o3 and SWE-1, concept and design by @hheimbuerger.