Fluent LLM

Expressive, opinionated, and intuitive 'fluent interface' Python library for working with LLMs.

Mission statement

Express every LLM interaction in your app prototypes in a single statement, without having to reach for documentation, look up model capabilities, or write boilerplate code.

Highlights

  • Expressive: Write natural, readable, and chainable LLM interactions.
  • Opinionated: Focuses on best practices and sensible defaults for LLM workflows.
  • Fluent API: Compose prompts, context, and expectations in a single chain.
  • Multimodal: Supports text, image, and audio inputs and outputs, automatically picking a model based on the required modalities.
  • Automatic coroutines: Can be used in both async and sync contexts.
  • Modern Python: Type hints, async/await, and dataclasses throughout.

DX Examples

# Simple request
$ uvx fluent-llm "llm.request(\"What's the capital of France?\").prompt()"
Paris is the capital of France.

# Audio input
$ uvx fluent-llm "llm.audio('animal.mp3').request('What animal is this?').prompt()"
That sounds like a cat meowing.

# Image generation
$ uvx fluent-llm "llm.request('A cat wearing a top hat').prompt_for_image().show()"
# Opens image viewer with generated image

# Tool calling
$ uv run --with fluent-llm python
>>> from fluent_llm import *
>>> def get_weather(location: str) -> str:
...     """Get the current weather for a location."""
...     import urllib.request
...     return urllib.request.urlopen(f"https://wttr.in/{location}?format=3").read().decode()
>>> llm.tool(get_weather).request("Weather in Tokyo?").prompt()
'The current weather in Tokyo is partly cloudy at 18°C.'

Constructing prompts

The llm global instance can be used to build prompts, using the following mutators:

  • .agent(str): Sets the agent description, defines system behavior.
  • .context(str): Passes textual context to the LLM.
  • .request(str): Passes the main request to the LLM. (Identical to .context(), just used to clarify the intent.)
  • .image(PIL.Image | Path | filename str): Passes an image to the LLM.
  • .audio(soundfile.SoundFile | Path | filename str): Passes an audio file to the LLM.
  • .tool(tool_func) or .tools(tool_func1, tool_func2, ...): Registers functions as potential tool calls to offer to the LLM.
  • .assistant(str): Injects an assistant message into the conversation (useful for priming or few-shot examples).

Other mutators change the behavior of the system, e.g. .provider(), .model() and .call_limit(). We'll discuss these later.

Submitting prompts

This library supports two related, but distinct prompt building paradigms:

  1. One-shot prompts: you construct a prompt, send it, get a direct response in an immediately usable format (no Response-type class).
  2. Multi-turn conversations: construct a prompt and use it to start a conversation, then request multiple responses from the LLM (potentially including tool calls), send a follow-up prompt, etc.

One-shot prompts

To get a one-shot response, use one of the following methods:

  • .prompt(): str: Sends the prompt to the LLM and returns a text response.
  • .prompt_for_image(): PIL.Image.Image: Sends the prompt to the LLM and returns an image response.
  • [to be implemented] .prompt_for_audio(): soundfile.SoundFile: Sends the prompt to the LLM and returns an audio response.
  • .prompt_for_type(pydantic_model): BaseModel: Sends the prompt to the LLM and returns a Python object instance.

They will either return the desired response if processing was successful, or raise an exception otherwise. If tools have been registered, these methods will automatically execute tool calls as needed before returning the final response.
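Because these methods raise on failure, a thin wrapper can normalize error handling. A minimal sketch (the concrete exception classes live in fluent_llm.exceptions but aren't listed here, so this catches broadly; narrow it in real code):

```python
def ask(send):
    """Run a zero-argument callable that performs a one-shot prompt,
    converting failures into a message instead of propagating them."""
    try:
        return send()
    except Exception as exc:  # narrow to fluent_llm.exceptions types
        return f"LLM call failed: {exc}"

# Usage (assuming `llm` has been imported):
# answer = ask(lambda: llm.request("What's the capital of France?").prompt())
```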

Multi-turn conversations

Alternatively, begin a conversation:

  • .begin_conversation(): Starts a conversation with the LLM and returns an LLMConversation instance.

This instance implements the async generator protocol, and can be used to iterate over the responses from the LLM. If tools have been registered, tool calls will be executed automatically during iteration.

conversation = llm \
    .agent("You are a helpful assistant.") \
    .request("What is Python?") \
    .begin_conversation()

async for message in conversation:
    print(f"Assistant: {message.text}")

Afterwards, you can retrieve a new builder from conversation.continuation, which you can use to follow up with more prompts and keep the conversation going.

# Continue the conversation
follow_up = conversation.continuation \
    .request("Tell me more about Python functions") \
    .begin_conversation()

async for message in follow_up:
    print(f"Assistant: {message.text}")

Conversation Continuation Patterns

The continuation system allows you to seamlessly continue conversations:

# Start a conversation
conversation = llm \
    .agent("You are a math tutor.") \
    .request("What is 2 + 2?") \
    .begin_conversation()

async for message in conversation:
    print(message.text)  # "2 + 2 equals 4."

# Continue with follow-up questions
continuation = conversation.continuation \
    .request("What about 3 + 3?") \
    .begin_conversation()

async for message in continuation:
    print(message.text)  # "3 + 3 equals 6."

# Access continuation at any time during iteration
conversation = llm.request("Count to 5").begin_conversation()
count = 0
async for message in conversation:
    count += 1
    if count == 2:
        # Stop early and continue with a different request
        break

# The conversation has the partial response
follow_up = conversation.continuation \
    .request("Now count backwards from 5") \
    .begin_conversation()

Setup

Setting API Keys

Set your API keys as environment variables:

# On Unix/macOS
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-...

# On Windows (cmd)
set OPENAI_API_KEY=sk-...
set ANTHROPIC_API_KEY=sk-...

# On Windows (PowerShell)
$env:OPENAI_API_KEY="sk-..."
$env:ANTHROPIC_API_KEY="sk-..."
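Keys can also be set from Python before the first prompt, e.g. in a launcher script. A small sketch (the placeholder value is illustrative):

```python
import os

# Set the key for the current process only; setdefault keeps an existing
# value (e.g. one already exported in the shell) untouched.
os.environ.setdefault("OPENAI_API_KEY", "sk-...")
```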

As a tool / callable module

You can use this library as a callable module to experiment with LLMs. No installation necessary.

$ uvx fluent-llm "llm.request('1+2=?').prompt()"
1 + 2 = 3.

In the REPL

Start an interactive Python session with the library available:

$ uv run --with=fluent-llm ipython
llm \
    .agent("You are an art evaluator.") \
    .context("You received this painting and were tasked to evaluate whether it's museum-worthy.") \
    .image("painting.png") \
    .prompt()

Add to your project

uv add fluent-llm

Importing the Library

For the best experience, use a star import to get everything you need:

from fluent_llm import *

This gives you access to llm, the main builder instance you'll use for all interactions. For most use cases, this is all you need.

For exception handling, import from the exceptions module:

from fluent_llm.exceptions import *

For structured outputs, you'll need Pydantic's BaseModel:

from pydantic import BaseModel

All examples below assume you've imported llm using one of these methods.

Usage

Async/await

It just works. See if you can spot the difference from the example above.

response = await llm \
    .agent("You are an art evaluator.") \
    .context("You received this painting and were tasked to evaluate whether it's museum-worthy.") \
    .image("painting.png") \
    .prompt()

Multimodality

Images

response = llm \
    .agent("You are a 17th century classic painter.") \
    .context("You were paid 10 francs for creating a portrait.") \
    .request('Create a portrait of Louis XIV.') \
    .prompt_for_image()

assert isinstance(response, PIL.Image.Image)
response.show()

Audio

transcription = llm \
    .audio("meeting_recording.mp3") \
    .request("Transcribe this audio and summarize the key points.") \
    .prompt()

Structured output

from pydantic import BaseModel
from datetime import datetime

class Event(BaseModel):
    name: str
    date: datetime
    duration_hours: float
    is_recurring: bool
    attendees: list[str]

event = llm \
    .request("Team standup every Monday at 9am for 30 minutes with Alice and Bob") \
    .prompt_for_type(Event)

assert event.name == "Team standup"
assert event.attendees == ['Alice', 'Bob']
assert event.duration_hours == 0.5

Tool Calls

Register Python functions as tools using .tool() or .tools(). Functions must have type annotations and a docstring:

def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    return f"The weather in {location} is sunny and 22°C"

response = llm \
    .agent("You are a helpful assistant.") \
    .tool(get_weather) \
    .request("What's the weather like in Paris?") \
    .prompt()

Tools are executed automatically when the LLM decides to call them. The library handles the full tool call loop—invoking your function, passing results back to the LLM, and returning the final response.
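Conceptually, one round of that loop looks like the following. This is a simplified sketch, not the library's internals: real providers exchange structured messages, and the model decides when to stop requesting tools.

```python
def run_tool_round(tools, tool_calls):
    """Execute one round of tool calls requested by the model.

    `tools` are plain Python functions; `tool_calls` is a simplified
    stand-in for the provider's structured tool-call messages.
    """
    registry = {fn.__name__: fn for fn in tools}
    return [registry[call["name"]](**call["arguments"]) for call in tool_calls]

def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    return f"The weather in {location} is sunny and 22°C"

# One round: the results are then fed back to the model, which either
# requests more tool calls or produces the final text response.
results = run_tool_round(
    [get_weather],
    [{"name": "get_weather", "arguments": {"location": "Paris"}}],
)
```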

For multiple tools:

response = llm \
    .tools(get_weather, search_web, calculate) \
    .request("...") \
    .prompt()

Assistant Message Injection

The .assistant() method allows you to inject assistant messages into your conversation. This is useful for:

  • Few-shot learning: Provide example responses to guide the model's behavior
  • Conversation priming: Start with a specific assistant response
  • Conversation restoration: Continue from a saved conversation state
# Few-shot example
response = llm \
    .agent("You are a helpful translator.") \
    .request("Translate 'hello' to French") \
    .assistant("Bonjour") \
    .request("Translate 'goodbye' to French") \
    .prompt()
# Expected: "Au revoir"

# Priming a conversation
conversation = llm \
    .agent("You are a creative writer.") \
    .assistant("I'm ready to help you craft amazing stories!") \
    .request("Write a short story about a robot") \
    .begin_conversation()

Usage tracking

Usage tracking and a price estimate for the last call are built-in.

>>> llm.request('How are you?').prompt()
"I'm doing well, thank you! How about you?"

>>> print(llm.usage)
=== Last API Call Usage ===
Model: gpt-4o-mini-2024-07-18
input_tokens: 11 tokens
output_tokens: 12 tokens

💰 Cost Breakdown:
  input_tokens: 11 tokens → $0.000002
  output_tokens: 12 tokens → $0.000007

💵 Total Call Cost: $0.000009
==============================

>>> llm.usage.cost.total_call_cost_usd
0.000009

>>> llm.usage.cost.breakdown['input_tokens'].count
11

Conversation Serialization

Fluent LLM supports model-agnostic conversation serialization, allowing you to save and restore conversations across sessions.

  • .save(filename | Path | stream): Called on a conversation, persists it.
  • .load_conversation(filename | Path | stream): Called on a builder, restores a saved conversation.

Once loaded, you can continue the conversation with any configuration. Model selections, tool call definitions, etc. are not persisted.

# Load a conversation (here streamed over HTTP via the requests library)
conversation = llm.load_conversation(requests.get(url, stream=True).raw)

# Continue with a different provider or model
continuation = conversation.continuation \
    .request("Tell me more") \
    .begin_conversation()

async for message in continuation:
    print(message.text)

# Save the updated conversation
continuation.save(Path("conversations/my_conversation_continued.json"))

Provider and Model per-prompt override

You can specify preferred providers and models using the fluent chain API with the Providers and Models enums:

from fluent_llm import llm, Providers, Models

# Use a specific provider (will select best available model)
response = await llm \
    .provider(Providers.ANTHROPIC) \
    .request("Hello, how are you?") \
    .prompt()

# Use a specific model
response = await llm \
    .model(Models.HAIKU_3_5) \
    .request("Write a poem about coding") \
    .prompt()

# Combine provider and model preferences
response = await llm \
    .provider(Providers.OPENAI) \
    .model(Models.GPT_4O_MINI) \
    .request("Explain quantum computing") \
    .prompt()

Customization

Model Selection Strategy

If choosing a provider or model per-invocation is not sufficient, you can define a custom ModelSelectionStrategy and pass it to the LLMPromptBuilder constructor to select provider and model based on your own criteria.
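For illustration only, such a strategy might look like the sketch below. The actual ModelSelectionStrategy interface is not documented in this README, so the select_model hook name, its signature, and the model identifiers are all assumptions:

```python
class CheapestTextFirstStrategy:
    """Hypothetical strategy: always pick the cheapest text-capable model."""

    # Illustrative preference order; real code would use the library's
    # Models enum and its capability metadata instead of raw strings.
    PREFERENCES = ["gpt-4o-mini", "claude-3-5-haiku"]

    def select_model(self, required_modalities: set[str]) -> str:
        # Only handle pure-text prompts; defer everything else.
        if required_modalities <= {"text"}:
            return self.PREFERENCES[0]
        raise ValueError(f"no cheap model for {required_modalities}")

# Passed to the builder constructor instead of using the global `llm`:
# builder = LLMPromptBuilder(strategy=CheapestTextFirstStrategy())
```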

Invocation

Instead of using the convenience methods .prompt_*(), you can use the .call() method to execute the prompt and return a response. The .call() method also supports a verbose=True parameter that prints a conversation trace to stderr, useful for debugging tool calls and multi-turn interactions.

Client

Pass a custom client to the .call() method to use it for the API call.

Prompt Builder Override

If the defaults are not sufficient, you can customize the behavior of the builder by creating your own LLMPromptBuilder, instead of using the llm global instance provided for convenience.

However, note that at this point you should ask yourself whether you'd be better off using the official OpenAI Python client library directly. This library is designed as a simple, opinionated wrapper around the OpenAI API; it is not intended to be a full-featured LLM client.

Contribution

Setup

uv sync --dev

Installs all runtime and development dependencies (including pytest).

Running Tests

See TESTING.md for complete testing documentation.

License

Licensed under the MIT License.

Disclaimer

Almost all code was written by Claude, o3, and SWE-1; concept and design by @hheimbuerger.
