view() tool blocked on upstream: OWUI Python toolkit tools cannot return model-visible images

## Context

Lathe would benefit from a `view()` tool that lets the agent visually perceive an image on the sandbox filesystem — useful when driving a headless browser, inspecting generated charts, iterating on visual output, etc. The sandbox can produce the image; the problem is getting it into the model's context window.

## The upstream gap

OWUI has a clean convention for **human-facing** rich output from tools: `HTMLResponse` with `Content-Disposition: inline` renders an iframe embed, and the model gets a status string. There is no symmetric convention for **model-facing** visual output — a tool producing an image that the model needs to *see* on its next turn.

This is tracked upstream as:

- **[open-webui/open-webui#22591](https://github.com/open-webui/open-webui/discussions/22591)** — feat: Tool return convention for model-visible image content

### What partially works today

The OWUI codebase has *two* tool execution paths with different image handling:

1. **Legacy function-calling path** (`chat_completion_tools_handler`): `tool_result_files` with `type: "image"` are emitted as socket events to the frontend for display, but are **not** injected into the model's next context. The model receives only a text summary like `"Image file read successfully."`.

2. **Native tool-calling path** (Responses API / OR-aligned, around `middleware.py:4248`): For MCP tools returning `{"type": "image", ...}` content items, image data URIs **are** added as `input_image` parts in `function_call_output` items — meaning the model can see them. However, this only applies to MCP tools on the native path.

Lathe is a Python toolkit, and Python toolkit tools return plain strings. There is no return type or convention that causes OWUI to inject an image as an `image_url` (or `input_image`) content part in the model's next turn.
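For reference, the two content-part names mentioned above correspond to the image shapes used by the OpenAI-style Chat Completions and Responses APIs. The sketch below only illustrates those wire shapes; the helper names are hypothetical, and how OWUI assembles these parts internally is not shown here.

```python
def chat_completions_image_part(data_uri: str) -> dict:
    """Image content part as used in Chat Completions-style messages."""
    return {"type": "image_url", "image_url": {"url": data_uri}}


def responses_image_part(data_uri: str) -> dict:
    """Image content part as used in Responses API-style input items."""
    return {"type": "input_image", "image_url": data_uri}


# Placeholder payload for illustration only; not a real image.
example_uri = "data:image/png;base64,AAAA"
chat_part = chat_completions_image_part(example_uri)
responses_part = responses_image_part(example_uri)
```

A Python toolkit tool currently has no way to ask OWUI to emit either shape on its behalf, which is the gap described above.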

### Related upstream discussions

- **[open-webui/open-webui#22467](https://github.com/open-webui/open-webui/discussions/22467)** — MCP annotations for user-only output (the inverse routing problem)
- **[open-webui/open-webui#22590](https://github.com/open-webui/open-webui/discussions/22590)** — Image transcription for non-vision models (same gap, different workaround angle)

Together these describe the full routing matrix `{human, model} × {text, rich}` — three of four cells are served; the model-facing visual cell is empty.

## What this means for Lathe

A `view(path)` tool is blocked until OWUI provides a return convention that Python toolkit tools can use to deliver image data to the model. No workaround within Lathe's architecture (single-file toolkit, no OWUI storage dependency) can cleanly bridge this gap.

When the upstream feature lands, implementing `view()` in Lathe should be straightforward: download the image bytes from the sandbox via the toolbox API, base64-encode them, and return them using whatever convention OWUI establishes.
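The encoding half of that plan can be sketched today, independent of whatever OWUI settles on. This is a minimal sketch that assumes the image bytes have already been fetched from the sandbox; the function name and the choice of a data URI as the carrier are assumptions, not an established OWUI convention.

```python
import base64


def image_bytes_to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Base64-encode raw image bytes and wrap them in a data URI.

    Hypothetical helper: the final return convention for model-visible
    images is still undefined upstream.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Stand-in for bytes downloaded from the sandbox (the 8-byte PNG signature).
png_signature = b"\x89PNG\r\n\x1a\n"
data_uri = image_bytes_to_data_uri(png_signature)
```

The download step and the final return value are deliberately left out, since both depend on the toolbox API call Lathe uses and on the convention that comes out of open-webui/open-webui#22591.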

## Action

Watch open-webui/open-webui#22591. No Lathe changes needed until the upstream convention is defined.
