feat: add builtin `file_read` tool for multimodal files by khj809 · Pull Request #393 · brekkylab/ailoy

khj809 · 2026-04-29T13:55:28Z

Summary

Adds a `file_read` builtin tool that lets the LLM autonomously read files, both text and images, from the run environment (`RunEnv`, local or sandbox) within the agent loop. This enables multimodal file analysis (charts, screenshots, photos) without the caller having to pre-load content into the message history.
The implementation largely follows Claude Code's FileReadTool, but supports a more limited set of file types.

New: `BuiltinToolProvider::FileRead` / `file_read` tool (`src/tool_impl/builtins/file_read.rs`)

- Detects file type by extension: PNG/JPEG/GIF/WEBP return `Part::Image` (embedded base64); all other readable files return `Part::Text`
- Unsupported binary types (PDF, ZIP, EXE, WASM, …) return a structured error
- Non-UTF-8 text files return a structured error
- Supports `offset` and `limit` parameters for windowed text reads (character-based)
- Reads via `RunEnv::read()` so it works in both local and sandbox environments
- Registered as `BuiltinToolProvider::FileRead {}` in `src/tool/provider.rs`
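
The extension check and the character-based window described above can be sketched roughly as follows. This is a minimal illustration, not the code in `file_read.rs`: the names `FileKind`, `classify`, and `read_window`, the MIME strings, and the error text are all assumptions made for the example.

```rust
use std::path::Path;

/// What the tool would hand back for a given path (illustrative names only).
#[derive(Debug, PartialEq)]
enum FileKind {
    /// Returned as an embedded base64 image with this (assumed) MIME type.
    Image(&'static str),
    /// Returned as UTF-8 text.
    Text,
    /// Rejected with a structured error.
    Unsupported,
}

/// Classify a path by its extension, case-insensitively.
fn classify(path: &str) -> FileKind {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
        .unwrap_or_default();
    match ext.as_str() {
        "png" => FileKind::Image("image/png"),
        "jpg" | "jpeg" => FileKind::Image("image/jpeg"),
        "gif" => FileKind::Image("image/gif"),
        "webp" => FileKind::Image("image/webp"),
        // Only the binary types named in the description; the real list is longer.
        "pdf" | "zip" | "exe" | "wasm" => FileKind::Unsupported,
        _ => FileKind::Text,
    }
}

/// Decode bytes as UTF-8 and return a window counted in characters, not bytes.
fn read_window(bytes: &[u8], offset: usize, limit: Option<usize>) -> Result<String, String> {
    let text = std::str::from_utf8(bytes).map_err(|e| format!("file is not valid UTF-8: {e}"))?;
    let chars = text.chars().skip(offset);
    Ok(match limit {
        Some(n) => chars.take(n).collect(),
        None => chars.collect(),
    })
}
```

Counting in characters rather than bytes means a window can never split a multi-byte UTF-8 sequence, at the cost of a linear scan to reach the offset.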

Provider marshal fixes for `Part::Image` in `Role::Tool` messages

Previously all providers only used contents[0] of a tool-result message and only as plain text/JSON. Now each provider correctly handles images in tool results:

| Provider | Fix |
| --- | --- |
| Anthropic (`anthropic.rs`) | `tool_result.content` is now the full `contents` array; `Part::Value` is wrapped as a `{"type":"text","text":"..."}` content block (required by the Anthropic API) |
| Gemini (`gemini.rs`) | Images go as sibling `inline_data` parts in the outer `parts` array alongside `functionResponse`; the `FunctionResponse` proto has no `parts` field, so nested `functionResponse.parts` was wrong |
| OpenAI Responses (`openai.rs`) | `function_call_output.output` becomes a content array with `input_image` blocks; `image_url` is a plain string (not a `{"url":"..."}` object) |
| ChatCompletion (`chat_completion.rs`) | Images in tool results are substituted with `[image: mime_type]` / `[image at url]` text placeholders, since ChatCompletion backends don't reliably accept inline images in tool results |
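
As a rough illustration of the ChatCompletion row, the placeholder substitution might look like the sketch below. The `Part` enum here is a simplified stand-in for the crate's real type, `tool_result_text` is a hypothetical helper, and both joining parts with newlines and interpolating the actual URL into the placeholder are assumptions.

```rust
/// Simplified stand-in for the parts of a tool-result message.
enum Part {
    Text(String),
    /// Embedded image: MIME type plus base64 payload.
    ImageEmbedded { mime_type: String, base64: String },
    /// Image referenced by URL.
    ImageUrl(String),
}

/// Flatten tool-result parts into a single text string,
/// replacing every image with a short textual placeholder.
fn tool_result_text(parts: &[Part]) -> String {
    parts
        .iter()
        .map(|part| match part {
            Part::Text(text) => text.clone(),
            // The base64 payload is deliberately dropped; only the MIME type survives.
            Part::ImageEmbedded { mime_type, .. } => format!("[image: {mime_type}]"),
            Part::ImageUrl(url) => format!("[image at {url}]"),
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

The model still learns that an image was produced and what kind it was, even though the backend never receives the pixels.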

Gemini thinking model support (`gemini.rs`)

Gemini 3 thinking models return a thoughtSignature that must be replayed verbatim in subsequent turns. Added:

- Capture `thoughtSignature` from thought text parts and from `functionCall` parts (sibling field) during unmarshal
- Include `thoughtSignature` back in thought text parts and alongside `functionCall` parts during marshal
- `marshal_messages` now always passes `include_thinking = true` so signatures are never dropped mid-conversation
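
The capture-and-replay round trip can be pictured with the sketch below. It is a simplified model under stated assumptions: `WirePart` and `ModelPart` are hypothetical types that flatten the Gemini part to the few fields relevant here (a function call is reduced to its name), and they do not mirror the structures in `gemini.rs`.

```rust
/// Internal representation; `signature` carries Gemini's `thoughtSignature`.
#[derive(Debug, Clone, PartialEq)]
enum ModelPart {
    Thought { text: String, signature: Option<String> },
    FunctionCall { name: String, signature: Option<String> },
    Text(String),
}

/// Hypothetical wire-level part, reduced to the fields that matter here.
#[derive(Debug, Clone, PartialEq, Default)]
struct WirePart {
    text: Option<String>,
    thought: bool,
    function_call: Option<String>,
    thought_signature: Option<String>,
}

/// Unmarshal: capture the signature from thought parts and function-call parts.
fn unmarshal(wire: &WirePart) -> ModelPart {
    let signature = wire.thought_signature.clone();
    if let Some(name) = &wire.function_call {
        ModelPart::FunctionCall { name: name.clone(), signature }
    } else if wire.thought {
        ModelPart::Thought { text: wire.text.clone().unwrap_or_default(), signature }
    } else {
        ModelPart::Text(wire.text.clone().unwrap_or_default())
    }
}

/// Marshal: replay the signature verbatim next to the part it arrived with.
fn marshal(part: &ModelPart) -> WirePart {
    match part {
        ModelPart::Thought { text, signature } => WirePart {
            text: Some(text.clone()),
            thought: true,
            thought_signature: signature.clone(),
            ..WirePart::default()
        },
        ModelPart::FunctionCall { name, signature } => WirePart {
            function_call: Some(name.clone()),
            thought_signature: signature.clone(),
            ..WirePart::default()
        },
        ModelPart::Text(text) => WirePart { text: Some(text.clone()), ..WirePart::default() },
    }
}
```

The property worth checking is that `marshal(&unmarshal(&part))` reproduces the original part for both thought and function-call parts, so a signature survives any number of turns unchanged.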

Shared utility (`src/util/truncate.rs`)

Extracted middle_truncate into src/util/truncate.rs, removing local copies from bash.rs and sandbox.rs.
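
For readers unfamiliar with the technique, middle truncation keeps the head and tail of a long output and drops the middle. The sketch below shows one way to do it; the function name `middle_truncate_sketch`, its signature, the character-based budget, and the marker text are assumptions and may differ from the real `middle_truncate` in `src/util/truncate.rs`.

```rust
/// Keep roughly the first and last halves of `s` (counted in characters)
/// and replace the middle with a marker noting how much was dropped.
fn middle_truncate_sketch(s: &str, max_chars: usize) -> String {
    let total = s.chars().count();
    if total <= max_chars {
        return s.to_string();
    }
    let head_len = max_chars / 2;
    let tail_len = max_chars - head_len;
    let omitted = total - max_chars;
    let head: String = s.chars().take(head_len).collect();
    let tail: String = s.chars().skip(total - tail_len).collect();
    format!("{head}\n... [{omitted} chars truncated] ...\n{tail}")
}
```

Keeping both ends tends to suit command output, where the invocation context sits at the top and the error or summary sits at the bottom.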

Test plan

- `cargo test --lib tool_impl::builtins::file_read`: 8 unit tests covering text, image, offset/limit, UTF-8 error, unsupported extension, missing path, nonexistent path
- `cargo test --lib lang_model_impl::api::gemini::tests::test_tool_result_with_image`: 2-turn Gemini 3 interaction; step 1 gets a real `functionCall` + `thoughtSignature`, step 2 sends a JPEG tool result and verifies the text response (model correctly describes the Jensen Huang photo)
- `cargo test --lib lang_model_impl::api`: all 13 provider unit tests pass; `test_tool_result_with_image` for Anthropic and OpenAI is skipped without API keys (requires compile-time `test_with::env`)
- Manual smoke test with Anthropic/OpenAI/Gemini `test_tool_result_with_image` (requires API keys at compile time)

🤖 Generated with Claude Code


…tion The develop branch's version used a Weak-ref pattern and stripped out rich functionality from the original. Restore full SandboxConfig fields (name, persist, idle_timeout_secs, default_timeout_secs), op-level VM lifetime via ensure_running()/stop_and_wait(), blocking Drop cleanup, convenience methods (shell, write_file, read_file, copy_from/to_host, is_running, start, stop, shutdown), validate_sandbox_name() with SUN_PATH_MAX guard, remove_persisted() free function, and the complete original test suite. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

The name field is now preserved through serialize/deserialize when Some, so the old "always regenerated at runtime and is not serialised" comment was no longer accurate. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

…file-read-tool Resolved conflicts by adopting develop's spec/provider-based AgentBuilder and ToolProvider/ToolProviderElem naming, while preserving the file_read builtin tool (text + image support) added on this branch. Key resolutions:
- AgentBuilder: switched to spec/provider design from develop (#391)
- AgentSpec: removed RunenvSpec enum and sandbox() method
- AgentProvider: uses ToolProvider (singular) instead of Vec<ToolProvider> + sandboxes
- BuiltinToolProviderElem: renamed from BuiltinToolProvider; FileRead variant retained
- sandbox.rs: kept local truncate_output fn; read/write RunEnv methods already in develop
- tool/builder.rs: removed (superseded by spec/provider construction path)
- tool/toolset.rs: deleted (removed in develop #391)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

Merge resolution incorrectly kept a local truncate_output copy in sandbox.rs. This change aligns with the PR #393 intent: all truncation goes through the shared util::truncate::middle_truncate. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

…-read-tool Resolved conflicts:
- util/truncate.rs: add docstring and tests from develop
- anthropic.rs, openai.rs: tool result content uses text/image parts array, unsupported parts are skipped; test imports include both TokenUsage and ToolDescBuilder

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

…utput Tool result content is now an array of text/image blocks instead of a plain string; Part::Value is filtered out (unsupported). Updated assertions and doc comments in both anthropic.rs and openai.rs accordingly. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

…shaling Covers embedded image-only and text+image mixed tool results:
- image-only: functionResponse gets {mimeType, type:"image"} placeholder, bytes appear as sibling inline_data part
- mixed: text goes into functionResponse result, image becomes sibling inline_data

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
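
The two cases in this commit message can be sketched as below. All types here are hypothetical stand-ins rather than the structures used in `gemini.rs`. The sketch also assumes that an image-only result uses the first image's MIME type for the placeholder and that multiple text parts are joined with newlines.

```rust
/// Simplified tool-result content part (base64 payload kept as a string).
enum Part {
    Text(String),
    Image { mime_type: String, base64: String },
}

/// What goes inside `functionResponse` in this sketch.
#[derive(Debug, PartialEq)]
enum ResponseBody {
    Text(String),
    /// Stand-in for the `{mimeType, type:"image"}` placeholder.
    ImagePlaceholder { mime_type: String },
}

/// One entry of the outer `parts` array.
#[derive(Debug, PartialEq)]
enum GeminiPart {
    FunctionResponse { name: String, body: ResponseBody },
    /// Sibling part carrying the image bytes.
    InlineData { mime_type: String, data: String },
}

fn marshal_tool_result(name: &str, contents: &[Part]) -> Vec<GeminiPart> {
    let mut texts: Vec<&str> = Vec::new();
    let mut images: Vec<GeminiPart> = Vec::new();
    for part in contents {
        match part {
            Part::Text(text) => texts.push(text.as_str()),
            Part::Image { mime_type, base64 } => images.push(GeminiPart::InlineData {
                mime_type: mime_type.clone(),
                data: base64.clone(),
            }),
        }
    }
    // Image-only results get a placeholder body; otherwise the text is the body.
    let body = match (texts.is_empty(), images.first()) {
        (true, Some(GeminiPart::InlineData { mime_type, .. })) => {
            ResponseBody::ImagePlaceholder { mime_type: mime_type.clone() }
        }
        _ => ResponseBody::Text(texts.join("\n")),
    };
    // Images follow `functionResponse` as siblings; they are never nested inside it.
    let mut parts = vec![GeminiPart::FunctionResponse { name: name.to_string(), body }];
    parts.extend(images);
    parts
}
```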

…c and OpenAI Value::String → plain text block, other Value types → JSON-encoded text block. Updated tests to assert the text block output instead of filtering. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

jhlee525

I'll merge file_read tool & read tool in #395

khj809 and others added 11 commits April 24, 2026 21:51

add agent builder(for sharing sandbox between parent and subagent), m…

74b0acc

…ake sandbox more short-lived during command execution

adjust publicity

ba999b3

adjust publicity

35e63e0

Merge develop (PR #390 sandbox refactor) into feat/agent-builder

12a3938

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

cargo fmt

9988472

remove idle_timeout_secs, fix ensure_running

740882a

remove SandboxSource and receive RunEnv directly

922d25b

bugfix: sandbox not delivered

ff71bd1

add builtin file_read tool for both text and image

fffbca4

Base automatically changed from feat/agent-builder to develop May 4, 2026 12:36

khj809 and others added 4 commits May 6, 2026 22:10

fix: update test code to use LangModelProviderElem::API

6a752fa

After merge with develop, LangModelProvider::API enum variant was renamed to LangModelProviderElem::API. Update the three API test files accordingly. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

resolve

a750060

khj809 self-assigned this May 6, 2026

khj809 marked this pull request as ready for review May 6, 2026 13:18

khj809 requested review from grf53, jhlee525 and nuri-yoo May 6, 2026 13:18

nuri-yoo reviewed May 7, 2026

Comment thread src/tool_impl/builtins/file_read.rs

Comment thread src/lang_model_impl/api/gemini.rs Outdated

Comment thread src/lang_model_impl/api/openai.rs

Comment thread src/tool_impl/builtins/file_read.rs

apply feedbacks

30418e2

khj809 requested a review from nuri-yoo May 7, 2026 11:52

nuri-yoo approved these changes May 7, 2026

khj809 and others added 5 commits May 8, 2026 13:54

test: add image block assertions to test_tool_result_content_marshaling

66258ef

Covers embedded (base64) and URL image parts in tool result content for both Anthropic and OpenAI marshal implementations. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

tidy

5c980a6

jhlee525 approved these changes May 8, 2026

khj809 merged commit 85de52a into develop May 8, 2026

khj809 deleted the feat/builtin-file-read-tool branch May 8, 2026 09:55

khj809 merged 22 commits into develop from feat/builtin-file-read-tool

3 participants
