feat: add builtin file_read tool for multimodal files #393

Merged: khj809 merged 22 commits into develop from feat/builtin-file-read-tool on May 8, 2026

@khj809 khj809 commented Apr 29, 2026

Summary

Adds a file_read builtin tool that lets the LLM autonomously read text and image files from the runenv (local or sandbox) within the agent loop. This enables multimodal file analysis (charts, screenshots, photos) without requiring the caller to pre-load content into the message history.
The implementation mostly follows Claude Code's FileReadTool, but supports a more limited set of file types.

New: BuiltinToolProvider::FileRead / file_read tool (src/tool_impl/builtins/file_read.rs)

  • Detects file type by extension: PNG/JPEG/GIF/WEBP return Part::Image (embedded base64); all other readable files return Part::Text
  • Unsupported binary types (PDF, ZIP, EXE, WASM, …) return a structured error
  • Non-UTF-8 text files return a structured error
  • Supports offset and limit parameters for windowed text reads (character-based)
  • Reads via RunEnv::read() so it works in both local and sandbox environments
  • Registered as BuiltinToolProvider::FileRead {} in src/tool/provider.rs
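
A minimal sketch of the extension-based dispatch and the character-based window described above. The names `FileKind`, `classify`, and `read_window` are hypothetical, as is the assumption that `offset` is zero-based. The actual code in `file_read.rs` may be structured differently and reads through `RunEnv::read()`, which is not modeled here.

```rust
use std::path::Path;

/// How a file would be returned to the model (hypothetical type).
#[derive(Debug, PartialEq)]
enum FileKind {
    /// Returned as an embedded base64 image with this MIME type.
    Image(&'static str),
    /// Returned as UTF-8 text.
    Text,
    /// Known binary format that the tool refuses to read.
    Unsupported,
}

/// Classify a path by its extension, case-insensitively.
fn classify(path: &str) -> FileKind {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
        .unwrap_or_default();
    match ext.as_str() {
        "png" => FileKind::Image("image/png"),
        "jpg" | "jpeg" => FileKind::Image("image/jpeg"),
        "gif" => FileKind::Image("image/gif"),
        "webp" => FileKind::Image("image/webp"),
        // Only the binary types named in the summary; the real list is longer.
        "pdf" | "zip" | "exe" | "wasm" => FileKind::Unsupported,
        _ => FileKind::Text,
    }
}

/// Decode bytes as UTF-8 and return a window counted in characters, not bytes.
/// `offset` is assumed to be zero-based; `limit = None` reads to the end.
fn read_window(bytes: &[u8], offset: usize, limit: Option<usize>) -> Result<String, String> {
    let text = std::str::from_utf8(bytes)
        .map_err(|e| format!("file is not valid UTF-8: {e}"))?;
    let chars = text.chars().skip(offset);
    Ok(match limit {
        Some(n) => chars.take(n).collect(),
        None => chars.collect(),
    })
}
```

Counting in characters rather than bytes means a window can never split a multi-byte UTF-8 sequence, at the cost of a linear scan to reach the offset.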

Provider marshal fixes for Part::Image in Role::Tool messages

Previously, every provider used only contents[0] of a tool-result message, and only as plain text/JSON. Each provider now handles images in tool results as follows:

| Provider | Fix |
| --- | --- |
| Anthropic (anthropic.rs) | tool_result.content is now the full contents array; Part::Value is wrapped as {"type":"text","text":"..."} content block (required by the Anthropic API) |
| Gemini (gemini.rs) | Images go as sibling inline_data parts in the outer parts array alongside functionResponse. The FunctionResponse proto has no parts field, so nested functionResponse.parts was wrong |
| OpenAI Responses (openai.rs) | function_call_output.output becomes a content array with input_image blocks; image_url is a plain string (not {"url":"..."} object) |
| ChatCompletion (chat_completion.rs) | Images in tool results are substituted with [image: mime_type] / [image at url] text placeholders, since ChatCompletion backends don't reliably accept inline images in tool results |
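
As an illustration of the ChatCompletion fallback, the sketch below replaces image parts with the text placeholders named above. The `Part` enum here is a simplified, hypothetical stand-in for the crate's type, and joining parts with newlines is an assumption rather than the documented behavior.

```rust
/// Simplified stand-in for the crate's `Part` type (hypothetical shape).
#[allow(dead_code)]
enum Part {
    Text(String),
    /// Inline image: MIME type plus base64 payload.
    ImageEmbedded { mime_type: String, data_base64: String },
    /// Image referenced by URL.
    ImageUrl(String),
}

/// Flatten a tool result into plain text, substituting a placeholder for
/// each image so that no image bytes are sent to the backend.
fn tool_result_to_text(parts: &[Part]) -> String {
    parts
        .iter()
        .map(|p| match p {
            Part::Text(t) => t.clone(),
            Part::ImageEmbedded { mime_type, .. } => format!("[image: {mime_type}]"),
            Part::ImageUrl(url) => format!("[image at {url}]"),
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

The placeholder keeps the model aware that an image existed at that position even though it cannot see the pixels.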

Gemini thinking model support (gemini.rs)

Gemini 3 thinking models return a thoughtSignature that must be replayed verbatim in subsequent turns. Added:

  • Capture thoughtSignature from thought text parts and from functionCall parts (sibling field) during unmarshal
  • Include thoughtSignature back in thought text parts and alongside functionCall parts during marshal
  • marshal_messages now always passes include_thinking = true so signatures are never dropped mid-conversation
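
The capture-and-replay round trip can be sketched as follows. `WirePart` and `HistoryPart` are hypothetical, simplified types (the real code works on JSON values and carries function arguments as well); the point is only that whatever signature arrives on a thought or functionCall part is stored and sent back unchanged.

```rust
/// Simplified view of one Gemini response/request part (hypothetical).
#[derive(Debug, Clone, PartialEq, Default)]
struct WirePart {
    text: Option<String>,
    thought: bool,
    /// Function name only; arguments are omitted for brevity.
    function_call: Option<String>,
    thought_signature: Option<String>,
}

/// What the conversation history keeps between turns (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum HistoryPart {
    Text(String),
    Thought { text: String, signature: Option<String> },
    FunctionCall { name: String, signature: Option<String> },
}

/// Capture the signature from thought text parts and functionCall parts.
fn unmarshal(p: &WirePart) -> Option<HistoryPart> {
    if let Some(name) = &p.function_call {
        return Some(HistoryPart::FunctionCall {
            name: name.clone(),
            signature: p.thought_signature.clone(),
        });
    }
    let text = p.text.clone()?;
    Some(if p.thought {
        HistoryPart::Thought { text, signature: p.thought_signature.clone() }
    } else {
        HistoryPart::Text(text)
    })
}

/// Replay the stored signature verbatim on the next request.
fn marshal(p: &HistoryPart) -> WirePart {
    match p {
        HistoryPart::Text(t) => WirePart { text: Some(t.clone()), ..WirePart::default() },
        HistoryPart::Thought { text, signature } => WirePart {
            text: Some(text.clone()),
            thought: true,
            thought_signature: signature.clone(),
            ..WirePart::default()
        },
        HistoryPart::FunctionCall { name, signature } => WirePart {
            function_call: Some(name.clone()),
            thought_signature: signature.clone(),
            ..WirePart::default()
        },
    }
}
```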

Shared utility (src/util/truncate.rs)

Extracted middle_truncate into src/util/truncate.rs, removing local copies from bash.rs and sandbox.rs.
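
For readers unfamiliar with the technique, middle truncation keeps the head and tail of long output and drops the middle. The sketch below is not the code in `src/util/truncate.rs`: the function name, the character-based limit, and the marker text are all assumptions made for illustration.

```rust
/// Keep the first and last parts of `s` and replace the middle with a marker
/// when `s` has more than `max_chars` characters (hypothetical signature).
fn middle_truncate_sketch(s: &str, max_chars: usize) -> String {
    let total = s.chars().count();
    if total <= max_chars {
        return s.to_string();
    }
    // Split the budget between head and tail; the tail gets the odd character.
    let head_len = max_chars / 2;
    let tail_len = max_chars - head_len;
    let head: String = s.chars().take(head_len).collect();
    let tail: String = s.chars().skip(total - tail_len).collect();
    format!(
        "{head}\n... [{} characters truncated] ...\n{tail}",
        total - max_chars
    )
}
```

Keeping both ends is useful for command output, where the start often shows the invocation and the end shows the error or exit status.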

Test plan

  • cargo test --lib tool_impl::builtins::file_read — 8 unit tests covering text, image, offset/limit, UTF-8 error, unsupported extension, missing path, nonexistent path
  • cargo test --lib lang_model_impl::api::gemini::tests::test_tool_result_with_image — 2-turn Gemini 3 interaction: step 1 gets real functionCall + thoughtSignature, step 2 sends JPEG tool result and verifies text response (model correctly describes Jensen Huang photo)
  • cargo test --lib lang_model_impl::api — all 13 provider unit tests pass; test_tool_result_with_image for Anthropic and OpenAI skipped without API keys (require compile-time test_with::env)
  • Manual smoke test with Anthropic/OpenAI/Gemini test_tool_result_with_image (requires API keys at compile time)

🤖 Generated with Claude Code

khj809 and others added 11 commits April 24, 2026 21:51
…ake sandbox more short-lived during command execution
…tion

The develop branch's version used a Weak-ref pattern and stripped out rich
functionality from the original. Restore full SandboxConfig fields (name,
persist, idle_timeout_secs, default_timeout_secs), op-level VM lifetime via
ensure_running()/stop_and_wait(), blocking Drop cleanup, convenience methods
(shell, write_file, read_file, copy_from/to_host, is_running, start, stop,
shutdown), validate_sandbox_name() with SUN_PATH_MAX guard, remove_persisted()
free function, and the complete original test suite.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
The name field is now preserved through serialize/deserialize when Some,
so the old "always regenerated at runtime and is not serialised" comment
was no longer accurate.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Base automatically changed from feat/agent-builder to develop May 4, 2026 12:36
khj809 and others added 4 commits May 6, 2026 22:10
…file-read-tool

Resolved conflicts by adopting develop's spec/provider-based AgentBuilder and
ToolProvider/ToolProviderElem naming, while preserving the file_read builtin tool
(text + image support) added on this branch.

Key resolutions:
- AgentBuilder: switched to spec/provider design from develop (#391)
- AgentSpec: removed RunenvSpec enum and sandbox() method
- AgentProvider: uses ToolProvider (singular) instead of Vec<ToolProvider> + sandboxes
- BuiltinToolProviderElem: renamed from BuiltinToolProvider; FileRead variant retained
- sandbox.rs: kept local truncate_output fn; read/write RunEnv methods already in develop
- tool/builder.rs: removed (superseded by spec/provider construction path)
- tool/toolset.rs: deleted (removed in develop #391)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
After merge with develop, LangModelProvider::API enum variant was renamed
to LangModelProviderElem::API. Update the three API test files accordingly.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Merge resolution incorrectly kept a local truncate_output copy in sandbox.rs.
This change aligns with the PR #393 intent: all truncation goes through
the shared util::truncate::middle_truncate.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@khj809 khj809 self-assigned this May 6, 2026
@khj809 khj809 marked this pull request as ready for review May 6, 2026 13:18
@khj809 khj809 requested review from grf53, jhlee525 and nuri-yoo May 6, 2026 13:18
Comment thread src/tool_impl/builtins/file_read.rs
Comment thread src/lang_model_impl/api/gemini.rs Outdated
Comment thread src/lang_model_impl/api/openai.rs
Comment thread src/tool_impl/builtins/file_read.rs
@khj809 khj809 requested a review from nuri-yoo May 7, 2026 11:52
khj809 and others added 5 commits May 8, 2026 13:54
…-read-tool

Resolved conflicts:
- util/truncate.rs: add docstring and tests from develop
- anthropic.rs, openai.rs: tool result content uses text/image parts array,
  unsupported parts are skipped; test imports include both TokenUsage and ToolDescBuilder

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…utput

Tool result content is now an array of text/image blocks instead of a plain
string; Part::Value is filtered out (unsupported). Updated assertions and
doc comments in both anthropic.rs and openai.rs accordingly.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Covers embedded (base64) and URL image parts in tool result content for
both Anthropic and OpenAI marshal implementations.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…shaling

Covers embedded image-only and text+image mixed tool results:
- image-only: functionResponse gets {mimeType, type:"image"} placeholder,
  bytes appear as sibling inline_data part
- mixed: text goes into functionResponse result, image becomes sibling inline_data

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
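
The two cases covered by these tests can be sketched as below. `Part`, `GeminiPart`, and `marshal_tool_result` are hypothetical simplifications (the real marshal code emits JSON), and using the placeholder only when there is no text is one reading of the commit message, not confirmed behavior.

```rust
/// Simplified tool-result content (hypothetical stand-in for the crate's `Part`).
enum Part {
    Text(String),
    Image { mime_type: String, data_base64: String },
}

/// Simplified entries of the outer `parts` array sent to Gemini (hypothetical).
#[derive(Debug, PartialEq)]
enum GeminiPart {
    FunctionResponse { name: String, result: String },
    InlineData { mime_type: String, data: String },
}

/// Text goes into the functionResponse result; each image becomes a sibling
/// inline_data part. With no text, a small JSON placeholder describes the image.
fn marshal_tool_result(name: &str, contents: &[Part]) -> Vec<GeminiPart> {
    let mut texts: Vec<String> = Vec::new();
    let mut images: Vec<GeminiPart> = Vec::new();
    let mut placeholder: Option<String> = None;
    for part in contents {
        match part {
            Part::Text(t) => texts.push(t.clone()),
            Part::Image { mime_type, data_base64 } => {
                if placeholder.is_none() {
                    placeholder =
                        Some(format!("{{\"mimeType\":\"{mime_type}\",\"type\":\"image\"}}"));
                }
                images.push(GeminiPart::InlineData {
                    mime_type: mime_type.clone(),
                    data: data_base64.clone(),
                });
            }
        }
    }
    let result = if texts.is_empty() {
        placeholder.unwrap_or_default()
    } else {
        texts.join("\n")
    };
    let mut out = vec![GeminiPart::FunctionResponse { name: name.to_string(), result }];
    out.extend(images);
    out
}
```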
…c and OpenAI

Value::String → plain text block, other Value types → JSON-encoded text block.
Updated tests to assert the text block output instead of filtering.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@jhlee525 jhlee525 left a comment

I'll merge file_read tool & read tool in #395

@khj809 khj809 merged commit 85de52a into develop May 8, 2026
@khj809 khj809 deleted the feat/builtin-file-read-tool branch May 8, 2026 09:55