feat: add builtin file_read tool for multimodal files #393
Merged
…ake sandbox more short-lived during command execution
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…tion
The develop branch's version used a Weak-ref pattern and stripped out rich functionality from the original. Restore:
- full SandboxConfig fields (name, persist, idle_timeout_secs, default_timeout_secs)
- op-level VM lifetime via ensure_running()/stop_and_wait()
- blocking Drop cleanup
- convenience methods (shell, write_file, read_file, copy_from/to_host, is_running, start, stop, shutdown)
- validate_sandbox_name() with SUN_PATH_MAX guard
- remove_persisted() free function
- the complete original test suite
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
The name field is now preserved through serialize/deserialize when Some, so the old "always regenerated at runtime and is not serialised" comment was no longer accurate.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…file-read-tool
Resolved conflicts by adopting develop's spec/provider-based AgentBuilder and ToolProvider/ToolProviderElem naming, while preserving the file_read builtin tool (text + image support) added on this branch.
Key resolutions:
- AgentBuilder: switched to spec/provider design from develop (#391)
- AgentSpec: removed RunenvSpec enum and sandbox() method
- AgentProvider: uses ToolProvider (singular) instead of Vec<ToolProvider> + sandboxes
- BuiltinToolProviderElem: renamed from BuiltinToolProvider; FileRead variant retained
- sandbox.rs: kept local truncate_output fn; read/write RunEnv methods already in develop
- tool/builder.rs: removed (superseded by spec/provider construction path)
- tool/toolset.rs: deleted (removed in develop #391)
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
After merge with develop, the LangModelProvider::API enum variant was renamed to LangModelProviderElem::API. Update the three API test files accordingly.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Merge resolution incorrectly kept a local truncate_output copy in sandbox.rs. This change aligns with the PR #393 intent: all truncation goes through the shared util::truncate::middle_truncate.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
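For readers unfamiliar with middle truncation, here is a minimal sketch of the general idea. It is not the project's `util::truncate::middle_truncate`: the function name, signature, and marker text below are all assumptions made for illustration.

```rust
/// Hypothetical sketch of middle truncation (NOT the project's implementation).
/// When `s` has more than `max_chars` characters, keep the first and last
/// portions and replace the middle with a marker. `max_chars` bounds only the
/// kept characters; the marker itself adds to the output length.
fn middle_truncate_sketch(s: &str, max_chars: usize) -> String {
    // Count Unicode scalar values, not bytes, so multi-byte text is never split.
    let total = s.chars().count();
    if total <= max_chars {
        return s.to_string();
    }
    let removed = total - max_chars;
    let head_len = max_chars / 2;
    let tail_len = max_chars - head_len;
    let head: String = s.chars().take(head_len).collect();
    let tail: String = s.chars().skip(total - tail_len).collect();
    format!("{head}\n... [{removed} chars truncated] ...\n{tail}")
}
```

Keeping both ends is useful for command output, where the start usually shows what ran and the end usually shows the error or exit status.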
nuri-yoo reviewed on May 7, 2026
nuri-yoo approved these changes on May 7, 2026
…-read-tool
Resolved conflicts:
- util/truncate.rs: add docstring and tests from develop
- anthropic.rs, openai.rs: tool result content uses text/image parts array, unsupported parts are skipped; test imports include both TokenUsage and ToolDescBuilder
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…utput
Tool result content is now an array of text/image blocks instead of a plain string; Part::Value is filtered out (unsupported). Updated assertions and doc comments in both anthropic.rs and openai.rs accordingly.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Covers embedded (base64) and URL image parts in tool result content for both Anthropic and OpenAI marshal implementations.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…shaling
Covers embedded image-only and text+image mixed tool results:
- image-only: functionResponse gets {mimeType, type:"image"} placeholder,
bytes appear as sibling inline_data part
- mixed: text goes into functionResponse result, image becomes sibling inline_data
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
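The two cases described in this commit can be pictured with the rough sketch below. Every type and function name here is hypothetical (the real `gemini.rs` types are not shown on this page), the `functionResponse` result is rendered as a plain string for brevity, and joining multiple text parts with a newline is an assumption.

```rust
/// Hypothetical tool-result part (stand-in for the project's `Part`).
enum Part {
    Text(String),
    Image { mime_type: String, base64: String },
}

/// Hypothetical outgoing Gemini part.
#[derive(Debug, PartialEq)]
enum GeminiPart {
    /// `functionResponse`, with its result shown as a string for brevity.
    FunctionResponse { name: String, result: String },
    /// Sibling `inline_data` part carrying the base64 image bytes.
    InlineData { mime_type: String, data: String },
}

/// Text goes into the functionResponse result; each image becomes a sibling
/// inline_data part. For image-only results the functionResponse carries a
/// `{mimeType, type:"image"}` placeholder instead of text.
fn marshal_tool_result(name: &str, contents: &[Part]) -> Vec<GeminiPart> {
    let mut texts: Vec<&str> = Vec::new();
    let mut images: Vec<GeminiPart> = Vec::new();
    let mut first_image_mime: Option<&str> = None;
    for part in contents {
        match part {
            Part::Text(t) => texts.push(t.as_str()),
            Part::Image { mime_type, base64 } => {
                if first_image_mime.is_none() {
                    first_image_mime = Some(mime_type.as_str());
                }
                images.push(GeminiPart::InlineData {
                    mime_type: mime_type.clone(),
                    data: base64.clone(),
                });
            }
        }
    }
    let result = if !texts.is_empty() {
        texts.join("\n") // assumption: multiple text parts are newline-joined
    } else if let Some(mime) = first_image_mime {
        format!(r#"{{"mimeType":"{mime}","type":"image"}}"#)
    } else {
        String::new()
    };
    let mut out = vec![GeminiPart::FunctionResponse { name: name.to_string(), result }];
    out.extend(images);
    out
}
```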
…c and OpenAI
Value::String → plain text block, other Value types → JSON-encoded text block. Updated tests to assert the text block output instead of filtering.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
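The mapping described in this commit can be sketched roughly as follows. The `Value` enum and `TextBlock` struct are hypothetical stand-ins, not the project's types, and only a few scalar variants are modelled so that the JSON encoding stays trivial.

```rust
/// Hypothetical stand-in for the JSON-like value carried by `Part::Value`.
enum Value {
    Null,
    Bool(bool),
    Number(i64),
    String(String),
}

/// Hypothetical text content block, mirroring `{"type":"text","text":"..."}`.
#[derive(Debug, PartialEq)]
struct TextBlock {
    text: String,
}

/// Strings pass through as plain text; every other value is JSON-encoded
/// and the encoded form becomes the block's text.
fn value_to_text_block(value: &Value) -> TextBlock {
    let text = match value {
        Value::String(s) => s.clone(),
        Value::Null => "null".to_string(),
        Value::Bool(b) => b.to_string(),
        Value::Number(n) => n.to_string(),
    };
    TextBlock { text }
}
```

Passing strings through unencoded avoids showing the model an extra layer of quotes and escapes, while other values still reach it in an unambiguous form.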
Summary
Adds a `file_read` builtin tool that lets the LLM autonomously read text and image files from the runenv (local or sandbox) within the agent loop. This enables multimodal file analysis (charts, screenshots, photos) without the caller needing to pre-load content into the message history. The implementation mostly follows Claude Code's FileReadTool, with a limited set of file types.

New: `BuiltinToolProvider::FileRead` / `file_read` tool (`src/tool_impl/builtins/file_read.rs`)
- Image files return `Part::Image` (embedded base64); all other readable files return `Part::Text`
- `offset` and `limit` parameters for windowed text reads (character-based)
- Reads through `RunEnv::read()` so it works in both local and sandbox environments
- `BuiltinToolProvider::FileRead {}` added in `src/tool/provider.rs`

Provider marshal fixes for `Part::Image` in `Role::Tool` messages
Previously all providers only used `contents[0]` of a tool-result message, and only as plain text/JSON. Now each provider correctly handles images in tool results:
- `anthropic.rs`: `tool_result.content` is now the full `contents` array; `Part::Value` is wrapped as a `{"type":"text","text":"..."}` content block (required by the Anthropic API)
- `gemini.rs`: images are sent as `inline_data` parts in the outer `parts` array alongside `functionResponse`; the `FunctionResponse` proto has no `parts` field, so nested `functionResponse.parts` was wrong
- `openai.rs`: `function_call_output.output` becomes a content array with `input_image` blocks; `image_url` is a plain string (not a `{"url":"..."}` object)
- `chat_completion.rs`: images are replaced with `[image: mime_type]` / `[image at url]` text placeholders, since ChatCompletion backends don't reliably accept inline images in tool results

Gemini thinking model support (`gemini.rs`)
Gemini 3 thinking models return a `thoughtSignature` that must be replayed verbatim in subsequent turns. Added:
- `thoughtSignature` is read from `thought` text parts and from `functionCall` parts (sibling field) during unmarshal
- `thoughtSignature` is written back in `thought` text parts and alongside `functionCall` parts during marshal
- `marshal_messages` now always passes `include_thinking = true` so signatures are never dropped mid-conversation

Shared utility (`src/util/truncate.rs`)
Extracted `middle_truncate` into `src/util/truncate.rs`, removing local copies from `bash.rs` and `sandbox.rs`.

Test plan
- `cargo test --lib tool_impl::builtins::file_read`: 8 unit tests covering text, image, offset/limit, UTF-8 error, unsupported extension, missing path, nonexistent path
- `cargo test --lib lang_model_impl::api::gemini::tests::test_tool_result_with_image`: 2-turn Gemini 3 interaction; step 1 gets a real `functionCall` + `thoughtSignature`, step 2 sends a JPEG tool result and verifies the text response (model correctly describes Jensen Huang photo)
- `cargo test --lib lang_model_impl::api`: all 13 provider unit tests pass; `test_tool_result_with_image` for Anthropic and OpenAI is skipped without API keys (requires compile-time `test_with::env`)
- `test_tool_result_with_image` (requires API keys at compile time)

🤖 Generated with Claude Code
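
The description notes that `offset` and `limit` are character-based. A minimal sketch of what such a window could look like is shown here. The function name, the zero-based `offset`, and the optional `limit` are assumptions for illustration, not the tool's actual signature.

```rust
/// Hypothetical sketch of a character-based read window (not the tool's code).
/// `offset` is the number of characters to skip (assumed zero-based);
/// `limit`, when present, caps how many characters are returned.
fn window_chars(text: &str, offset: usize, limit: Option<usize>) -> String {
    // Iterating over `chars()` counts Unicode scalar values, so the window
    // can never cut a multi-byte UTF-8 sequence in half.
    let rest = text.chars().skip(offset);
    match limit {
        Some(n) => rest.take(n).collect(),
        None => rest.collect(),
    }
}
```

Counting characters instead of bytes means a window boundary always falls on a valid UTF-8 boundary, at the cost of a linear scan to reach `offset`.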