[MCP] Render MCP tool call result images to the model #5600
💡 Codex Review
codex/codex-rs/core/src/chat_completions.rs
Lines 238 to 243 in 2c53e50
    ResponseItem::FunctionCallOutput { call_id, output } => {
        messages.push(json!({
            "role": "tool",
            "tool_call_id": call_id,
            "content": output.content,
        }));
The new FunctionCallOutputPayload now carries content_items so MCP tool results with images can be serialized as the array format expected by the Responses API. However, the chat‑completions code path still injects output.content directly into the message body ({"role":"tool","content": output.content}), which is always a plain string. When a tool returns an image, output.content_items is populated but ignored here, so the request sent through the chat completions API remains a JSON string containing base64 data rather than the structured array that allows the model to render the image. Any run that uses chat completions instead of the responses API will continue to be blind to MCP images despite the new conversion logic.
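To make the gap concrete, here is a minimal sketch of the selection the chat completions path could make. It assumes simplified stand-in types: `ContentItem`, `Payload`, `ToolMessageContent`, and `tool_message_content` are hypothetical names for illustration, not the types or functions in this PR, and the real code would build JSON rather than an enum.

```rust
/// Hypothetical stand-in for one item of a tool result.
#[derive(Debug, Clone, PartialEq)]
enum ContentItem {
    Text(String),
    Image { data_url: String },
}

/// Hypothetical, simplified view of a function call output payload:
/// a plain-string form plus optional structured items.
#[derive(Debug, Clone, PartialEq)]
struct Payload {
    content: String,
    content_items: Option<Vec<ContentItem>>,
}

/// What would end up in the "content" field of the tool message.
#[derive(Debug, PartialEq)]
enum ToolMessageContent {
    Plain(String),
    Parts(Vec<ContentItem>),
}

/// Prefer the structured items when present and non-empty;
/// otherwise fall back to the plain string.
fn tool_message_content(output: &Payload) -> ToolMessageContent {
    match &output.content_items {
        Some(items) if !items.is_empty() => ToolMessageContent::Parts(items.clone()),
        _ => ToolMessageContent::Plain(output.content.clone()),
    }
}
```

The fallback keeps text-only tool results unchanged, so only image-bearing results would take the structured form.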
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex fix this
Summary
Testing
@codex review this again
Codex Review: Didn't find any major issues. What shall we delve into next?
        structured_content,
    } = call_tool_result;

    let is_success = is_error != &Some(true);
You don't need the & since Option<bool> implements PartialEq
can't compare `&std::option::Option<bool>` with `std::option::Option<bool>`
the trait `std::cmp::PartialEq<std::option::Option<bool>>` is not implemented for `&std::option::Option<bool>`
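For anyone reading along, a minimal sketch of why the `&` is needed. Destructuring a borrowed struct binds its fields as references, so `is_error` is a `&Option<bool>`. `SimplifiedResult` and `is_success` are hypothetical names for illustration, not the PR's types.

```rust
/// Hypothetical, simplified stand-in for a tool call result.
struct SimplifiedResult {
    is_error: Option<bool>,
}

fn is_success(result: &SimplifiedResult) -> bool {
    // Because `result` is a reference, this pattern binds
    // `is_error` as `&Option<bool>`, not `Option<bool>`.
    let SimplifiedResult { is_error } = result;

    // Both sides must have the same level of reference.
    // Either compare reference to reference:
    let by_ref = is_error != &Some(true);
    // or dereference the left-hand side and compare values:
    let by_value = *is_error != Some(true);

    debug_assert_eq!(by_ref, by_value);
    by_value
}
```

Writing `is_error != Some(true)` here would not compile, which matches the error quoted above.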
@codex fix all of the comments
This PR can use a test or two |
Summary
Testing
beb8f14 to 604b797
I left just one comment on best practice, so it's not mandatory. Otherwise, LGTM.
codex-rs/protocol/src/models.rs
Outdated
    }

    impl FunctionCallOutputPayload {
        pub fn from_call_tool_result(call_tool_result: &CallToolResult) -> Self {
This could be an impl From
Good call!
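A rough sketch of what the `impl From` suggestion could look like, assuming heavily simplified versions of both types. The fields `text`, `content`, and `success` are assumptions for illustration, and the real structs in the PR carry more data.

```rust
/// Simplified, hypothetical version of the MCP tool result.
#[derive(Debug, Clone, PartialEq)]
struct CallToolResult {
    text: String,
    is_error: Option<bool>,
}

/// Simplified, hypothetical version of the output payload.
#[derive(Debug, Clone, PartialEq)]
struct FunctionCallOutputPayload {
    content: String,
    success: Option<bool>,
}

// Implementing `From<&CallToolResult>` replaces a named constructor
// and gives callers `.into()` for free via the blanket `Into` impl.
impl From<&CallToolResult> for FunctionCallOutputPayload {
    fn from(result: &CallToolResult) -> Self {
        let CallToolResult { text, is_error } = result;
        Self {
            content: text.clone(),
            // Anything other than an explicit `Some(true)` counts as success.
            success: Some(*is_error != Some(true)),
        }
    }
}
```

Call sites can then write `FunctionCallOutputPayload::from(&result)` or `(&result).into()` instead of calling a dedicated constructor.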
It's pretty amazing we have gotten here without the ability for the model to see image content from MCP tool calls.
This PR builds on #4391 and fixes #4819. I would like @KKcorps to get adequate credit here, but I also want to get this fix in ASAP. I gave him a week to update it and haven't gotten a response, so I'm going to take it across the finish line.
This test highlights how absurd the current situation is. I asked the model to read this image using the Chrome MCP:

After this change, it correctly outputs:
Before this change, it said:
Without this change, the Figma, Playwright, Chrome, and other visual MCP servers are pretty much entirely useless.
I tested this change with the OpenAI Responses API as well as a third-party chat completions API.