Conversation

@gpeal (Collaborator) commented Oct 24, 2025

It's pretty amazing we have gotten here without the ability for the model to see image content from MCP tool calls.

This PR builds off of #4391 and fixes #4819. I would like @KKcorps to get adequate credit here, but I also want to get this fix in ASAP. I gave him a week to update it and haven't gotten a response, so I'm going to take it across the finish line.

This test highlights how absurd the current situation is. I asked the model to read this image using the Chrome MCP:
[image]

After this change, it correctly outputs:

Captured the page: image shows a dark terminal-style UI labeled OpenAI Codex (v0.0.0) with prompt model: gpt-5-codex medium and working directory /codex/codex-rs
(and more)

Before this change, it said:

Took the full-page screenshot you asked for. It shows a long, horizontally repeating pattern of stylized people in orange, light-blue, and mustard clothing, holding hands in alternating poses against a white background. No text or other graphics, just rows of flat illustration stretching off to the right.

Without this change, the Figma, Playwright, Chrome, and other visual MCP servers are pretty much entirely useless.

I tested this change with the OpenAI Responses API as well as a third-party completions API.

@gpeal gpeal requested review from bolinfest and jif-oai October 24, 2025 06:00
@chatgpt-codex-connector (Contributor) bot left a comment

💡 Codex Review

ResponseItem::FunctionCallOutput { call_id, output } => {
    messages.push(json!({
        "role": "tool",
        "tool_call_id": call_id,
        "content": output.content,
    }));

P1: Send image outputs as arrays in chat completions payloads

The new FunctionCallOutputPayload now carries content_items so MCP tool results with images can be serialized as the array format expected by the Responses API. However, the chat-completions code path still injects output.content directly into the message body ({"role":"tool","content": output.content}), which is always a plain string. When a tool returns an image, output.content_items is populated but ignored here, so the request sent through the chat completions API remains a JSON string containing base64 data rather than the structured array that allows the model to render the image. Any run that uses chat completions instead of the responses API will continue to be blind to MCP images despite the new conversion logic.
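The direction of the fix can be sketched with simplified stand-in types (the real codex-core code uses serde_json values and the MCP content item types, so everything below is a hypothetical, std-only model): prefer the structured content_items array when it is present, and fall back to the legacy flat string otherwise.

```rust
// Hypothetical, pared-down shapes standing in for the real codex-core types.
#[derive(Debug, Clone, PartialEq)]
enum ContentItem {
    Text(String),
    ImageUrl(String), // e.g. a data: URL carrying base64 image bytes
}

struct FunctionCallOutputPayload {
    content: String,                          // legacy flat string form
    content_items: Option<Vec<ContentItem>>,  // structured form, set for MCP image results
}

// Sketch of the branch the review asks for: emit the structured array when
// it exists and is non-empty, otherwise fall back to the string content.
fn tool_message_content(output: &FunctionCallOutputPayload) -> Vec<ContentItem> {
    match &output.content_items {
        Some(items) if !items.is_empty() => items.clone(),
        _ => vec![ContentItem::Text(output.content.clone())],
    }
}

fn main() {
    let with_image = FunctionCallOutputPayload {
        content: "{\"image\":\"<base64...>\"}".to_string(),
        content_items: Some(vec![ContentItem::ImageUrl(
            "data:image/png;base64,...".to_string(),
        )]),
    };
    let text_only = FunctionCallOutputPayload {
        content: "ok".to_string(),
        content_items: None,
    };
    assert!(matches!(tool_message_content(&with_image)[0], ContentItem::ImageUrl(_)));
    assert!(matches!(tool_message_content(&text_only)[0], ContentItem::Text(_)));
    println!("structured when items are present, string fallback otherwise");
}
```

The key point is that the chat-completions path needs the same branching the Responses path already has, so image-bearing tool results are never flattened back into a base64 string.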

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@gpeal (Collaborator, Author) commented Oct 24, 2025

> [quoted the 💡 Codex Review P1 comment above]

@codex fix this

@chatgpt-codex-connector (Contributor) commented

Summary

  • Added the serde_json::Value import so the chat completions client can construct structured content payloads while retaining existing tracing context. (codex-rs/core/src/chat_completions.rs, L1-L34)
  • Updated the ResponseItem::FunctionCallOutput handling to emit content_items arrays when present, logging serialization failures and falling back to the legacy string form so MCP image outputs reach the model even via chat completions. (codex-rs/core/src/chat_completions.rs, L239-L257)

Testing

  • just fmt
  • just fix -p codex-core
  • cargo test -p codex-core (numerous integration suites such as suite::apply_patch_cli, suite::client, suite::json_result, suite::otel, etc. require the full CLI/runtime environment and failed or timed out in this sandbox)


@gpeal (Collaborator, Author) commented Oct 24, 2025

@codex review this again

@chatgpt-codex-connector (Contributor) commented

Codex Review: Didn't find any major issues. What shall we delve into next?


    structured_content,
} = call_tool_result;

let is_success = is_error != &Some(true);
A Collaborator commented:
You don't need the & since Option<bool> implements PartialEq

The author replied:

can't compare `&std::option::Option<bool>` with `std::option::Option<bool>`
the trait `std::cmp::PartialEq<std::option::Option<bool>>` is not implemented for `&std::option::Option<bool>`
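The compiler error settles it: `is_error` is borrowed out of the destructured struct as a `&Option<bool>`, so both sides of the comparison must have matching reference levels. A minimal std-only repro of the point:

```rust
// `is_error` comes out of a destructuring borrow as &Option<bool>,
// so the comparison needs a reference on both sides (or a deref).
fn is_success(is_error: &Option<bool>) -> bool {
    // Compiles: &Option<bool> compared with &Option<bool>.
    is_error != &Some(true)

    // Equivalent alternative: dereference first, compare owned values:
    //     *is_error != Some(true)
    // Does NOT compile: `is_error != Some(true)` —
    // PartialEq<Option<bool>> is not implemented for &Option<bool>.
}

fn main() {
    assert!(is_success(&None));
    assert!(is_success(&Some(false)));
    assert!(!is_success(&Some(true)));
    println!("reference levels must match on both sides of !=");
}
```

So the `&` in `is_error != &Some(true)` is required; the only alternative is dereferencing the left side instead.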

@gpeal (Collaborator, Author) commented Oct 24, 2025

@codex fix all of the comments

@pakrym-oai (Collaborator) commented

This PR can use a test or two

@chatgpt-codex-connector (Contributor) commented

Summary

Testing

  • just fmt
  • just fix -p codex-protocol
  • just fix -p codex-core
  • cargo test -p codex-protocol
  • ⚠️ cargo test -p codex-core (cancelled after a multi-minute build; the sandbox cannot finish the full integration suite)


@jif-oai (Collaborator) left a comment

Left just a comment for best practice, so it's not mandatory. Otherwise, LGTM.

}

impl FunctionCallOutputPayload {
    pub fn from_call_tool_result(call_tool_result: &CallToolResult) -> Self {
A Collaborator commented:

This could be an impl From

The author replied:

Good call!
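For illustration, the suggested shape can be sketched with hypothetical, pared-down stand-ins for the real types (the actual CallToolResult comes from the MCP protocol crate, and the real payload carries more fields): implementing `From<&CallToolResult>` instead of an inherent `from_call_tool_result` constructor.

```rust
// Hypothetical simplified types; the real ones live in the codex crates.
struct CallToolResult {
    is_error: Option<bool>,
    text: String, // stand-in for the real content fields
}

#[derive(Debug, PartialEq)]
struct FunctionCallOutputPayload {
    content: String,
    success: Option<bool>,
}

// The suggested best-practice shape: a From impl rather than an
// inherent constructor method.
impl From<&CallToolResult> for FunctionCallOutputPayload {
    fn from(call_tool_result: &CallToolResult) -> Self {
        let is_success = call_tool_result.is_error != Some(true);
        FunctionCallOutputPayload {
            content: call_tool_result.text.clone(),
            success: Some(is_success),
        }
    }
}

fn main() {
    let result = CallToolResult { is_error: None, text: "ok".to_string() };
    // Implementing From gives callers Into for free:
    let payload: FunctionCallOutputPayload = (&result).into();
    assert_eq!(payload.success, Some(true));
    println!("From<&CallToolResult> lets call sites use .into()");
}
```

Implementing `From` also gives every call site the reciprocal `Into` impl for free, which reads more idiomatically than a bespoke constructor name.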

@gpeal gpeal merged commit b0bdc04 into main Oct 27, 2025
20 checks passed
@gpeal gpeal deleted the gpeal/mcp-image branch October 27, 2025 21:55
@github-actions github-actions bot locked and limited conversation to collaborators Oct 27, 2025

Labels: None yet

Projects: None yet

Development: Successfully merging this pull request may close these issues: Image Responses from MCP tool calls

4 participants