Summary
In v1.3.8, the OpenAI backend's chat_with_tools and chat_stream_with_tools methods were rewritten to always use the Responses API (/responses) instead of delegating to OpenAICompatibleProvider, which uses the configurable CHAT_ENDPOINT (defaulting to chat/completions). This breaks any OpenAI-compatible server that only implements /v1/chat/completions (llama.cpp, LM Studio, etc.).
Regression
v1.3.7 — src/backends/openai.rs:520-521 correctly delegated to the compatible provider:
async fn chat_stream_with_tools(...) {
    // Delegate to the inner OpenAICompatibleProvider which has the full implementation
    self.provider.chat_stream_with_tools(messages, tools).await
}
v1.3.8 — src/backends/openai.rs:392-412 now builds a Responses API payload:
async fn chat_stream_with_tools(...) {
    let params = ResponsesRequestParams {
        config: &self.provider.config,
        messages,
        tools,
        stream: true,
    };
    let body = build_responses_request(params)?;
    let response = self
        .send_responses_request(&body, "OpenAI responses stream")
        .await?;
    let response = self
        .ensure_success_response(response, "OpenAI responses API")
        .await?;
    Ok(create_responses_stream_chunks(response))
}
Root Cause
The responses_url() method at src/backends/openai.rs:578-584 hardcodes the endpoint:
fn responses_url(&self) -> Result<reqwest::Url, LLMError> {
    self.provider
        .config
        .base_url
        .join("responses") // <-- HARDCODED, bypasses CHAT_ENDPOINT
        .map_err(|e| LLMError::HttpError(e.to_string()))
}
This completely bypasses the OpenAIProviderConfig trait's CHAT_ENDPOINT constant (which defaults to "chat/completions" at src/providers/openai_compatible.rs:95).
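The mismatch can be illustrated with a std-only sketch (the real code uses reqwest::Url and the crate's config types; the join helper here is a simplification of URL joining, and CHAT_ENDPOINT mirrors the trait's default):

```rust
// Default endpoint from OpenAIProviderConfig (src/providers/openai_compatible.rs:95)
const CHAT_ENDPOINT: &str = "chat/completions";

// Simplified stand-in for reqwest::Url::join
fn join(base: &str, endpoint: &str) -> String {
    format!("{}/{}", base.trim_end_matches('/'), endpoint)
}

// v1.3.8 behavior: the endpoint is hardcoded to "responses"
fn responses_url(base: &str) -> String {
    join(base, "responses")
}

// What OpenAI-compatible servers need: the configurable endpoint
fn chat_url(base: &str) -> String {
    join(base, CHAT_ENDPOINT)
}

fn main() {
    let base = "http://localhost:8080/v1";
    // A local llama.cpp server serves the second URL, not the first
    println!("{}", responses_url(base));
    println!("{}", chat_url(base));
}
```

With a local llama.cpp base URL, the first function targets /v1/responses while only the second URL actually exists on the server.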
Impact
When a local llama.cpp server (or any OpenAI-compatible server) is used with tool calling:
- The OpenAI backend sends a Responses API payload to /v1/responses
- llama.cpp only implements /v1/chat/completions, not /v1/responses
- Even if the server responds, the Responses stream parser (src/backends/openai/responses/stream/events.rs:103) expects OpenAI's Responses API SSE format (with item.id, item.call_id, etc.), not the Chat Completions format
- Result: ResponseFormatError: Missing id in responses event
Observed error:
Response format error: Missing id in responses event. Raw response: {"arguments":"","call_id":"fc_e4qppGeHFazV6EpUOz2tmXT92N7Yqmwu","name":"Read","status":"in_progress","type":"function_call"}
The raw payload is actually valid Responses API-style data from llama.cpp, but the request was sent to the /responses endpoint, which the server only partially supports, so the events lack the fields the stream parser requires.
Suggested Fix
Either:
- Add a config option to choose between the Responses API and Chat Completions (e.g., use_responses_api: bool on the OpenAI struct, defaulting to true for openai.com but false for custom base URLs)
- Auto-detect: if a custom base_url is provided (not api.openai.com), fall back to OpenAICompatibleProvider's chat/completions implementation
- Re-delegate for custom URLs: when base_url differs from the default, delegate to OpenAICompatibleProvider::chat_stream_with_tools, which uses CHAT_ENDPOINT
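A minimal sketch of the first two options combined, deriving the flag's default from the host (std-only; is_official_openai and default_use_responses_api are hypothetical names, and the host extraction is a simplified string check rather than real URL parsing):

```rust
// Hypothetical helper: treat only api.openai.com as the official endpoint
fn is_official_openai(base_url: &str) -> bool {
    base_url
        .split("//")
        .nth(1)                                      // strip the scheme
        .and_then(|rest| rest.split('/').next())     // keep only the host part
        .map(|host| host == "api.openai.com")
        .unwrap_or(false)
}

// Hypothetical default for a `use_responses_api: bool` field on the OpenAI struct:
// true for openai.com, false for custom base URLs (llama.cpp, LM Studio, ...)
fn default_use_responses_api(base_url: &str) -> bool {
    is_official_openai(base_url)
}

fn main() {
    println!("{}", default_use_responses_api("https://api.openai.com/v1/"));
    println!("{}", default_use_responses_api("http://localhost:8080/v1/"));
}
```

chat_with_tools and chat_stream_with_tools would then branch on this flag: Responses API when true, delegation to OpenAICompatibleProvider (and thus CHAT_ENDPOINT) when false.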
Workaround
Pin to llm = "=1.3.7" in Cargo.toml.
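In Cargo.toml, the "=" operator pins the exact version so cargo will not resolve to 1.3.8:

```toml
[dependencies]
llm = "=1.3.7"
```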
Note: Evidence gathered by an LLM agent (opencode/GLM5.1) during investigation.