[ML] Refactoring streaming error handling #131316


Merged

Conversation

jonathan-buttner
Contributor

@jonathan-buttner jonathan-buttner commented Jul 15, 2025

This PR is an iteration of this one: #128923

Currently, when we add new integrations that support chat completion streaming, we have to duplicate some of the error handling code. This PR aims to reduce that duplication by moving the logic into a new class, ChatCompletionErrorResponseHandler. The solution uses composition. Ideally we'd push the common logic into BaseResponseHandler, but I feel that would be confusing since that class was originally only handling non-streaming logic. This solution keeps the chat completion error handling logic separate from the other implementations (such as completion and other task types).
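The composition approach described above might be sketched roughly like this. All class and method names besides ChatCompletionErrorResponseHandler are simplified stand-ins for illustration, not the actual Elasticsearch code:

```java
// Simplified stand-in for the real error-response type; illustrative only.
class ErrorResponse {
    private final String message;
    ErrorResponse(String message) { this.message = message; }
    String getErrorMessage() { return message; }
}

// Shared streaming error handling, extracted so each new integration
// can reuse it via composition instead of duplicating the logic.
class ChatCompletionErrorResponseHandler {
    private final java.util.function.Function<String, ErrorResponse> errorParser;

    ChatCompletionErrorResponseHandler(java.util.function.Function<String, ErrorResponse> errorParser) {
        this.errorParser = java.util.Objects.requireNonNull(errorParser);
    }

    RuntimeException buildError(String body) {
        var parsed = errorParser.apply(body);
        return new RuntimeException("chat completion error: " + parsed.getErrorMessage());
    }
}

// A provider-specific handler delegates to the shared handler
// rather than inheriting the logic from a base class.
class VertexAiChatCompletionHandler {
    private final ChatCompletionErrorResponseHandler errorHandler =
        new ChatCompletionErrorResponseHandler(ErrorResponse::new);

    RuntimeException handleErrorBody(String body) {
        return errorHandler.buildError(body);
    }
}
```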

I'm open to other ideas for simplifying things.

This PR only refactors the google vertex ai chat completion implementation to give an example of what we'd need to do for the other providers.

I did some manual testing with these requests to generate some errors:

PUT _inference/chat_completion/test-chat
{
    "service": "googlevertexai",
    "service_settings": {
        "service_account_json": "<service account>",
        "model_id": "gemini-2.0-flash-lite-001",
        "location": "us-central1",
        "project_id": "elastic-ml"
    }
}


POST _inference/chat_completion/test-chat/_stream
{
    "model": "mistralai/Mistral-7B-Instruct-v0.3",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant that can call functions to get live data."
        },
        {
            "role": "user",
            "content": "What is the weather like in Paris today?"
        }
    ]
}

That should generate an error like:

event: error
data: {"error":{"code":"400","message":"Received an unsuccessful status code for request from inference entity id [test-chat] status [400]. Error message: [Invalid Endpoint name: projects/elastic-ml/locations/global/publishers/google/models/mistralai%2FMistral-7B-Instruct-v0.3.]","type":"INVALID_ARGUMENT"}}

@jonathan-buttner jonathan-buttner added >refactoring :ml Machine learning Team:ML Meta label for the ML team v9.2.0 labels Jul 15, 2025
toRestStatus(responseStatusCode)
);
}

-    protected String errorMessage(String message, Request request, HttpResult result, ErrorResponse errorResponse, int statusCode) {
+    public static String constructErrorMessage(String message, Request request, ErrorResponse errorResponse, int statusCode) {

Moving this to public static so that the ChatCompletionErrorResponseHandler can access it.

@@ -95,7 +95,7 @@ public void validateResponse(

protected abstract void checkForFailureStatusCode(Request request, HttpResult result);

-    private void checkForErrorObject(Request request, HttpResult result) {
+    protected void checkForErrorObject(Request request, HttpResult result) {

We need to be able to override this if we want to use a different error class. This PR adds UnifiedChatCompletionErrorResponse, which pulls the common fields for constructing a UnifiedChatCompletionException into a single place.
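The override described above might look roughly like this. These are simplified stand-ins with made-up method signatures, not the actual Elasticsearch classes:

```java
// Illustrative sketch only; simplified stand-ins for the real types.
class ErrorResponse {
    private final String message;
    ErrorResponse(String message) { this.message = message; }
    String getErrorMessage() { return message; }
}

class UnifiedChatCompletionErrorResponse extends ErrorResponse {
    UnifiedChatCompletionErrorResponse(String message) { super(message); }
}

class BaseResponseHandlerSketch {
    // Widened from private to protected (as in the PR), so subclasses
    // can substitute a different error-response class.
    protected ErrorResponse parseErrorObject(String body) {
        return new ErrorResponse(body);
    }
}

class ChatCompletionHandlerSketch extends BaseResponseHandlerSketch {
    @Override
    protected ErrorResponse parseErrorObject(String body) {
        // The chat completion path produces the unified error type instead.
        return new UnifiedChatCompletionErrorResponse(body);
    }
}
```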

this.unifiedChatCompletionErrorParser = Objects.requireNonNull(errorParser);
}

public void checkForErrorObject(Request request, HttpResult result) {

I copied this from BaseResponseHandler and added different parsing logic, though I'm open to other ideas on how to solve this.

var errorMessage = BaseResponseHandler.constructErrorMessage(message, request, errorResponse, statusCode);
var restStatus = toRestStatus(statusCode);

if (errorResponse.errorStructureFound()) {

By using UnifiedChatCompletionErrorResponse we no longer need the instanceof checks.
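To illustrate how a typed error response removes the downcasts: when the parser's declared return type is the subclass, its fields are available directly. This is a hedged sketch with made-up field names (`type`, `code`), not the actual class:

```java
// Illustrative sketch, not the actual Elasticsearch classes.
class ErrorResponse {
    private final boolean errorStructureFound;
    protected ErrorResponse(boolean errorStructureFound) { this.errorStructureFound = errorStructureFound; }
    boolean errorStructureFound() { return errorStructureFound; }
}

// Carries the fields needed for a UnifiedChatCompletionException directly,
// so callers no longer need `instanceof` checks to recover them.
class UnifiedChatCompletionErrorResponse extends ErrorResponse {
    private final String type;
    private final String code;
    UnifiedChatCompletionErrorResponse(String type, String code) {
        super(true);
        this.type = type;
        this.code = code;
    }
    String type() { return type; }
    String code() { return code; }
}

class Demo {
    // Before: the handler received a plain ErrorResponse and had to
    // check `instanceof` before casting. After: the parameter type is
    // the subclass, so the fields can be read directly.
    static String describe(UnifiedChatCompletionErrorResponse error) {
        return error.type() + "/" + error.code();
    }
}
```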

@@ -0,0 +1,171 @@
/*

Once we transition all the chat completion response handlers to use this class, we can remove the similar methods from BaseResponseHandler.

* @param errorResponse the parsed error response from the HTTP result
* @return an instance of {@link UnifiedChatCompletionException} with details from the error response
*/
private UnifiedChatCompletionException buildChatCompletionErrorInternal(

The functionality here should be very similar to buildError method of GoogleVertexAiUnifiedChatCompletionResponseHandler.java except that it doesn't need to do the instanceof checks.

@@ -22,7 +22,7 @@ public ErrorResponse(String errorMessage) {
this.errorStructureFound = true;
}

-    private ErrorResponse(boolean errorStructureFound) {
+    protected ErrorResponse(boolean errorStructureFound) {

Changing to protected because UnifiedChatCompletionErrorResponse needs to call it.

import java.util.Objects;

public class UnifiedChatCompletionErrorResponse extends ErrorResponse {
public static final UnifiedChatCompletionErrorResponse UNDEFINED_ERROR = new UnifiedChatCompletionErrorResponse();

If you can think of a better way to do this, let me know. The issue is that in situations where we fail to parse the error response, we return this generic error response. Typically we return ErrorResponse.UNDEFINED_ERROR, but here it needs to be a UnifiedChatCompletionErrorResponse for the typing to be correct.
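The sentinel pattern described above might be sketched roughly like this; the classes are simplified stand-ins, not the actual Elasticsearch code:

```java
// Illustrative sketch of the subclass-typed sentinel; simplified.
class ErrorResponse {
    static final ErrorResponse UNDEFINED_ERROR = new ErrorResponse(false);
    private final boolean errorStructureFound;
    protected ErrorResponse(boolean errorStructureFound) { this.errorStructureFound = errorStructureFound; }
    boolean errorStructureFound() { return errorStructureFound; }
}

class UnifiedChatCompletionErrorResponse extends ErrorResponse {
    // A sentinel typed as the subclass: parsers declared to return
    // UnifiedChatCompletionErrorResponse can fall back to this value,
    // where ErrorResponse.UNDEFINED_ERROR would not type-check.
    static final UnifiedChatCompletionErrorResponse UNDEFINED_ERROR =
        new UnifiedChatCompletionErrorResponse();
    private UnifiedChatCompletionErrorResponse() { super(false); }
}

class ParserSketch {
    // The declared return type is the subclass, so the fallback
    // must also be the subclass.
    static UnifiedChatCompletionErrorResponse parse(java.util.Optional<UnifiedChatCompletionErrorResponse> parsed) {
        return parsed.orElse(UnifiedChatCompletionErrorResponse.UNDEFINED_ERROR);
    }
}
```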

try (
XContentParser parser = XContentFactory.xContent(XContentType.JSON)
.createParser(XContentParserConfiguration.EMPTY, response.body())
) {
-            return ERROR_PARSER.apply(parser, null).orElse(ErrorResponse.UNDEFINED_ERROR);
+            return ERROR_PARSER.apply(parser, null).orElse(UnifiedChatCompletionErrorResponse.UNDEFINED_ERROR);

This is where the UnifiedChatCompletionErrorResponse.UNDEFINED_ERROR is used. We can't use ErrorResponse.UNDEFINED_ERROR here because it doesn't satisfy the UnifiedChatCompletionErrorResponse return type (which is needed for the UnifiedChatCompletionErrorParser interface).

@jonathan-buttner jonathan-buttner marked this pull request as ready for review July 16, 2025 14:29
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@jonathan-buttner jonathan-buttner merged commit 3b1523a into elastic:main Jul 17, 2025
33 checks passed
@jonathan-buttner jonathan-buttner deleted the ml-refactor-streaming-v2 branch July 17, 2025 14:27