[ML] Refactoring streaming error handling #131316


Merged

Conversation

jonathan-buttner
Contributor

@jonathan-buttner jonathan-buttner commented Jul 15, 2025

This PR is an iteration of this one: #128923

Currently, when we add new integrations that support chat completion streaming, we have to duplicate some of the error handling code. This PR aims to reduce that duplication by moving the logic into a new class, ChatCompletionErrorResponseHandler. The solution uses composition. Ideally we'd push the common logic into BaseResponseHandler, but I feel that would be confusing since that class was originally only handling non-streaming logic. This solution keeps the chat completion error handling logic separate from the other implementations (such as completion and other task types).
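The composition approach described above might be sketched roughly like this. All class and method names besides ChatCompletionErrorResponseHandler are simplified stand-ins for illustration, not the actual Elasticsearch code:

```java
// Simplified stand-in for the real error-response type; illustrative only.
class ErrorResponse {
    private final String message;
    ErrorResponse(String message) { this.message = message; }
    String getErrorMessage() { return message; }
}

// Shared streaming error handling, extracted so each new integration
// can reuse it via composition instead of duplicating the logic.
class ChatCompletionErrorResponseHandler {
    private final java.util.function.Function<String, ErrorResponse> errorParser;

    ChatCompletionErrorResponseHandler(java.util.function.Function<String, ErrorResponse> errorParser) {
        this.errorParser = java.util.Objects.requireNonNull(errorParser);
    }

    RuntimeException buildError(String body) {
        var parsed = errorParser.apply(body);
        return new RuntimeException("chat completion error: " + parsed.getErrorMessage());
    }
}

// A provider-specific handler delegates to the shared handler
// rather than inheriting the logic from a base class.
class VertexAiChatCompletionHandler {
    private final ChatCompletionErrorResponseHandler errorHandler =
        new ChatCompletionErrorResponseHandler(ErrorResponse::new);

    RuntimeException handleErrorBody(String body) {
        return errorHandler.buildError(body);
    }
}
```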

I'm open to other ideas for simplifying things.

This PR only refactors the google vertex ai chat completion implementation to give an example of what we'd need to do for the other providers.

I did some manual testing with these requests to generate some errors:

PUT _inference/chat_completion/test-chat
{
    "service": "googlevertexai",
    "service_settings": {
        "service_account_json": "<service account>",
        "model_id": "gemini-2.0-flash-lite-001",
        "location": "us-central1",
        "project_id": "elastic-ml"
    }
}


POST _inference/chat_completion/test-chat/_stream
{
    "model": "mistralai/Mistral-7B-Instruct-v0.3",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant that can call functions to get live data."
        },
        {
            "role": "user",
            "content": "What is the weather like in Paris today?"
        }
    ]
}

That should generate an error like:

event: error
data: {"error":{"code":"400","message":"Received an unsuccessful status code for request from inference entity id [test-chat] status [400]. Error message: [Invalid Endpoint name: projects/elastic-ml/locations/global/publishers/google/models/mistralai%2FMistral-7B-Instruct-v0.3.]","type":"INVALID_ARGUMENT"}}

@jonathan-buttner jonathan-buttner added >refactoring :ml Machine learning Team:ML Meta label for the ML team v9.2.0 labels Jul 15, 2025
toRestStatus(responseStatusCode)
);
}

-    protected String errorMessage(String message, Request request, HttpResult result, ErrorResponse errorResponse, int statusCode) {
+    public static String constructErrorMessage(String message, Request request, ErrorResponse errorResponse, int statusCode) {

Moving this to public static so that the ChatCompletionErrorResponseHandler can access it.

@@ -95,7 +95,7 @@ public void validateResponse(

protected abstract void checkForFailureStatusCode(Request request, HttpResult result);

-    private void checkForErrorObject(Request request, HttpResult result) {
+    protected void checkForErrorObject(Request request, HttpResult result) {

We need to be able to override this if we want to use a different error class. This PR adds UnifiedChatCompletionErrorResponse, which pulls the common fields for constructing a UnifiedChatCompletionException into a single place.
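The override described above might look roughly like this. These are simplified stand-ins with made-up method signatures, not the actual Elasticsearch classes:

```java
// Illustrative sketch only; simplified stand-ins for the real types.
class ErrorResponse {
    private final String message;
    ErrorResponse(String message) { this.message = message; }
    String getErrorMessage() { return message; }
}

class UnifiedChatCompletionErrorResponse extends ErrorResponse {
    UnifiedChatCompletionErrorResponse(String message) { super(message); }
}

class BaseResponseHandlerSketch {
    // Widened from private to protected (as in the PR), so subclasses
    // can substitute a different error-response class.
    protected ErrorResponse parseErrorObject(String body) {
        return new ErrorResponse(body);
    }
}

class ChatCompletionHandlerSketch extends BaseResponseHandlerSketch {
    @Override
    protected ErrorResponse parseErrorObject(String body) {
        // The chat completion path produces the unified error type instead.
        return new UnifiedChatCompletionErrorResponse(body);
    }
}
```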

this.unifiedChatCompletionErrorParser = Objects.requireNonNull(errorParser);
}

public void checkForErrorObject(Request request, HttpResult result) {

I copied this from BaseResponseHandler and added different parsing logic, though I'm open to other ideas on how to solve this.

var errorMessage = BaseResponseHandler.constructErrorMessage(message, request, errorResponse, statusCode);
var restStatus = toRestStatus(statusCode);

if (errorResponse.errorStructureFound()) {

By using UnifiedChatCompletionErrorResponse we no longer need the instanceof checks.
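To illustrate how a typed error response removes the downcasts: when the parser's declared return type is the subclass, its fields are available directly. This is a hedged sketch with made-up field names (`type`, `code`), not the actual class:

```java
// Illustrative sketch, not the actual Elasticsearch classes.
class ErrorResponse {
    private final boolean errorStructureFound;
    protected ErrorResponse(boolean errorStructureFound) { this.errorStructureFound = errorStructureFound; }
    boolean errorStructureFound() { return errorStructureFound; }
}

// Carries the fields needed for a UnifiedChatCompletionException directly,
// so callers no longer need `instanceof` checks to recover them.
class UnifiedChatCompletionErrorResponse extends ErrorResponse {
    private final String type;
    private final String code;
    UnifiedChatCompletionErrorResponse(String type, String code) {
        super(true);
        this.type = type;
        this.code = code;
    }
    String type() { return type; }
    String code() { return code; }
}

class Demo {
    // Before: the handler received a plain ErrorResponse and had to
    // check `instanceof` before casting. After: the parameter type is
    // the subclass, so the fields can be read directly.
    static String describe(UnifiedChatCompletionErrorResponse error) {
        return error.type() + "/" + error.code();
    }
}
```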

@@ -0,0 +1,171 @@
/*

Once we transition all the chat completion response handlers to use this class, we can remove the similar methods from BaseResponseHandler.

* @param errorResponse the parsed error response from the HTTP result
* @return an instance of {@link UnifiedChatCompletionException} with details from the error response
*/
private UnifiedChatCompletionException buildChatCompletionErrorInternal(

The functionality here should be very similar to buildError method of GoogleVertexAiUnifiedChatCompletionResponseHandler.java except that it doesn't need to do the instanceof checks.

@@ -22,7 +22,7 @@ public ErrorResponse(String errorMessage) {
this.errorStructureFound = true;
}

-    private ErrorResponse(boolean errorStructureFound) {
+    protected ErrorResponse(boolean errorStructureFound) {

Changing to protected because UnifiedChatCompletionErrorResponse needs to call it.

import java.util.Objects;

public class UnifiedChatCompletionErrorResponse extends ErrorResponse {
public static final UnifiedChatCompletionErrorResponse UNDEFINED_ERROR = new UnifiedChatCompletionErrorResponse();

If you can think of a better way to do this, let me know. The issue is that in situations where we fail to parse the error response, we return this generic error response. Typically we return ErrorResponse.UNDEFINED_ERROR, but here it needs to be a UnifiedChatCompletionErrorResponse for the typing to be correct.
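The sentinel pattern described above might be sketched roughly like this; the classes are simplified stand-ins, not the actual Elasticsearch code:

```java
// Illustrative sketch of the subclass-typed sentinel; simplified.
class ErrorResponse {
    static final ErrorResponse UNDEFINED_ERROR = new ErrorResponse(false);
    private final boolean errorStructureFound;
    protected ErrorResponse(boolean errorStructureFound) { this.errorStructureFound = errorStructureFound; }
    boolean errorStructureFound() { return errorStructureFound; }
}

class UnifiedChatCompletionErrorResponse extends ErrorResponse {
    // A sentinel typed as the subclass: parsers declared to return
    // UnifiedChatCompletionErrorResponse can fall back to this value,
    // where ErrorResponse.UNDEFINED_ERROR would not type-check.
    static final UnifiedChatCompletionErrorResponse UNDEFINED_ERROR =
        new UnifiedChatCompletionErrorResponse();
    private UnifiedChatCompletionErrorResponse() { super(false); }
}

class ParserSketch {
    // The declared return type is the subclass, so the fallback
    // must also be the subclass.
    static UnifiedChatCompletionErrorResponse parse(java.util.Optional<UnifiedChatCompletionErrorResponse> parsed) {
        return parsed.orElse(UnifiedChatCompletionErrorResponse.UNDEFINED_ERROR);
    }
}
```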

try (
XContentParser parser = XContentFactory.xContent(XContentType.JSON)
.createParser(XContentParserConfiguration.EMPTY, response.body())
) {
-            return ERROR_PARSER.apply(parser, null).orElse(ErrorResponse.UNDEFINED_ERROR);
+            return ERROR_PARSER.apply(parser, null).orElse(UnifiedChatCompletionErrorResponse.UNDEFINED_ERROR);

This is where the UnifiedChatCompletionErrorResponse.UNDEFINED_ERROR is used. We can't use ErrorResponse.UNDEFINED_ERROR here because it doesn't satisfy the UnifiedChatCompletionErrorResponse return type (which is needed for the UnifiedChatCompletionErrorParser interface).

@jonathan-buttner jonathan-buttner marked this pull request as ready for review July 16, 2025 14:29
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@jonathan-buttner jonathan-buttner merged commit 3b1523a into elastic:main Jul 17, 2025
33 checks passed
@jonathan-buttner jonathan-buttner deleted the ml-refactor-streaming-v2 branch July 17, 2025 14:27