Refactor response handlers to improve error handling and streamline mid-stream error processing #128923

Jan-Kazlouski-elastic · 2025-06-04T18:04:14Z

This refactoring is initiated by @jonathan-buttner 's comment and moves provider-independent, repeated logic to BaseResponseHandler, allowing all handlers dealing with streaming errors to reuse it. The goal is to reduce duplication and standardize error handling across providers.

Design Trade-offs and Considerations
1. Class Hierarchy Limitations
Currently, all streaming handlers inherit from non-streaming handlers. As a result, shared methods related to streaming error handling become accessible in contexts where they are not applicable (i.e., non-streaming handlers).
This is a compromise to avoid duplicating logic, but it exposes potentially misleading APIs in non-streaming classes. A more robust solution would involve a cleaner separation in the class hierarchy, e.g., extracting a common intermediate base class or using composition.

2. Enforcing Overrides with UnsupportedOperationException
For provider-specific methods that must be overridden, I've added UnsupportedOperationException as a safeguard.
This ensures that if a required override is missing and the method is called, it fails fast. However, this is a runtime check. Ideally, we'd enforce such overrides at compile time, possibly using abstract methods or interfaces. I’m open to suggestions here if we want stricter enforcement. In current implementation it is handled in some places using assert request.isStreaming().

3. Error Handling Scenarios
There are two main error-handling scenarios:

buildChatCompletionError: Handles errors returned by the provider in the regular HTTP response. This occurs for non-streaming results from provider in elastic streaming operations.
buildMidStreamChatCompletionError: Handles errors returned mid-stream, which must be extracted from the stream content.

4. Provider-Specific Handler Overview

ElasticInferenceServiceUnifiedChatCompletionResponseHandler: Contains custom logic; overrides most methods.
GoogleVertexAiUnifiedChatCompletionResponseHandler: Standard logic; only provider-specific methods overridden.
OpenAiUnifiedChatCompletionResponseHandler: Standard logic; only provider-specific methods overridden.
HuggingFaceChatCompletionResponseHandler: Standard logic; extends OpenAI handler and overrides only specific methods.
MistralUnifiedChatCompletionResponseHandler: Overrides only non-mid-stream error method; inherits others from OpenAI handler.

5. Use of instanceof and Casting
Currently, ErrorResponse types are checked via instanceof, and casting is done manually. While an alternative could be using reflection or a more abstracted pattern, I’m deliberately avoiding reflection due to it being risky impact.
A more structured approach (e.g., visitor pattern or dedicated handler interfaces per provider) would require a broader refactor, which may be worth considering in the future.

6. Regarding the Mistral PR Comment
As mentioned in the Mistral PR discussion, lifting instanceof checks higher in the hierarchy would impact the flow for other providers. This refactoring keeps the current behavior intact while still avoiding duplication by moving shared logic upward in the hierarchy.

Final Thoughts
This refactoring improves maintainability by reducing code duplication and providing a flexible way to build UnifiedChatCompletionException based on provider-specific logic. However, it introduces some technical debt due to the existing class hierarchy and runtime enforcement of required overrides.

I’d appreciate any feedback on whether we should consider a more structural redesign (e.g., class hierarchy split or polymorphic error handling). Didn't want to go too far into changing architecture because it wasn't requested in initial comment.

Have you signed the contributor license agreement?
Have you followed the contributor guidelines?
If submitting code, have you built your formula locally prior to submission with gradle check?
If submitting code, is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed.
If submitting code, have you checked that your submission is for an OS and architecture that we support?
If you are submitting this code for a class then read our policy for that.

…id-stream error processing

Jan-Kazlouski-elastic · 2025-06-04T18:07:23Z

...src/main/java/org/elasticsearch/xpack/inference/external/http/retry/BaseResponseHandler.java


    protected final String requestType;
    protected final ResponseParser parseFunction;
    private final Function<HttpResult, ErrorResponse> errorParseFunction;
    private final boolean canHandleStreamingResponses;

-    public BaseResponseHandler(String requestType, ResponseParser parseFunction, Function<HttpResult, ErrorResponse> errorParseFunction) {


This constructor is not used anywhere but children classes, making it protected for that reason.

Jan-Kazlouski-elastic · 2025-06-04T18:07:33Z

...src/main/java/org/elasticsearch/xpack/inference/external/http/retry/BaseResponseHandler.java

        this(requestType, parseFunction, errorParseFunction, false);
    }

-    public BaseResponseHandler(
+    protected BaseResponseHandler(


This constructor is not used anywhere but children classes, making it protected for that reason.

Jan-Kazlouski-elastic · 2025-06-04T18:08:29Z

...src/main/java/org/elasticsearch/xpack/inference/external/http/retry/BaseResponseHandler.java

-        var errorEntityMsg = errorParseFunction.apply(result);
-        return buildError(message, request, result, errorEntityMsg);
+        var errorResponse = errorParseFunction.apply(result);
+        return buildError(message, request, result, errorResponse);


errorParseFunction creates ErrorResponse instance, not a message. errorEntityMsg variable naming doesn't really make sense here. Fixed.

Jan-Kazlouski-elastic · 2025-06-04T18:08:48Z

...src/main/java/org/elasticsearch/xpack/inference/external/http/retry/BaseResponseHandler.java

    }

    protected Exception buildError(String message, Request request, HttpResult result, ErrorResponse errorResponse) {
        var responseStatusCode = result.response().getStatusLine().getStatusCode();
        return new ElasticsearchStatusException(
-            errorMessage(message, request, result, errorResponse, responseStatusCode),


Removed unused parameter

Jan-Kazlouski-elastic · 2025-06-04T18:10:14Z

...src/main/java/org/elasticsearch/xpack/inference/external/http/retry/BaseResponseHandler.java

+        Exception e,
+        Class<? extends ErrorResponse> errorResponseClass
+    ) {
+        // Extract the error response from the message using the provided method


Let me know if commenting here is excessive.

Jan-Kazlouski-elastic · 2025-06-04T18:11:26Z

...src/main/java/org/elasticsearch/xpack/inference/external/http/retry/BaseResponseHandler.java

+     * @return a string representing the error type
+     */
+    protected static String createErrorType(ErrorResponse errorResponse) {
+        return errorResponse != null ? errorResponse.getClass().getSimpleName() : "unknown";


Moved this from several handlers so it can be used directly.

elasticsearchmachine · 2025-06-09T09:45:40Z

Pinging @elastic/ml-core (Team:ML)

jonathan-buttner

Sorry for the delay on this one! I left some suggestions. Let me know what you think. If we can implement these changes I have some additional ones for the buildError class but I think these changes are more straightforward first.

jonathan-buttner · 2025-06-27T18:31:28Z

...src/main/java/org/elasticsearch/xpack/inference/external/http/retry/BaseResponseHandler.java

+     * @param e                the exception that caused the error, can be null
+     * @return a {@link UnifiedChatCompletionException} representing the mid-stream error
+     */
+    public UnifiedChatCompletionException buildMidStreamChatCompletionError(String inferenceEntityId, String message, Exception e) {


Do we need this here? It seems like we always override this and call the buildMidStreamChatCompletionError implementation below that takes an exception class.

Can we remove this from the base class and have the methods be private/public within the implementing classes?

Agree with your point. Did the needed change.

jonathan-buttner · 2025-06-27T19:33:47Z

...src/main/java/org/elasticsearch/xpack/inference/external/http/retry/BaseResponseHandler.java

+        String inferenceEntityId,
+        String message,
+        Exception e,
+        Class<? extends ErrorResponse> errorResponseClass


Instead of passing in a class here how about we either:

Take a function that accepts a string and returns some error class

Take a builder-like interface that has a method that accepts a string and returns an error class

I made the change that would take supplier that returns Class. Approach of passing a function that would take String as parameter doesn't make sense to me because we're capable of determening the exact subclass of ErrorResponse based on ResponseHandler subclass. No string is needed for that purpose.
If you still think it is needed - could you please elaborate on how it will be used. And let me know if solution provided works here.

Sorry I wasn't clear, I created a pull request against your repo to give an example: Jan-Kazlouski-elastic#1

I didn't make the changes across all the handlers. Let me know if you think the example will work, if so let's change the rest of the handlers.

The example I gave was for buildChatCompletionError(). I think we can also do something similar to the buildDefaultMidStreamChatCompletionError. Once you have the example I gave integrated, see if it more naturally helps with changing buildDefaultMidStreamChatCompletionError.

I noticed you've used StreamingErrorResponse as the standard for non-mid-stream errors, which is specific to OpenAI and Google's Vertex. But it doesn't align with how other providers handle errors. For some - same error is used for both midstream and not midstream, for some - not, for some - midstream is handled by ancestor class. Even Introducing abstract class would make things more confusing.
So i decided to go with interfaces. One for mid-stream and the other for non-mid-stream because they are separate for some providers.

I really liked your earlier suggestion of moving exception creation into the ErrorResponse itself. This keeps things cleaner because there is no need to mess around with classes. Each error response object simply knows how to generate its own exception. And it is determined by the specific interface. If it can't, or if parsing fails, we fall back to a default. It's much simpler.

(Bonus) Maybe now is a good time to revisit the idea of unifying error structures across all providers, as previously suggested? To consider only returned JSON and don't try to parse it into structures and maintain them reflecting potential changes

Please do let me know if current approach works. I really like how it simplifies things.
But I'm also looking forward to ability to unify the error responses as discussed earlier. That would make error processing very easy and straightforward.

jonathan-buttner · 2025-06-27T19:37:30Z

...src/main/java/org/elasticsearch/xpack/inference/external/http/retry/BaseResponseHandler.java

+        Class<? extends ErrorResponse> errorResponseClass
+    ) {
+        // Extract the error response from the message using the provided method
+        var errorResponse = extractMidStreamChatCompletionErrorResponse(message);


I think this class is getting kind of confusing with the various protected methods. I think there are a few places where we can move protected methods into parameters potentially which might make things clearer because we won't need a base class implementation that throws an error.

After the parameter change I mentioned above we can call the method/function we passed in.

Done. Now methods that were used to throw UnsupportedOperationException are being passed as parameters.

jonathan-buttner · 2025-06-27T19:39:46Z

...src/main/java/org/elasticsearch/xpack/inference/external/http/retry/BaseResponseHandler.java

+        // Extract the error response from the message using the provided method
+        var errorResponse = extractMidStreamChatCompletionErrorResponse(message);
+        // Check if the error response matches the expected type
+        if (errorResponseClass.isInstance(errorResponse)) {


I think what this is trying to do is figure out if we were able to parse the json string or not. If we were able to parse json string then we'll return the child error class. For the places that I looked, if we can't parse the string we'll return ErrorResponse.UNDEFINED_ERROR.

So instead of checking for the object instance I think we can use errorResponse.errorStructureFound().

Checking if object is an instance of class with .isInstance() method is needed to check which subclass of ErrorResponse is actually there, not just to check whether or not structure was found and parsing was successful. errorStructureFound will return true for any successful ErrorResponse and it is important to cast to specific ErrorResponse subclass and not just to any ErrorResponse.
In order to be able to safely cast and use fields of ErrorResponse subclasses we need to check whether or not this ErrorResponse object is actually an instance of this subclass.

...src/main/java/org/elasticsearch/xpack/inference/external/http/retry/BaseResponseHandler.java

…or-handling-refactoring # Conflicts: # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/googlevertexai/GoogleVertexAiUnifiedChatCompletionResponseHandler.java

…nal interfaces for improved flexibility

…ucture before type matching

…rResponse for improved consistency and maintainability

…or-handling-refactoring

Refactor response handlers to improve error handling and streamline m…

4371bbd

…id-stream error processing

elasticsearchmachine added v9.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Jun 4, 2025

Jan-Kazlouski-elastic mentioned this pull request Jun 4, 2025

Add Mistral AI Chat Completion support to Inference Plugin #128538

Merged

6 tasks

Jan-Kazlouski-elastic commented Jun 4, 2025

View reviewed changes

Merge branch 'main' into feature/streaming-error-handling-refactoring

99d4216

jonathan-buttner added :ml Machine learning Team:ML Meta label for the ML team >refactoring v8.19.0 auto-backport Automatically create backport pull requests when merged labels Jun 6, 2025

Jan-Kazlouski-elastic marked this pull request as ready for review June 9, 2025 09:45

Merge branch 'main' into feature/streaming-error-handling-refactoring

9b696be

elasticsearchmachine added v9.2.0 and removed v9.1.0 labels Jun 26, 2025

jonathan-buttner reviewed Jun 27, 2025

View reviewed changes

Jan-Kazlouski-elastic added 6 commits July 7, 2025 14:33

Merge remote-tracking branch 'origin/main' into feature/streaming-err…

34cd687

…or-handling-refactoring # Conflicts: # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/googlevertexai/GoogleVertexAiUnifiedChatCompletionResponseHandler.java

Refactor error handling in streaming chat completion responses

569e351

Refactor mid-stream error handling in response handlers

0a09f00

Refactor error handling in streaming response handlers to use functio…

f4582f3

…nal interfaces for improved flexibility

Refactor error handling in BaseResponseHandler to check for error str…

8ff0bb6

…ucture before type matching

Refactor error handling in response classes to use ChatCompletionErro…

a7320f4

…rResponse for improved consistency and maintainability

Jan-Kazlouski-elastic mentioned this pull request Jul 9, 2025

Refactoring example Jan-Kazlouski-elastic/elasticsearch#1

Open

Jan-Kazlouski-elastic added 2 commits July 10, 2025 01:21

Remove redundant class

8cc7d72

Merge remote-tracking branch 'origin/main' into feature/streaming-err…

9f4abba

…or-handling-refactoring

Jan-Kazlouski-elastic requested a review from jonathan-buttner July 10, 2025 12:36

jonathan-buttner mentioned this pull request Jul 15, 2025

[ML] Refactoring streaming error handling #131316

Draft

Refactor response handlers to improve error handling and streamline mid-stream error processing #128923

Are you sure you want to change the base?

Refactor response handlers to improve error handling and streamline mid-stream error processing #128923

Conversation

Jan-Kazlouski-elastic commented Jun 4, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

elasticsearchmachine commented Jun 9, 2025

Uh oh!

jonathan-buttner left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!