Skip to content

Refactor response handlers to improve error handling and streamline mid-stream error processing #128923

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

Jan-Kazlouski-elastic
Copy link
Contributor

This refactoring is initiated by @jonathan-buttner 's comment and moves provider-independent, repeated logic to BaseResponseHandler, allowing all handlers dealing with streaming errors to reuse it. The goal is to reduce duplication and standardize error handling across providers.

Design Trade-offs and Considerations
1. Class Hierarchy Limitations
Currently, all streaming handlers inherit from non-streaming handlers. As a result, shared methods related to streaming error handling become accessible in contexts where they are not applicable (i.e., non-streaming handlers).
This is a compromise to avoid duplicating logic, but it exposes potentially misleading APIs in non-streaming classes. A more robust solution would involve a cleaner separation in the class hierarchy, e.g., extracting a common intermediate base class or using composition.

2. Enforcing Overrides with UnsupportedOperationException
For provider-specific methods that must be overridden, I've added UnsupportedOperationException as a safeguard.
This ensures that if a required override is missing and the method is called, it fails fast. However, this is a runtime check. Ideally, we'd enforce such overrides at compile time, possibly using abstract methods or interfaces. I’m open to suggestions here if we want stricter enforcement. In current implementation it is handled in some places using assert request.isStreaming().

3. Error Handling Scenarios
There are two main error-handling scenarios:

  • buildChatCompletionError: Handles errors returned by the provider in the regular HTTP response. This occurs for non-streaming results from provider in elastic streaming operations.
  • buildMidStreamChatCompletionError: Handles errors returned mid-stream, which must be extracted from the stream content.

4. Provider-Specific Handler Overview

  • ElasticInferenceServiceUnifiedChatCompletionResponseHandler: Contains custom logic; overrides most methods.
  • GoogleVertexAiUnifiedChatCompletionResponseHandler: Standard logic; only provider-specific methods overridden.
  • OpenAiUnifiedChatCompletionResponseHandler: Standard logic; only provider-specific methods overridden.
  • HuggingFaceChatCompletionResponseHandler: Standard logic; extends OpenAI handler and overrides only specific methods.
  • MistralUnifiedChatCompletionResponseHandler: Overrides only non-mid-stream error method; inherits others from OpenAI handler.

5. Use of instanceof and Casting
Currently, ErrorResponse types are checked via instanceof, and casting is done manually. While an alternative could be using reflection or a more abstracted pattern, I’m deliberately avoiding reflection due to it being risky impact.
A more structured approach (e.g., visitor pattern or dedicated handler interfaces per provider) would require a broader refactor, which may be worth considering in the future.

6. Regarding the Mistral PR Comment
As mentioned in the Mistral PR discussion, lifting instanceof checks higher in the hierarchy would impact the flow for other providers. This refactoring keeps the current behavior intact while still avoiding duplication by moving shared logic upward in the hierarchy.

Final Thoughts
This refactoring improves maintainability by reducing code duplication and providing a flexible way to build UnifiedChatCompletionException based on provider-specific logic. However, it introduces some technical debt due to the existing class hierarchy and runtime enforcement of required overrides.

I’d appreciate any feedback on whether we should consider a more structural redesign (e.g., class hierarchy split or polymorphic error handling). Didn't want to go too far into changing architecture because it wasn't requested in initial comment.

  • Have you signed the contributor license agreement?
  • Have you followed the contributor guidelines?
  • If submitting code, have you built your formula locally prior to submission with gradle check?
  • If submitting code, is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed.
  • If submitting code, have you checked that your submission is for an OS and architecture that we support?
  • If you are submitting this code for a class then read our policy for that.

@elasticsearchmachine elasticsearchmachine added v9.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Jun 4, 2025

protected final String requestType;
protected final ResponseParser parseFunction;
private final Function<HttpResult, ErrorResponse> errorParseFunction;
private final boolean canHandleStreamingResponses;

public BaseResponseHandler(String requestType, ResponseParser parseFunction, Function<HttpResult, ErrorResponse> errorParseFunction) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constructor is not used anywhere but children classes, making it protected for that reason.

this(requestType, parseFunction, errorParseFunction, false);
}

public BaseResponseHandler(
protected BaseResponseHandler(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constructor is not used anywhere but children classes, making it protected for that reason.

var errorEntityMsg = errorParseFunction.apply(result);
return buildError(message, request, result, errorEntityMsg);
var errorResponse = errorParseFunction.apply(result);
return buildError(message, request, result, errorResponse);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errorParseFunction creates ErrorResponse instance, not a message. errorEntityMsg variable naming doesn't really make sense here. Fixed.

}

protected Exception buildError(String message, Request request, HttpResult result, ErrorResponse errorResponse) {
var responseStatusCode = result.response().getStatusLine().getStatusCode();
return new ElasticsearchStatusException(
errorMessage(message, request, result, errorResponse, responseStatusCode),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed unused parameter

Exception e,
Class<? extends ErrorResponse> errorResponseClass
) {
// Extract the error response from the message using the provided method
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if commenting here is excessive.

* @return a string representing the error type
*/
protected static String createErrorType(ErrorResponse errorResponse) {
return errorResponse != null ? errorResponse.getClass().getSimpleName() : "unknown";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this from several handlers so it can be used directly.

@jonathan-buttner jonathan-buttner added :ml Machine learning Team:ML Meta label for the ML team >refactoring v8.19.0 auto-backport Automatically create backport pull requests when merged labels Jun 6, 2025
@Jan-Kazlouski-elastic Jan-Kazlouski-elastic marked this pull request as ready for review June 9, 2025 09:45
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

Copy link
Contributor

@jonathan-buttner jonathan-buttner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay on this one! I left some suggestions. Let me know what you think. If we can implement these changes I have some additional ones for the buildError class but I think these changes are more straightforward first.

* @param e the exception that caused the error, can be null
* @return a {@link UnifiedChatCompletionException} representing the mid-stream error
*/
public UnifiedChatCompletionException buildMidStreamChatCompletionError(String inferenceEntityId, String message, Exception e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this here? It seems like we always override this and call the buildMidStreamChatCompletionError implementation below that takes an exception class.

Can we remove this from the base class and have the methods be private/public within the implementing classes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with your point. Did the needed change.

String inferenceEntityId,
String message,
Exception e,
Class<? extends ErrorResponse> errorResponseClass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of passing in a class here how about we either:

  • Take a function that accepts a string and returns some error class
  • Take a builder-like interface that has a method that accepts a string and returns an error class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change that would take supplier that returns Class. Approach of passing a function that would take String as parameter doesn't make sense to me because we're capable of determening the exact subclass of ErrorResponse based on ResponseHandler subclass. No string is needed for that purpose.
If you still think it is needed - could you please elaborate on how it will be used. And let me know if solution provided works here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I wasn't clear, I created a pull request against your repo to give an example: Jan-Kazlouski-elastic#1

I didn't make the changes across all the handlers. Let me know if you think the example will work, if so let's change the rest of the handlers.

The example I gave was for buildChatCompletionError(). I think we can also do something similar to the buildDefaultMidStreamChatCompletionError. Once you have the example I gave integrated, see if it more naturally helps with changing buildDefaultMidStreamChatCompletionError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed you've used StreamingErrorResponse as the standard for non-mid-stream errors, which is specific to OpenAI and Google's Vertex. But it doesn't align with how other providers handle errors. For some - same error is used for both midstream and not midstream, for some - not, for some - midstream is handled by ancestor class. Even Introducing abstract class would make things more confusing.
So i decided to go with interfaces. One for mid-stream and the other for non-mid-stream because they are separate for some providers.

I really liked your earlier suggestion of moving exception creation into the ErrorResponse itself. This keeps things cleaner because there is no need to mess around with classes. Each error response object simply knows how to generate its own exception. And it is determined by the specific interface. If it can't, or if parsing fails, we fall back to a default. It's much simpler.

(Bonus) Maybe now is a good time to revisit the idea of unifying error structures across all providers, as previously suggested? To consider only returned JSON and don't try to parse it into structures and maintain them reflecting potential changes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do let me know if current approach works. I really like how it simplifies things.
But I'm also looking forward to ability to unify the error responses as discussed earlier. That would make error processing very easy and straightforward.

Class<? extends ErrorResponse> errorResponseClass
) {
// Extract the error response from the message using the provided method
var errorResponse = extractMidStreamChatCompletionErrorResponse(message);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this class is getting kind of confusing with the various protected methods. I think there are a few places where we can move protected methods into parameters potentially which might make things clearer because we won't need a base class implementation that throws an error.

After the parameter change I mentioned above we can call the method/function we passed in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Now methods that were used to throw UnsupportedOperationException are being passed as parameters.

// Extract the error response from the message using the provided method
var errorResponse = extractMidStreamChatCompletionErrorResponse(message);
// Check if the error response matches the expected type
if (errorResponseClass.isInstance(errorResponse)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what this is trying to do is figure out if we were able to parse the json string or not. If we were able to parse json string then we'll return the child error class. For the places that I looked, if we can't parse the string we'll return ErrorResponse.UNDEFINED_ERROR.

So instead of checking for the object instance I think we can use errorResponse.errorStructureFound().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking if object is an instance of class with .isInstance() method is needed to check which subclass of ErrorResponse is actually there, not just to check whether or not structure was found and parsing was successful. errorStructureFound will return true for any successful ErrorResponse and it is important to cast to specific ErrorResponse subclass and not just to any ErrorResponse.
In order to be able to safely cast and use fields of ErrorResponse subclasses we need to check whether or not this ErrorResponse object is actually an instance of this subclass.

…or-handling-refactoring

# Conflicts:
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/googlevertexai/GoogleVertexAiUnifiedChatCompletionResponseHandler.java
…rResponse for improved consistency and maintainability
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged external-contributor Pull request authored by a developer outside the Elasticsearch team :ml Machine learning >refactoring Team:ML Meta label for the ML team v8.19.0 v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants