Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for streaming on requests (and add better handling for streaming in responses) #2734

Open
trumpetinc opened this issue Jan 18, 2025 · 7 comments
Labels
breaking change Change that proposes a non-backward compatible breaking change proposal Proposed Specification or API change

Comments

@trumpetinc
Copy link

Related issue - my apologies if this is a double post - I wasn't sure how to make sure the discussion was being viewed by the appropriate people: #1243

Problem

Feign currently uses a byte[] when processing the request body. This does not work for transfers of large request bodies (for example, uploading large files).

Requested solution

Make request bodies be designed around streams.

Status/Findings

I just finished up creating a bunch of shims to make streaming requests work in Feign, and would like to see if ya'll would be open to having a discussion about how to make this part of the main library. This works, but it is ugly.

Doing this properly will require a breaking change, so I am hesitant to create a pull request without some good discussion about the best way to do it, where we can and can't break API, etc...

Possible Approach

Based on the workaround implementation I created, I believe that the general approach will be as follows:

First, the existing Response.Body class is perfectly suitable for handing both Request and Responses. It actually looks like Response.Body was designed with this in mind. I think that refactoring this into a standalone Body interface, with concrete implementations for ByteArrayBody, InputStreamBody, NoContentBody, and possibly a FileContentBody. I also suggest removing calls specific to encoding, etc... from this class. I would like to get rid of null checks on the body parameter and just use a no-op NoContentBody if possible.

Next, there will be some surgical changes to Client.Default (for requests, we need to write a stream instead of a byte[])

There are some questions about calling close() (or not calling close() ) at the end of the response.

Another big impact area is going to be in logger implementations (which need to either not do body logging on streamed bodies, or need to use a different approach that monitors the stream but does not actually read the stream as part of the logging call).

And I think there should be some discussion about supporting InputStream as a body parameter type, and possibly OutputStream as a client method response parameter type. Maybe these could be handled with specialized encoder/decoder implementations, but it's worth discussion about whether these should be part of the core library. Might also discuss a File type as a body parameter type (this would allow setting content-length header and make the body retryable, which would worthwhile.

Finally, all of my work has been on the synchronous implementations - so evaluation with respect to Async will be needed.

Conclusion

All in all, while the design discussion will be important, I am confident that it is possible to make this change without massive amounts of surgery. And I think this would be a huge enhancement to Feign. This is a huge gap with Jax-RS that would be good to close.

Is there interest in pursuing this? I'm happy to do the grunt work on the pull request, but I think this is a big enough breaking change that there should be discussion to get the design right before I start slinging code.

@kdavisk6
Copy link
Member

Thanks for taking the time to write this up. You are on the right track by understanding that our preference is to keep the core Feign objects simple and rely on specialized Encoder and Decoder implementations to deal with these sorts of scenarios. We've learned that trying to include this type of capabilities into core Feign result in best intentions gone wrong.

With that said, including OutputStream and InputStream specific implementations would be a welcome addition. Please feel free to submit a Pull Request with your ideas for our review.

@kdavisk6 kdavisk6 added breaking change Change that proposes a non-backward compatible breaking change proposal Proposed Specification or API change labels Jan 20, 2025
@trumpetinc
Copy link
Author

ok - I will put together a PR over the next week or so, then we can have what I'm sure will be a lively discussion about some of the tradeoffs.

@rickgongtds
Copy link

rickgongtds commented Jan 22, 2025

It would be great if Feign could provide an elegant way to support streaming large files.

Our organization has hundreds of applications using Feign clients to upload files to a central file service. While most files are relatively small (less than 20MB), there are occasional cases where files can reach sizes up to 2GB. Unfortunately, this causes our services to experience frequent OutOfMemory (OOM) errors due to Feign loading the entire file into memory.

To address this issue, I had to resort to a temporary and admittedly ugly hack. Specifically, I recompiled the FeignInvocationHandler class to handle large file uploads with a custom workaround. Here's an excerpt of the adjusted invoke() method:

@Override
public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    // ...
    if (method.getDeclaringClass() == MyBigFileFeignClient.class &&
            method.getName().equals("uploadBigFile")) {
        return MyUglyTempWorkAround.streamingLargeFile(method, args);
    }

    return dispatch.get(method).invoke(args);
}

The class MyUglyTempWorkAround will use OkHttpClient directly to stream large file, full of hard codes.

I fully understand this is not ideal, but given the lack of native support for streaming large files in Feign, I had no other choice.

I hope the Feign team can consider adding native support for streaming large files in the future. This would not only address issues like ours but also make Feign more robust for handling various HTTP use cases.

Thank you for your consideration.

@trumpetinc
Copy link
Author

Initial pull request has been submitted ( #2750 ) - I am eager to get feedback and comments!

@trumpetinc
Copy link
Author

I have played around with this approach with different Encoder implementations, and I think that I need to do another design iteration. Requests should be writing to an outputstream, not providing an inputstream.

I'll do another iteration sometime this week.

@trumpetinc
Copy link
Author

@kdavisk6

I'm on my 3rd iteration - I have streaming working, but I'm not happy about some of the limitations inherent to my earlier designs. I think that I now understand the problem space well enough to propose a solid solution

I am at a point where I think some discussion is needed so I can understand design intent/etc...

Streaming Requests Design Overview

Many streaming Encoder implementations are going to be designed to write to an OutputStream. Case in point, the Multi-part MIME encoder will stream to an outputstream. The Jackson encoder would also be naturally implemented this way (current behavior is to write to a byte[], but it could just a easily write to an output stream). I also intend to support encoders writing to a Writer interface, which would encapsulate charset encoding for unicode output.

The approach that I am planning on pursuing for adding support for streaming requests is to introduce the concept of an OutputStreamSender for the RequestTemplate (and Request). This will have a single method ( send(OutputStream) ). For any unicode based bodies, we will have a convenience method that will take the unicode form of the request body and write it using the charset specified in the content-type header.

The net effect of this will be that instead of providing a byte[] body to the request, we will ultimately provide an OutputStreamSender. Encoders are ultimately going to provide an appropriate OutputStreamWriter, and for backwards compatibility/simplicity we will have convenience methods on RequestTemplate that accept byte[], String, etc... and construct a suitable OutputStreamSender.

I have 6 areas that I could use some clarification/guidance on:

1. Charset encoding in RequestTemplate (and to a lesser degree, in Request)

Right now, the only place that the charset encoding can be set different than the default (UTF_8) appears to be via the RequestTemplate#bodyTemplate() method. And that method is called by 2 unit tests, and nowhere else in the Feign project.

It seems to me that the correct place to assign charset encoding is via the header template (Content-Type, if charset is specified, otherwise utf-8, unless that was maybe a different implied encoding for older versions of HTTP). Everything else with the request really should be handled via unicode (if charset is important) or binary (if charset is not important).

So I would expect that we would see a bodyTemplate(String) method, but not a bodyTemplate(String, Charset) method. We would then capture the string, and encode it using the charset from the content-type header as the request is sent.

Basically, I am suggesting that the work to capture a byte[] for the request should be deferred to as late as possible, which means having RequestTemplate and Request be entirely agnostic of the charset encoding. Instead, the charset from the content-type header will be used to encode any unicode content as it is sent to the server.

My question: Is anyone going to object to a hard-deprecation on RequestTemlate#bodyTemplate(String, Charset) (i.e. where we basically ignore the Charset parameter in that method in preference to whatever charset is set in the Content-Type request header template) ? I don't really want to get into tweaking the content-type header based on this method argument...

2. Request Logging

Streaming request bodies are not always re-playable, so we need to be a bit careful with how logging of the body is done. There is a similar concern in the Request#toString() method.

There are two approaches that I can think of:

a) Specify a string representation of the request body in the RequestTemplate (when we set the body, we also set a string representation). For non-streaming encoders, the body content and string representation will be the same (though there is an argument to be made for setting the string representation of a binary request body to something like "-- Binary Content (xxx bytes) --" or "-- Binary Content (unknown length) --" instead of attempting to encode the byte stream as text).

b) Change how request body logging is done by logging the bytes as they are sent to the server. Basically, we would instrument the ouputstream that the encoders will ultimately be writing to. This would allow us to absolutely log the exact data sent to the server. The downside of this approach is that the actual logging of the request body wouldn't happen in the Logger#logRequest() method. This would force us to make Request be mutable, which I'm not very happy about. I also think that most times with a streaming request, logging the actual body content isn't desirable anyway.

My recommendation is to pursue approach (a). This will provide identical logging/toString behavior to the current implementation, and should be completely compatible with existing Encoder implementations.

My question: Does anyone see a problem with (a) as the approach to request logging?

3. FeignException body handling

This one is a bit awkward. FeignException currently has a field for the response body. But there is a static factory for creating FeignExceptions during request processing that looks like it mixes up the request and response bodies: FeignException#errorReading(Request, Response, IOException). This sets the FeignException response body to the request.body() value.

Similarly, in errorReading, the headers are set to the request headers instead of the response headers.

I'm honestly not sure what to make of this one.

My question: Can someone provide me with an explanation of why the request body is captured here instead of the response body?

**4. Response Logging##

Streaming responses may only be readable a single time (not resettable). There are a couple of options I see for handling logging.

a) Use a buffered input stream to support logging of up to the first X bytes of the stream. We then reset the stream for decoder body processing (basically, any call to Response.Body#asReader or #asInputStream will reset the stream as long as not too many bytes have been read).

b) We instrument the inputstream so it writes to the log output as the stream is processed. This has similar advantages and concerns as I described in the request logging section above.

My plan is to use approach (a). I have done this in an earlier iteration of my design, and it works well.

My question: Does anyone see a problem with (a) as the approach to response logging? Are we ok with not logging the entire response body if it exceeds some threshold?

##5. Charset in response bodies##

I believe that encoding for response bodies should always come from the response headers. Right now, there are two Response.Body#asReader() methods that ignore the headers. This puts a requirement onto the decoder to explicitly set the charset, which feels awkward and potentially error prone. If the content-type says the body encoding is XYZ, then we really need to be interpreting the body using that encoding, right?

This one isn't a show stopper for me, but it's odd enough that I felt it was worth asking about.

My question: Am I missing something here? Are there normal usage scenarios where it is desirable to interpret the response body using a different charset than specified in the headers?

##6. Use of nulls for body##

There are a lot of null checks happening related to request and response bodies. What I would like to suggest is that null body not be allowed. We can have an empty body (request will have a header of content-length=0), but not null.

My question: Am I going to step on toes if I try to do something about this? I am going to be reworking the bodies in requests anyway, so it seems like maybe the time to adjust this is now.

I am going to continue pursuing my 3rd iteration using my preferred approaches - please get back to me with any feedback!

@trumpetinc
Copy link
Author

I have submitted a PR with a branch of my 3rd design iteration. Branch currently covers streaming responses: #2754

I will work on streaming requests next.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Change that proposes a non-backward compatible breaking change proposal Proposed Specification or API change
Projects
None yet
Development

No branches or pull requests

3 participants