Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differences in transfer encoding handling between the C and Python HTTP parsers #4436

Closed
JustAnotherArchivist opened this issue Dec 11, 2019 · 4 comments · Fixed by #8823
Closed
Labels

Comments

@JustAnotherArchivist
Copy link
Contributor

Long story short

The C and Python parsers in aiohttp do not handle transfer encoding the same way, and neither behaves correctly. The C parser simply discards any value that isn't chunked and therefore doesn't handle things like gzip, chunked (see #4435). The Python parser accepts any value that contains chunked, including invalid values like notchunked, and parses (or attempts to parse) the payload as a chunked response.

Expected behaviour

The C and Python implementations behave the same and according to the HTTP specification, i.e. presumably throw an error on TE values they do not support or that are invalid (though the standard does not explicitly mention that as far as I can see).

Actual behaviour

The two implementations differ and do not conform to the standard.

Steps to reproduce

A simple server and client illustrating this issue can be found in this gist. I wrote a single test client and server for both this issue and #4435; only the first line is relevant to this issue, and the others are not included in the output below.

Behaviour of the C parser:

> python3 client.py | head -1
b'1\r\nT\r\n3\r\nest\r\n0\r\n\r\n'

Behaviour of the Python parser:

> AIOHTTP_NO_EXTENSIONS=1 python3 client.py | head -1
b'Test'

Your environment

I tested this with aiohttp 2.3.10 and Python 3.6.9 on Debian, but based on the current aiohttp code, the behaviour should still be the same on the current versions.

@asvetlov
Copy link
Member

That's sad.
Would you provide a patch?

@JustAnotherArchivist
Copy link
Contributor Author

I can do that for the Python implementation, but it'd arguably be more important to fix the C parser since that's what's presumably used most of the time. The C parser issue mentioned above has been reported upstream 3.5 years ago, and a patch is not in sight. I'm not sure if it can be fixed/worked around entirely in the Cython wrapper.

@Dreamsorcerer
Copy link
Member

Dreamsorcerer commented Aug 21, 2024

Looking at this, the problem description is not quite correct and actually the C parser is working correctly (maybe it didn't at the time, but the output still matches).

If a Transfer-Encoding header field is present in a response and the chunked transfer coding is not the final encoding, the message body length is determined by reading the connection until it is closed by the server.

If a Transfer-Encoding header field is present in a request and the chunked transfer coding is not the final encoding, the message body length cannot be determined reliably; the server MUST respond with the 400 (Bad Request) status code and then close the connection.
https://www.rfc-editor.org/rfc/rfc9112#section-6.3-2.4.2

The test here only looks at the client side, so the first paragraph is relevant here (and we'll need another test for the server side). So, the test uses notchunked and the C parser correctly proceeds to read the rest of the body and therefore provides the output shown in the description. If you change it to not, chunked, then it correctly parses it as chunked and the output is b'Test'.

In both cases the Python parser gives a Request has invalid Transfer-Encoding error, which is incorrect. This code was probably added for the server-side part (i.e. the second paragraph of the spec), so we need to fix the client side (request) parser.

@Dreamsorcerer
Copy link
Member

OK, created the tests. The C parser is correct in all 4 cases, Python parser is wrong in 3 of 4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants