Reduce WebSocket buffer slicing overhead #10601

bdraco · 2025-03-19T20:14:47Z

What do these changes do?

Use a const unsigned char * for the buffer (Cython will automatically extract is using __Pyx_PyBytes_AsUString) as its a lot faster than copying around PyBytes objects. We do need to be careful that all slices are bounded and we bound check everything to make sure we do not do an out of bounds read since Cython does not bounds check C strings.

I checked that all accesses to buf_cstr are proceeded by a bounds check but it would be good to get another set of eyes on that to verify in the self._state == READ_PAYLOAD block that we will never try to read out of bounds.

Are there changes in behavior for the user?

performance improvement

Is it a substantial burden for the maintainers to support this?

no

There is a small risk that someone could remove a bounds check in the future and create a memory safety issue, however in this case its likely we would already be trying to read data that wasn't there if we are missing the bounds checking so the pure python version would throw if we are testing properly.

Use a `const unsigned char *` for the buffer as its a lot faster than copying around PyBytes objects. We do need to be careful that all slices are bounded and we boundchecks everything to make sure we do not do an out of bounds read. I checked that all accesses to buf_cstr are preceeded by a boundchecks but it would be good to get another set of eyes on that to verify in the `self._state == READ_PAYLOAD` that we will never try to read out of bounds.

codspeed-hq · 2025-03-19T20:21:30Z

CodSpeed Performance Report

Merging #10601 will improve performances by 14.5%

_{Comparing websocket_buffer_slicing (e274527) with master (cb6e95e)}

Summary

⚡ 1 improvements
✅ 46 untouched benchmarks

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
⚡	`test_read_one_hundred_websocket_text_messages[pyloop]`	297.5 µs	259.8 µs	+14.5%

codecov · 2025-03-19T20:25:45Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.71%. Comparing base (4f45448) to head (e274527).
Report is 16 commits behind head on master.

✅ All tests successful. No failed tests found.

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #10601   +/-   ##
=======================================
  Coverage   98.71%   98.71%           
=======================================
  Files         125      125           
  Lines       37366    37369    +3     
  Branches     2064     2064           
=======================================
+ Hits        36884    36887    +3     
  Misses        335      335           
  Partials      147      147

Flag	Coverage Δ
CI-GHA	`98.58% <100.00%> (+<0.01%)`	⬆️
OS-Linux	`98.25% <100.00%> (+<0.01%)`	⬆️
OS-Windows	`96.19% <0.00%> (-0.01%)`	⬇️
OS-macOS	`97.36% <0.00%> (-0.01%)`	⬇️
Py-3.10.11	`97.26% <0.00%> (-0.02%)`	⬇️
Py-3.10.16	`97.81% <100.00%> (-0.01%)`	⬇️
Py-3.11.11	`97.90% <100.00%> (+<0.01%)`	⬆️
Py-3.11.9	`97.36% <0.00%> (+<0.01%)`	⬆️
Py-3.12.9	`98.35% <100.00%> (+<0.01%)`	⬆️
Py-3.13.2	`98.34% <100.00%> (+<0.01%)`	⬆️
Py-3.9.13	`97.15% <0.00%> (-0.01%)`	⬇️
Py-3.9.21	`97.68% <100.00%> (+0.03%)`	⬆️
Py-pypy7.3.16	`85.42% <100.00%> (-8.10%)`	⬇️
VM-macos	`97.36% <0.00%> (-0.01%)`	⬇️
VM-ubuntu	`98.25% <100.00%> (+<0.01%)`	⬆️
VM-windows	`96.19% <0.00%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

aiohttp/_websocket/reader_py.py

Co-authored-by: Sam Bull <[email protected]>

patchback · 2025-03-30T21:11:14Z

Backport to 3.11: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.11/f7cac7e63f18691e4261af353e84f9073b16624a/pr-10601

Backported as #10639

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

## What do these changes do? Use a `const unsigned char *` for the buffer (Cython will automatically extract is using `__Pyx_PyBytes_AsUString`) as its a lot faster than copying around `PyBytes` objects. We do need to be careful that all slices are bounded and we bound check everything to make sure we do not do an out of bounds read since Cython does not bounds check C strings. I checked that all accesses to `buf_cstr` are proceeded by a bounds check but it would be good to get another set of eyes on that to verify in the `self._state == READ_PAYLOAD` block that we will never try to read out of bounds. <img width="376" alt="Screenshot 2025-03-19 at 10 21 54 AM" src="https://github.com/user-attachments/assets/a340ffa2-f09b-4aff-a4f7-c487dae186c8" /> ## Are there changes in behavior for the user? performance improvement ## Is it a substantial burden for the maintainers to support this? no There is a small risk that someone could remove a bounds check in the future and create a memory safety issue, however in this case its likely we would already be trying to read data that wasn't there if we are missing the bounds checking so the pure python version would throw if we are testing properly. --------- Co-authored-by: Sam Bull <[email protected]> (cherry picked from commit f7cac7e)

patchback · 2025-03-30T21:11:22Z

Backport to 3.12: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.12/f7cac7e63f18691e4261af353e84f9073b16624a/pr-10601

Backported as #10640

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

## What do these changes do? Use a `const unsigned char *` for the buffer (Cython will automatically extract is using `__Pyx_PyBytes_AsUString`) as its a lot faster than copying around `PyBytes` objects. We do need to be careful that all slices are bounded and we bound check everything to make sure we do not do an out of bounds read since Cython does not bounds check C strings. I checked that all accesses to `buf_cstr` are proceeded by a bounds check but it would be good to get another set of eyes on that to verify in the `self._state == READ_PAYLOAD` block that we will never try to read out of bounds. <img width="376" alt="Screenshot 2025-03-19 at 10 21 54 AM" src="https://github.com/user-attachments/assets/a340ffa2-f09b-4aff-a4f7-c487dae186c8" /> ## Are there changes in behavior for the user? performance improvement ## Is it a substantial burden for the maintainers to support this? no There is a small risk that someone could remove a bounds check in the future and create a memory safety issue, however in this case its likely we would already be trying to read data that wasn't there if we are missing the bounds checking so the pure python version would throw if we are testing properly. --------- Co-authored-by: Sam Bull <[email protected]> (cherry picked from commit f7cac7e)

…verhead (#10640) **This is a backport of PR #10601 as merged into master (f7cac7e).**  ## What do these changes do? Use a `const unsigned char *` for the buffer (Cython will automatically extract is using `__Pyx_PyBytes_AsUString`) as its a lot faster than copying around `PyBytes` objects. We do need to be careful that all slices are bounded and we bound check everything to make sure we do not do an out of bounds read since Cython does not bounds check C strings. I checked that all accesses to `buf_cstr` are proceeded by a bounds check but it would be good to get another set of eyes on that to verify in the `self._state == READ_PAYLOAD` block that we will never try to read out of bounds. <img width="376" alt="Screenshot 2025-03-19 at 10 21 54 AM" src="https://github.com/user-attachments/assets/a340ffa2-f09b-4aff-a4f7-c487dae186c8" /> ## Are there changes in behavior for the user? performance improvement ## Is it a substantial burden for the maintainers to support this? no There is a small risk that someone could remove a bounds check in the future and create a memory safety issue, however in this case its likely we would already be trying to read data that wasn't there if we are missing the bounds checking so the pure python version would throw if we are testing properly. Co-authored-by: J. Nick Koston <[email protected]>

…verhead (#10639) **This is a backport of PR #10601 as merged into master (f7cac7e).**  ## What do these changes do? Use a `const unsigned char *` for the buffer (Cython will automatically extract is using `__Pyx_PyBytes_AsUString`) as its a lot faster than copying around `PyBytes` objects. We do need to be careful that all slices are bounded and we bound check everything to make sure we do not do an out of bounds read since Cython does not bounds check C strings. I checked that all accesses to `buf_cstr` are proceeded by a bounds check but it would be good to get another set of eyes on that to verify in the `self._state == READ_PAYLOAD` block that we will never try to read out of bounds. <img width="376" alt="Screenshot 2025-03-19 at 10 21 54 AM" src="https://github.com/user-attachments/assets/a340ffa2-f09b-4aff-a4f7-c487dae186c8" /> ## Are there changes in behavior for the user? performance improvement ## Is it a substantial burden for the maintainers to support this? no There is a small risk that someone could remove a bounds check in the future and create a memory safety issue, however in this case its likely we would already be trying to read data that wasn't there if we are missing the bounds checking so the pure python version would throw if we are testing properly. Co-authored-by: J. Nick Koston <[email protected]>

changelog

498175a

psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Mar 19, 2025

bdraco added backport-3.11 Trigger automatic backporting to the 3.11 release branch by Patchback robot backport-3.12 Trigger automatic backporting to the 3.12 release branch by Patchback robot labels Mar 19, 2025

bdraco marked this pull request as ready for review March 19, 2025 21:03

bdraco requested review from webknjaz and asvetlov as code owners March 19, 2025 21:03

Dreamsorcerer reviewed Mar 20, 2025

View reviewed changes

aiohttp/_websocket/reader_py.py Show resolved Hide resolved

Update aiohttp/_websocket/reader_py.py

e274527

Co-authored-by: Sam Bull <[email protected]>

bdraco merged commit f7cac7e into master Mar 30, 2025
40 checks passed

bdraco deleted the websocket_buffer_slicing branch March 30, 2025 21:11

patchback bot mentioned this pull request Mar 30, 2025

[PR #10601/f7cac7e6 backport][3.11] Reduce WebSocket buffer slicing overhead #10639

Merged

patchback bot mentioned this pull request Mar 30, 2025

[PR #10601/f7cac7e6 backport][3.12] Reduce WebSocket buffer slicing overhead #10640

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce WebSocket buffer slicing overhead #10601

Reduce WebSocket buffer slicing overhead #10601

bdraco commented Mar 19, 2025 •

edited

Loading

codspeed-hq bot commented Mar 19, 2025 •

edited

Loading

codecov bot commented Mar 19, 2025 •

edited

Loading

patchback bot commented Mar 30, 2025 •

edited

Loading

patchback bot commented Mar 30, 2025 •

edited

Loading

Reduce WebSocket buffer slicing overhead #10601

Reduce WebSocket buffer slicing overhead #10601

Conversation

bdraco commented Mar 19, 2025 • edited Loading

What do these changes do?

Are there changes in behavior for the user?

Is it a substantial burden for the maintainers to support this?

codspeed-hq bot commented Mar 19, 2025 • edited Loading

CodSpeed Performance Report

Merging #10601 will improve performances by 14.5%

Summary

Benchmarks breakdown

codecov bot commented Mar 19, 2025 • edited Loading

Codecov Report

patchback bot commented Mar 30, 2025 • edited Loading

Backport to 3.11: 💚 backport PR created

patchback bot commented Mar 30, 2025 • edited Loading

Backport to 3.12: 💚 backport PR created

bdraco commented Mar 19, 2025 •

edited

Loading

codspeed-hq bot commented Mar 19, 2025 •

edited

Loading

codecov bot commented Mar 19, 2025 •

edited

Loading

patchback bot commented Mar 30, 2025 •

edited

Loading

patchback bot commented Mar 30, 2025 •

edited

Loading