implement speedups with rust v2 #438

Open
wants to merge 16 commits into
base: main

carsonburr
Contributor

pip install -e . in a virtualenv should build src/markupsafe/_rust_speedups.???.so, assuming you have Rust installed.
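One way to confirm the build step worked is to ask Python whether it can locate the extension. This is only a sketch: the module name `markupsafe._rust_speedups` is inferred from the path above, and the helper name is made up for illustration.

```python
import importlib.util


def rust_speedups_available() -> bool:
    """Return True if markupsafe._rust_speedups can be located.

    The module name is an assumption derived from the path
    src/markupsafe/_rust_speedups.???.so mentioned in the PR description.
    """
    try:
        spec = importlib.util.find_spec("markupsafe._rust_speedups")
    except ImportError:
        # markupsafe itself is not installed in this environment.
        return False
    return spec is not None


if __name__ == "__main__":
    print("rust speedups built:", rust_speedups_available())
```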

Run python bench.py to run all benchmarks, Rust included. Here are the results on my machine:

$ python bench.py

short escape native: Mean +- std dev: 656 ns +- 12 ns
short escape speedups: Mean +- std dev: 417 ns +- 7 ns
short escape rust_speedups: Mean +- std dev: 522 ns +- 15 ns

long escape native: Mean +- std dev: 17.3 us +- 0.2 us
long escape speedups: Mean +- std dev: 7.79 us +- 0.14 us
long escape rust_speedups: Mean +- std dev: 6.71 us +- 0.08 us

short plain native: Mean +- std dev: 505 ns +- 9 ns
short plain speedups: Mean +- std dev: 349 ns +- 5 ns
short plain rust_speedups: Mean +- std dev: 401 ns +- 4 ns

long plain native: Mean +- std dev: 17.2 us +- 0.1 us
long plain speedups: Mean +- std dev: 7.77 us +- 0.10 us
long plain rust_speedups: Mean +- std dev: 6.73 us +- 0.15 us

long suffix native: Mean +- std dev: 134 us +- 1 us
long suffix speedups: Mean +- std dev: 131 us +- 1 us
long suffix rust_speedups: Mean +- std dev: 58.3 us +- 1.2 us
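To make the comparison easier to read, the means above can be turned into C-to-Rust ratios. This sketch assumes "speedups" is the existing C extension and "rust_speedups" is the new Rust one. It copies only the reported means and ignores the standard deviations.

```python
# Mean timings copied from the bench.py output above, in nanoseconds.
RESULTS_NS = {
    "short escape": {"native": 656, "speedups": 417, "rust_speedups": 522},
    "long escape": {"native": 17_300, "speedups": 7_790, "rust_speedups": 6_710},
    "short plain": {"native": 505, "speedups": 349, "rust_speedups": 401},
    "long plain": {"native": 17_200, "speedups": 7_770, "rust_speedups": 6_730},
    "long suffix": {"native": 134_000, "speedups": 131_000, "rust_speedups": 58_300},
}


def c_over_rust(results):
    """Ratio of C time to Rust time per benchmark; above 1.0 means Rust was faster."""
    return {name: t["speedups"] / t["rust_speedups"] for name, t in results.items()}


if __name__ == "__main__":
    for name, ratio in c_over_rust(RESULTS_NS).items():
        print(f"{name}: C/Rust = {ratio:.2f}")
```

On these numbers the Rust build is slower than C on the two short benchmarks and faster on the three long ones, most visibly on long suffix.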

@davidism
Member

davidism commented Apr 23, 2024

I'll have to run it on my machine for an exact comparison, but those are some good performance numbers compared to the C for the long benchmarks.

@carsonburr
Contributor Author

Simplified the Rust speedups to use a lookup table instead of relying on auto-vectorized SIMD. The advantages are that it's less messy, doesn't use advanced Rust features, doesn't use unsafe*, and is slightly faster for the workloads in bench.py. I'm sure this could still be SIMD-accelerated, but not portably.

*potentially eating the conversion cost if the PyString is stored as UTF-16 or 32-bit Unicode
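For readers unfamiliar with the lookup-table approach, here is a rough sketch of the idea in Python. It is not the Rust code from this PR, which is not shown in this conversation. The table layout and function name are made up for illustration, and the replacement strings are assumed to match the five characters MarkupSafe escapes.

```python
# Illustration only: a Python sketch of a lookup-table escape,
# not the Rust implementation from this PR.
_REPLACEMENTS = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "'": "&#39;",
    '"': "&#34;",
}

# Table indexed by code point; None means "copy the character unchanged".
_TABLE = [None] * 128
for _ch, _rep in _REPLACEMENTS.items():
    _TABLE[ord(_ch)] = _rep


def escape_with_table(s: str) -> str:
    out = []
    start = 0  # start of the current run of characters needing no escape
    for i, ch in enumerate(s):
        cp = ord(ch)
        rep = _TABLE[cp] if cp < 128 else None
        if rep is not None:
            out.append(s[start:i])  # flush the unescaped run
            out.append(rep)
            start = i + 1
    if start == 0:
        # Nothing needed escaping, so return the input untouched.
        return s
    out.append(s[start:])
    return "".join(out)
```

Each character costs one table index rather than a chain of comparisons, and strings with nothing to escape are returned without copying.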

debug = true

[dependencies]
pyo3 = "0.22.2"

@davidism I guess if you want abi3, you can just enable the feature:

Suggested change
- pyo3 = "0.22.2"
+ pyo3 = { version = "0.22.2", features = ["abi3"] }

@davidism
Member

davidism commented Oct 6, 2024

@davidhewitt thanks for looking at this. In #461 we're setting up wheel builds for 313 and 313t (free threading). Say we were to add the features = ["abi3"] suggestion you made above. How would we deal with the free threading build? Presumably, we'd offer an abi3 wheel, and then also a 313t wheel that's not abi3? How would we configure pyo3 and cibuildwheel to handle both of those?

@davidhewitt

For ease of use, if you currently have the abi3 feature set but are building on free-threaded Python, we'll ignore it and build for the free-threaded ABI. I understand that in 3.14 there's a possibility of a new stable ABI which supports free threading, so that might change in future.

I am not super familiar with cibuildwheel configuration 🙈, though I assume that the Rust build would work the same way as building a freethreaded C wheel.
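To check which of the two cases a given interpreter falls into, the build-time flag can be read from `sysconfig`. This is a minimal sketch with a made-up helper name. It only reports how the interpreter was built and says nothing about what PyO3 does internally.

```python
import sysconfig


def is_free_threaded_build() -> bool:
    """True when the running CPython was built with the GIL disabled (e.g. 3.13t)."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; it is 0 or None elsewhere.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


if __name__ == "__main__":
    print("free-threaded build:", is_free_threaded_build())
```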

@davidhewitt

Though note also that PyO3's freethreaded support is not complete yet / keeping me up at night / should drop soon-ish in our 0.23 release.

@davidhewitt

For what it's worth, we have now got good free-threading support in PyO3, and the abi3 / free-threaded interaction is as described; just build an abi3 wheel and a free-threaded wheel, and you're done.

See e.g. support added in bcrypt which also uses the abi3 feature.
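After building both wheels, the ABI tag in each filename shows which is which. The sketch below relies only on the standard wheel filename layout (name, version, optional build tag, Python tag, ABI tag, platform tag). The function names and example filenames are made up.

```python
def abi_tag(wheel_filename: str) -> str:
    """Return the ABI tag, the second-to-last dash-separated field of a wheel name."""
    stem = wheel_filename.removesuffix(".whl")
    return stem.split("-")[-2]


def classify(wheel_filename: str) -> str:
    tag = abi_tag(wheel_filename)
    if tag == "abi3":
        return "abi3"
    if tag.startswith("cp") and tag.endswith("t"):
        # e.g. cp313t, the free-threaded CPython 3.13 ABI
        return "free-threaded"
    return "version-specific"


if __name__ == "__main__":
    # Hypothetical filenames for illustration only.
    for name in (
        "example-1.0-cp38-abi3-linux_x86_64.whl",
        "example-1.0-cp313-cp313t-linux_x86_64.whl",
    ):
        print(name, "->", classify(name))
```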

@davidism
Member

Do you know if cibuildwheel is adding support as well? PyCA uses their own wheel building infrastructure, so it's not immediately clear how I would adapt their configuration.

@davidism
Member

davidism commented Feb 19, 2025

Looks like there is an option for it, CIBW_ENABLE=cpython-freethreading: https://cibuildwheel.pypa.io/en/stable/options
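A rough sketch of setting that option from a local script follows. The helper name is made up, and it assumes CIBW_ENABLE takes a space-separated list of groups, so check that against the cibuildwheel options page before relying on it.

```python
import os


def freethreading_env(base=None):
    """Return a copy of the environment with cpython-freethreading enabled.

    Assumes CIBW_ENABLE holds a space-separated list of groups.
    """
    env = dict(os.environ if base is None else base)
    enabled = env.get("CIBW_ENABLE", "").split()
    if "cpython-freethreading" not in enabled:
        enabled.append("cpython-freethreading")
    env["CIBW_ENABLE"] = " ".join(enabled)
    return env


# Hypothetical usage, not executed here:
#   import subprocess
#   subprocess.run(
#       ["python", "-m", "cibuildwheel", "--output-dir", "wheelhouse"],
#       env=freethreading_env(),
#       check=True,
#   )
```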

@davidhewitt davidhewitt left a comment

Yes, and setuptools-rust tests building with cibuildwheel. It doesn't test abi3 or free-threaded specifically, but those should just work due to setuptools configuration, and we test them elsewhere in the suite.

Comment on lines 9 to 11
"PyPy",
"Jython",
"GraalVM",

FWIW we support PyPy and GraalVM, they should just work in PyO3 for you (or it's a bug for us to resolve).

Member

The reason we skip building for them is that the C extension version turned out to be slower than the pure Python version. Do you have any insight into whether this is the case for Rust? My understanding is that those implementations need to emulate Python's C API, which ends up being slower than the speedups their interpreters have.

Ah, fair enough. Yes, Rust extensions are built atop the C API and rely on the same emulation / backdoors from their JITs.
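The skip discussed in this thread could be expressed roughly as below. This is a sketch, not the project's actual build script. The function name is made up, and the set mirrors the three names in the commented lines, on the assumption that they are compared against `platform.python_implementation()`.

```python
import platform

# Implementations named in the lines under review; on these the compiled
# extension is skipped and the pure Python version is used instead.
SKIPPED_IMPLEMENTATIONS = {"PyPy", "Jython", "GraalVM"}


def should_build_extension(implementation=None) -> bool:
    """Return False on interpreters where the compiled speedups are not worth building."""
    if implementation is None:
        implementation = platform.python_implementation()
    return implementation not in SKIPPED_IMPLEMENTATIONS


if __name__ == "__main__":
    print(platform.python_implementation(), "->", should_build_extension())
```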

3 participants