Skip to content

Speed up build_ellipse_model #2046

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

taranu
Copy link

@taranu taranu commented May 20, 2025

Like #1288 (which seems abandoned), this speeds up build_ellipse_model to address #1168 . However, it is entirely written in Cython and does not require a separate shared library. In local testing on Python 3.12, it's about 130x faster than the current pure Python implementation. I also made the tolerance for one of the unit tests much stricter, because I found that it was passing even when the output was incorrect.

There is a potentially premature optimization to specialize the function depending on whether harmonic arrays are passed or not. The C code generated on my system looked correct, but I didn't actually find any difference in performance between high_harmonics=False and True.

I also attempted to parallelize it with Cython's prange. However, in my testing on multiple Linux systems, I found it never actually used more than one thread. Additionally, I couldn't figure out how to allow for a num_threads=None value in the Cython build_ellipse_model_c function, since it's typed as an int. Thus, I'm making this a draft PR to start. If someone more versed in Cython/prange could have a look at that, it would be helpful. Otherwise, the parallel support could simply be dropped.

@taranu taranu force-pushed the cython-build-ellipse-model branch from 4c27a18 to acbef6d Compare May 22, 2025 04:28
@taranu taranu force-pushed the cython-build-ellipse-model branch from acbef6d to c1f4e2f Compare May 22, 2025 18:50
@taranu
Copy link
Author

taranu commented May 23, 2025

I tested this on two different systems and found that num_threads does nothing because it was compiling without -fopenmp and not linking against libomp (or libgomp). After manually recompiling, parallelism works fine - or at least, runtimes are about ~3.2x faster with 4 threads.

I see that there are openmp helpers in extension-helpers, which is a requirement and enabled in pyproject.toml here, but I'm unclear on how to encourage pip install to add the relevant compiler flags. @astrofrog, could you advise?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant