Numba's just-in-time (JIT) compilation offers a way to speed up numpy-only functions, reduce memory usage (since arrays need not be copied with every operation), and enable multi-threaded parallelism (by releasing the GIL). It's not clear how much benefit we'll get from applying JIT techniques, since most of the functions are already vectorized, but I think it's worth exploring.
Here's the strategy:
- Create notebooks that run functions on test data, with and without the jit, and time the execution.
- Begin with the lowest level (mdx2.utils) and move up to higher levels (mdx2.data, etc.).
- Refactor as needed to create reusable compiled functions and remove python object dependencies.
- Consider rewriting scipy or pandas code in pure numpy so that JIT techniques can be applied.
- Test in multiple environments (laptop (Apple Silicon), cbsuando2 (AMD), eig16m-analysis (Intel)).
For now, these testing notebooks will not be committed to the repo; however, we could consider adding them later.