Allow UMFPACK to be run with an alternative BLAS under the hood #453
The only really feasible way to do this is to set up a second, entirely disjoint instance of LBT. That would have to be an entirely separate package, and probably a JLL, which seems pretty crazy to me. Why exactly would we want to give users better performance on just UMFPACK when they could already have it for every BLAS call with LBT?
Is the answer for LinearSolve.jl to just start doing
If it's available for that platform, yeah. I really do think we're shooting ourselves in the foot by trying to pull out specific MKL functions, when the user probably wants all of it. I'm not aware of any functionality where OpenBLAS beats MKL, are you? Whether that should be done in LinearSolve, or whether we should explicitly tell the user to do it themselves, I don't know.
At least for this issue, it would be good to have a simple example that shows where UMFPACK + OpenBLAS is slow. I assume you are using one OpenBLAS thread.
The point is that the threading performance is what is bad, so it scales poorly. There is currently no situation where UMFPACK benchmarks as the best method (https://docs.sciml.ai/SciMLBenchmarksOutput/dev/LinearSolve/SparsePDE/), and the benchmarks show very poor performance for OpenBLAS (https://docs.sciml.ai/SciMLBenchmarksOutput/dev/LinearSolve/LUFactorization/).
My question is: does it get better with OpenBLAS threading turned off?
I'm not sure if UMFPACK is multithreaded, @ViralBShah. I think it relies on a threaded BLAS for performance.
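For reference, the threading question above can be tested directly from Julia, since `LinearAlgebra.BLAS` exposes the thread count of the loaded BLAS. The following is a minimal sketch (the sparse system is illustrative, not a real benchmark from this thread); `lu` on a `SparseMatrixCSC` dispatches to UMFPACK:

```julia
# Sketch: compare UMFPACK timings with threaded vs. single-threaded BLAS.
# BLAS.set_num_threads / BLAS.get_num_threads are stdlib LinearAlgebra APIs;
# the matrix here is a made-up example, not from the SciML benchmarks.
using LinearAlgebra, SparseArrays

A = sprand(10_000, 10_000, 1e-3) + 10I   # illustrative sparse system
b = rand(10_000)

BLAS.set_num_threads(Sys.CPU_THREADS)
t_threaded = @elapsed lu(A) \ b          # lu on SparseMatrixCSC -> UMFPACK

BLAS.set_num_threads(1)
t_serial = @elapsed lu(A) \ b

@show t_threaded t_serial
```

If `t_serial` is comparable to `t_threaded`, that supports the claim that UMFPACK itself is not meaningfully using the threaded BLAS for problems of this shape.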
Here's Tim's answer on the topic: DrTimothyAldenDavis/SuiteSparse#432 (comment). Also, a parallel UMFPACK is coming (unrelated to this), and it will have the same performance issue. People who care about this sort of thing should just use MKL in their startup.jl. The only realistic solution is SuiteSparseMKL if you want it all to work automatically and by default. Even then, AMD has its own BLAS, and all of this will fail on Mac ARM. Of course, I feel that the kind of person who wants faster LinearSolve performance also generally wants faster overall BLAS performance, and it may be best to ask people to pick a better BLAS by default. Eventually we should have a global preference to set the BLAS, rather than doing it through startup.jl.
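The startup.jl route mentioned above amounts to loading MKL.jl at session start, which swaps the LBT backend globally. A minimal sketch of such a file (the `try`/`catch` guard is an addition, not required by MKL.jl):

```julia
# ~/.julia/config/startup.jl
# Switch the global LBT backend to MKL at session start, as suggested above.
# Assumes MKL.jl is installed in the default environment; the guard just
# avoids breaking startup on machines where it is not.
try
    using MKL
catch err
    @warn "MKL.jl not available; staying on the default BLAS" exception = err
end
```

This is exactly the "global trigger" discussed later in the thread: it changes the BLAS for every consumer of LBT, which is why LinearSolve.jl avoids it.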
Well, I did the LinearSolve.jl change and 🤷 I'm glad we got some information back. I got about 50 happy responses of people saying PDE stuff is magically faster, but 2 bug reports which effectively blow it all down (SciML/LinearSolve.jl#427) and will force a revert. Pretty sad. So this issue has at least been proven to be unsolvable without changes to the binaries, and to be a cause of major (100x) slowdowns in many real-world use cases. And it's pretty clear that the coming parallel UMFPACK will be super good but will probably still be slower than KLU if you use OpenBLAS; it's that bad 🤷. So we really do need a way to build with MKL without LBT, since we cannot use the global trigger in general. The other path would be to actually fix MKL.jl to cover the whole interface (i.e. fixing JuliaLinearAlgebra/MKL.jl#138) and then have LinearAlgebra.jl do this. LinearAlgebra.jl should just check the CPU and load the "right" BLAS.
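The "check the CPU and load the right BLAS" idea at the end of that comment could look something like the following hypothetical sketch. The vendor check via `Sys.cpu_info()` and the `:mkl`/`:default` labels are assumptions for illustration; nothing here is an existing LinearAlgebra.jl API:

```julia
# Hypothetical sketch of CPU-based BLAS selection: prefer MKL on x86 Intel
# CPUs, keep the default elsewhere (OpenBLAS, or Accelerate on Apple ARM).
# Sys.cpu_info()[1].model is a best-effort vendor probe, not an official
# detection mechanism.
function preferred_blas()
    model = Sys.cpu_info()[1].model
    if Sys.ARCH === :x86_64 && occursin("Intel", model)
        return :mkl
    else
        return :default
    end
end

@info "Suggested BLAS backend" backend = preferred_blas()
```

Note that this is the easy half of the problem; the hard half is the one discussed above, namely MKL.jl covering the full LBT interface so the swap is safe to make by default.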
Matrix multiplications on AMD Epyc chips. That came up in two responses. Everyone else either saw no change or a speedup.
I think that is the case.
We should throw a warning if anyone does
I think those issues are addressable and we should go about fixing them. This is probably the first large-scale push for MKL, and it is not surprising that it uncovered more issues. Incomplete-interface issues should be filed against libblastrampoline. We had picked up every single thing we could find (though naturally a few might have been missed), and we even allow for BLAS extensions. Packages that are not linked against LBT should probably be recompiled. However, if they call OpenBLAS directly, it really shouldn't be an issue. The SLICOT one is a bit puzzling, but I'll try to look into it.
Indeed, this was just the first large attempt to break the shackles of OpenBLAS. We will lick our wounds, fix up a few things, and come back to it with improved packages in a bit. I personally want LinearAlgebra to start shipping MKL on appropriate CPUs within the next year.
Part of the fix might be this: JuliaLinearAlgebra/MKL.jl#140
The OpenBLAS perf issue is resolved and has also been backported. For the SuiteSparse that ships as a stdlib, we won't be able to do this. If you really want it, maybe there could be an external package, but I have no idea how reliably that would work.
In LinearSolve.jl we directly use MKL_jll.jl in order to avoid OpenBLAS, but we do not use LBT because we don't want to override the user's global state. However, this doesn't change UMFPACK, and thus UMFPACK is rather slow in comparison to everything else. We'd be hoping to just swap the underlying BLAS without making a global change, so we get the performance where we want it without affecting global state.
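For context, "directly using MKL_jll.jl" means `ccall`ing into the MKL library that the JLL provides, bypassing LBT entirely. A rough sketch of that pattern for an LU factorization follows; the use of `libmkl_rt`, the `dgetrf_` symbol name, and the 32-bit (LP64) integer interface are assumptions about the MKL build, and this is not the actual LinearSolve.jl code:

```julia
# Sketch: call MKL's LAPACK dgetrf_ through MKL_jll without touching the
# global LBT state. Assumes libmkl_rt's default LP64 (Int32) interface.
using MKL_jll

function mkl_getrf!(A::Matrix{Float64})
    m, n = size(A)
    ipiv = Vector{Int32}(undef, min(m, n))
    info = Ref{Int32}(0)
    ccall((:dgetrf_, MKL_jll.libmkl_rt), Cvoid,
          (Ref{Int32}, Ref{Int32}, Ptr{Float64}, Ref{Int32},
           Ptr{Int32}, Ref{Int32}),
          m, n, A, max(1, m), ipiv, info)
    info[] == 0 || error("dgetrf_ failed with info = $(info[])")
    return A, ipiv
end
```

The limitation described in the issue is exactly that this only covers calls the package makes itself: UMFPACK resolves its BLAS symbols at load time against whatever LBT points to, so it never sees this MKL.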