Add KLUWrapper #296
Adds a small allocation-aware wrapper over libklu (provided via SuiteSparse_jll) tailored to the access patterns of this package:
- KLULinSolveCache splits symbolic and numeric factorization, supports a numeric_refactor! that reuses the analysis and the prior numeric struct, and is non-allocating after construction.
- solve!/tsolve! call libklu's native dense multi-RHS routines directly.
- solve_sparse!/solve_sparse provide a sparse-RHS path: B's columns are scattered into a small dense scratch in chunks instead of densifying the full N x M RHS up front. A skip_empty option handles RHSs where most columns are structurally empty (Ward reduction, Woodbury kernel).
- Real (Float64) and complex (ComplexF64) paths share the same struct via type-dispatched ccall helpers (klu_l_* and klu_zl_*).

This replaces the use of KLU.jl elsewhere in the package. KLU.jl is removed from Project.toml; SuiteSparse_jll is added in its place.
https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
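The split between symbolic analysis and numeric refactorization can be illustrated with the standard library alone. The sketch below is not the KLUWrapper API: it uses SparseArrays' UMFPACK-backed `lu` and `lu!(F, A)`, which reuses the symbolic analysis of an existing factorization, as a stand-in for libklu's analyze/factor/refactor calls, assuming the sparsity pattern does not change between refactors.

```julia
using LinearAlgebra, SparseArrays

# Stand-in illustration: UMFPACK (via SparseArrays) instead of libklu, used only
# to show the "analyze once, refactor many times" pattern.
A = sparse([4.0 1.0 0.0;
            1.0 4.0 1.0;
            0.0 1.0 4.0])
F = lu(A)                 # symbolic analysis + numeric factorization

# New values with an identical sparsity pattern (e.g. updated susceptances).
A2 = 2.0 * A
lu!(F, A2)                # numeric refactor in place, reusing F's symbolic analysis

b = [1.0, 2.0, 3.0]
x = F \ b                 # solve against the refreshed factors
```

Only the numeric phase is repeated here, which is the same reason numeric_refactor! is cheap relative to a fresh factorization when only values change.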
Replaces KLU.KLUFactorization with the new KLULinSolveCache across the
matrix types and uses solve_sparse! for the multi-RHS, structurally
sparse code paths:
- ABA_Matrix.K is now KLULinSolveCache{Float64}; factorize/is_factorized
updated accordingly. DC_ABA_Matrix_Factorized type alias updated.
- _calculate_PTDF_matrix_KLU uses solve_sparse! on BA[valid_ix, :],
scattering the structurally-sparse RHS columns instead of densifying
the full (buscount-nref) x linecount matrix.
- _calculate_LODF_matrix_KLU passes transpose(a)[valid_ix, :] directly
to solve_sparse!, removing the original zeros(buscount, buscount)
intermediate.
- VirtualPTDF.K, VirtualLODF.K, VirtualMODF.K all hold a
KLULinSolveCache{Float64}; per-row solve! calls remain non-allocating.
- DC_vPTDF_Matrix is left unconstrained on K so AppleAccelerate
factorizations still satisfy the alias.
https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
Ward reduction:
- The boundary-bus identity solve becomes a single solve_sparse! call on a sparse identity matrix instead of a per-bus loop with one RHS each.
- The y_eq computation switches to solve_sparse(...; skip_empty=true) on y_eb. Most external buses are not adjacent to any boundary bus, so the RHS columns are largely empty and the skip-empty short-circuit avoids redundant solves.

Tests:
- Real and complex round-trip and refactor coverage.
- solve_sparse! agrees with the dense-RHS path.
- skip_empty produces zero columns for empty RHS columns.
- solve_sparse! into a view writes only the targeted rows.

https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
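The equivalence behind replacing the per-bus loop with one identity-matrix solve can be checked on a toy matrix. This is a minimal sketch: the boundary indices are hypothetical, and UMFPACK's `lu` stands in for the KLU cache. The point is only that stacking the selected identity columns into one multi-RHS solve yields the same columns as solving one unit vector at a time.

```julia
using LinearAlgebra, SparseArrays

A = sparse([ 4.0 -1.0  0.0  0.0;
            -1.0  4.0 -1.0  0.0;
             0.0 -1.0  4.0 -1.0;
             0.0  0.0 -1.0  4.0])
F = lu(A)                 # stand-in for the KLU factorization
n = size(A, 1)
boundary = [2, 4]         # hypothetical boundary-bus indices
nb = length(boundary)

# Per-bus loop: one unit-vector RHS per boundary bus.
X_loop = zeros(n, nb)
for (j, k) in enumerate(boundary)
    e = zeros(n)
    e[k] = 1.0
    X_loop[:, j] = F \ e
end

# Single call: the selected identity columns as one n x nb sparse RHS.
E = sparse(boundary, 1:nb, ones(nb), n, nb)
X_once = F \ Matrix(E)    # densified here; the PR routes this through solve_sparse!
```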
Simplifications:
- Drop redundant n field from KLULinSolveCache; derive from colptr.
- Drop _decrement!/_increment! helpers in favour of broadcast `.+= 1` / `.-= 1`.
- Rename finalize! -> Base.finalize so it composes with stdlib instead of shadowing Base.finalize through the module's exports.
- Drop solve_w_refinement (no in-tree caller); add Base.\\ on the cache for symmetry with KLU.KLUFactorization.
- Unify the two tsolve! methods via a _tsolve_call helper that ignores the conjugate flag on the real path.
- Drop block= and skip_empty= keywords from solve_sparse!. Always pack non-empty columns into a single dense scratch and dispatch one libklu multi-RHS solve. Empty RHS columns are zeroed in the output without a solve, in all cases.
- Throw stdlib exception types (SingularException, OutOfMemoryError, ArgumentError, OverflowError) from klu_throw to match the conventions used in SparseArrays.CHOLMOD and elsewhere in PNM.
- Refactor _calculate_PTDF_matrix_KLU: collapse the duplicated dist_slack branches into a single solve plus a slack-distribution post-step.
- Strip migration-narrative comments from the consumer call sites.

New: KLULinSolvePool
- A small pool of independent KLULinSolveCache workers, each holding its own factorization of the same matrix. KLU's numeric struct and Common status field are mutated by klu_solve, so safe parallel use needs one factor copy per worker.
- API: KLULinSolvePool(A; nworkers), with_worker(f, pool) -> result, acquire!/release!, numeric_refactor!(pool, A), Base.finalize.
- Plumbing into VirtualMODF (per-worker scratch + cache locks) is staged for the next commit.

https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
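The simplified solve_sparse! behaviour (pack the structurally non-empty columns into one dense scratch, run one multi-RHS solve, and zero the empty columns without solving) can be sketched as follows. `solve_sparse_sketch` is a hypothetical name and UMFPACK's `lu` stands in for libklu; this illustrates the technique rather than reproducing the PR's implementation, and unlike the real wrapper it allocates fresh buffers on every call.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper: pack non-empty columns, solve once, scatter back.
function solve_sparse_sketch(F, B::SparseMatrixCSC{Float64})
    n, m = size(B)
    # Structurally non-empty columns (at least one stored entry).
    nonempty = [j for j in 1:m if B.colptr[j] < B.colptr[j + 1]]
    X = zeros(n, m)                       # empty columns stay zero, no solve
    isempty(nonempty) && return X
    scratch = zeros(n, length(nonempty))  # one dense scratch for packed columns
    for (k, j) in enumerate(nonempty)
        for p in B.colptr[j]:(B.colptr[j + 1] - 1)
            scratch[B.rowval[p], k] = B.nzval[p]
        end
    end
    Y = F \ scratch                       # single multi-RHS solve
    for (k, j) in enumerate(nonempty)
        X[:, j] .= @view Y[:, k]          # scatter solved columns back
    end
    return X
end

A = sparse([3.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 3.0])
B = sparse([1, 3], [1, 3], [1.0, 2.0], 3, 4)   # columns 2 and 4 are empty
X = solve_sparse_sketch(lu(A), B)
```

Packing keeps the dense working set proportional to the number of non-empty columns, which is what makes the Ward and Woodbury RHSs cheap.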
VirtualMODF.K is now a KLULinSolvePool{Float64} sized at construction time
to nworkers (defaults to Threads.nthreads()). Per-worker scratch buffers
(work_ba_col, temp_data) are stored as Vector{Vector{Float64}} indexed by
the worker handle returned from with_worker.
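A minimal sketch of the pool idea follows. Each worker owns its own factorization of the same matrix plus its own scratch buffer, and a `Channel` of integer handles hands out idle workers. `PoolSketch` and `with_worker_sketch` are hypothetical names, and UMFPACK stands in for libklu. Later in this thread the authors report that KLU solves still had to be serialized because of an upstream issue, so this shows the structure only, not a guarantee that parallel solves are safe or faster.

```julia
using LinearAlgebra, SparseArrays

struct PoolSketch{F}
    factors::Vector{F}                # one independent factorization per worker
    scratch::Vector{Vector{Float64}}  # per-worker scratch, indexed by handle
    free::Channel{Int}                # handles of currently idle workers
end

function PoolSketch(A::SparseMatrixCSC{Float64}; nworkers::Int = Threads.nthreads())
    free = Channel{Int}(nworkers)
    foreach(w -> put!(free, w), 1:nworkers)
    factors = [lu(A) for _ in 1:nworkers]
    scratch = [zeros(size(A, 1)) for _ in 1:nworkers]
    return PoolSketch(factors, scratch, free)
end

# Block until a worker is idle, run f with its handle, always hand it back.
function with_worker_sketch(f, pool::PoolSketch)
    w = take!(pool.free)
    try
        return f(w)
    finally
        put!(pool.free, w)
    end
end

A = sparse([4.0 1.0; 1.0 3.0])
pool = PoolSketch(A; nworkers = 2)
results = Vector{Vector{Float64}}(undef, 8)
Threads.@threads for i in 1:8
    results[i] = with_worker_sketch(pool) do w
        buf = pool.scratch[w]         # exclusive while this handle is held
        fill!(buf, 0.0)
        buf[mod1(i, 2)] = 1.0
        pool.factors[w] \ buf
    end
end
```

Comparing each parallel result against a serial solve, as the threaded tests do, is the natural check for this structure.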
Woodbury kernel refactor:
- Split _compute_woodbury_factors / _apply_woodbury_correction into
pure-data _impl functions that take cache + scratch + data arrays
explicitly, plus mat-typed outer wrappers.
- VirtualPTDF wrapper uses its single shared scratch (existing single-cache
behavior).
- VirtualMODF wrapper uses with_worker to acquire a per-worker cache and
scratch.
Cache locking:
- Add ReentrantLocks for woodbury_cache and row_caches.
- _get_woodbury_factors and _get_or_create_row_cache use double-checked
locking so concurrent miss-then-fill is correct.
- clear_caches! / clear_all_caches! / Base.getindex acquire the locks.
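The miss-then-fill pattern can be sketched with a `ReentrantLock` guarding a `Dict`. This is a race-free variant rather than classic double-checked locking: both checks run under the lock, and only the expensive computation runs outside it, because an unlocked read of a plain `Dict` can race with a concurrent insert. `RowCacheSketch` and `get_or_create_row!` are hypothetical names.

```julia
struct RowCacheSketch
    rows::Dict{Int, Vector{Float64}}
    lock::ReentrantLock
end
RowCacheSketch() = RowCacheSketch(Dict{Int, Vector{Float64}}(), ReentrantLock())

function get_or_create_row!(compute::Function, c::RowCacheSketch, key::Int)
    # First check, under the lock, without holding it during the compute.
    hit = lock(c.lock) do
        get(c.rows, key, nothing)
    end
    hit === nothing || return hit
    row = compute(key)            # expensive work runs outside the lock
    # Second check: another task may have filled the entry meanwhile;
    # get! keeps whichever value was stored first.
    return lock(c.lock) do
        get!(c.rows, key, row)
    end
end

cache = RowCacheSketch()
make_row(k) = fill(Float64(k), 3)   # stands in for an expensive per-row solve
tasks = map(1:8) do _
    Threads.@spawn get_or_create_row!(make_row, cache, 7)
end
vals = fetch.(tasks)
```

Two tasks may both compute the same row on a concurrent miss, but every caller ends up with the single stored copy.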
Tests:
- New test/test_virtual_threaded.jl: uses Threads.@spawn for VirtualPTDF,
VirtualLODF, and VirtualMODF. The VirtualMODF case spawns one task per
(monitored, contingency) work item and validates that parallel results
match a serial reference. VirtualPTDF/VirtualLODF use a single-task
@spawn since their K and scratch are still shared (not yet pool-backed);
the comments call this out explicitly.
- Tests in test/test_klu_wrapper.jl exercise KLULinSolvePool: basic
with_worker, concurrent solves via @threads on a 4-worker pool, and
numeric_refactor!.
https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
Pull request overview
This PR replaces the dependency on KLU.jl with an in-repo KLUWrapper built on SuiteSparse_jll and wires it through PTDF/LODF/MODF/Ward workflows, adding a worker-pool abstraction to support thread-safe parallel solves and new tests to validate correctness under concurrency.
Changes:
- Introduce `src/KLUWrapper/` with `KLULinSolveCache` (cached factorization) and `KLULinSolvePool` (thread-safe parallel solves) plus dense/sparse RHS solve helpers.
- Refactor PTDF/LODF/Virtual* and Ward reduction codepaths to use `klu_factorize`, `solve!`, and `solve_sparse!`/`solve_sparse` instead of `KLU.jl`.
- Add tests covering wrapper correctness, sparse RHS behavior, pool behavior, and threaded Virtual* access; bump package version and switch dependency to `SuiteSparse_jll`.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/test_virtual_threaded.jl | Adds multithreaded regression tests for VirtualPTDF/VirtualLODF (single-task) and VirtualMODF (pool-backed parallelism). |
| test/test_klu_wrapper.jl | Adds comprehensive unit tests for cache/pool factorization, dense + sparse RHS solves, refactor/reset paths, and allocation expectations. |
| src/woodbury_kernel.jl | Refactors Woodbury computation into pure-data impl functions and adds dispatchers for VirtualPTDF vs pool-backed VirtualMODF. |
| src/ward_reduction.jl | Switches Ward reduction solves to KLUWrapper and uses sparse-RHS solve for boundary columns. |
| src/virtual_ptdf_calculations.jl | Updates VirtualPTDF to accept KLULinSolveCache and routes solves through the wrapper abstraction. |
| src/virtual_modf_calculations.jl | Makes VirtualMODF pool-backed for parallel getindex, adds locks around caches, and threads Woodbury calls through with_worker. |
| src/virtual_lodf_calculations.jl | Migrates VirtualLODF factorization/solves to KLULinSolveCache and uses in-place solve!. |
| src/ptdf_calculations.jl | Replaces dense RHS KLU.solve! usage with solve_sparse! over selected BA rows. |
| src/lodf_calculations.jl | Replaces dense RHS KLU.solve! usage with solve_sparse! for incidence-transpose RHS and updates denom solve. |
| src/PowerflowMatrixTypes.jl | Updates type aliases to use KLULinSolveCache and relaxes VirtualPTDF’s factorization type parameter. |
| src/PowerNetworkMatrices.jl | Includes/imports KLUWrapper APIs into the main module. |
| src/KLUWrapper/solve_sparse_rhs.jl | Implements sparse RHS packing + block-chunked solve to bound working set. |
| src/KLUWrapper/solve_dense.jl | Implements in-place dense solve and transpose solve; adds \\ for allocating solve. |
| src/KLUWrapper/pool.jl | Adds KLULinSolvePool with Channel-based worker acquisition and refactor/reset logic. |
| src/KLUWrapper/klu_jll_bindings.jl | Adds low-level ccall bindings for SuiteSparse_long KLU entry points and error mapping. |
| src/KLUWrapper/klu_cache.jl | Implements KLULinSolveCache lifecycle, symbolic/numeric refactor, pattern checks, and scratch management. |
| src/KLUWrapper/KLUWrapper.jl | Defines the KLUWrapper module and exports wrapper APIs. |
| src/BA_ABA_matrices.jl | Updates ABA factorization storage and constructors to use klu_factorize. |
| Project.toml | Bumps version and replaces KLU dependency with SuiteSparse_jll. |
luke-kiernan left a comment
One comment, will look more later.
The final outcome of this PR is that we need the protection added in c717843 to avoid the KLU issue reported in DrTimothyAldenDavis/SuiteSparse#1040. After much testing, we can't parallelize the solves even with multiple pool workers, but this at least resolves the issue with having a PTDF and MODF.
luke-kiernan left a comment
Two small nitpicks, but nothing major.
This will require a companion PR in PF for compatibility.
josephmckinsey left a comment
Given that essentially every call into KLU is globally serialized, and that is where all the run time goes, I'd be extremely surprised if a parallel call were ever faster. I'd definitely bet on it being thread-safe, though. I expect we can relax some of this eventually with enough testing.
Most of these comments are probably not important, but the testing one and `_recover_factorization` probably are.
```julia
    Tuple{Dict{Tuple{Int, Int}, Int64}, Dict{Int64, Int64}},
    <:LinearAlgebra.Factorization,
}
K,
```
Any reason we can't make `KLULinSolveCache{Float64}` a subtype of `LinearAlgebra.Factorization`, like the factorization type in KLU.jl?
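For reference, subtyping `LinearAlgebra.Factorization` mostly means declaring the supertype and providing `size` and `ldiv!`; LinearAlgebra's generic `\` for a `Factorization` copies the right-hand side and calls `ldiv!`. The sketch below wraps UMFPACK's `lu` purely so it runs with the standard library. `CacheSketch` is a hypothetical type, not `KLULinSolveCache`, and whether the real cache can take this supertype without conflicting with its existing `\` and `solve!` methods is an assumption to verify.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical wrapper; the underlying factorization is UMFPACK, not KLU.
struct CacheSketch{T, L} <: LinearAlgebra.Factorization{T}
    fact::L
end
function CacheSketch(A::SparseMatrixCSC{T}) where {T}
    F = lu(A)
    return CacheSketch{T, typeof(F)}(F)
end

Base.size(c::CacheSketch) = size(c.fact)
# In-place solves; the generic Factorization `\` fallback routes through these.
LinearAlgebra.ldiv!(c::CacheSketch, b::AbstractVector) = (b .= c.fact \ b; b)
LinearAlgebra.ldiv!(c::CacheSketch, B::AbstractMatrix) = (B .= c.fact \ B; B)

A = sparse([2.0 1.0; 1.0 3.0])
c = CacheSketch(A)
x = c \ [1.0, 2.0]
```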
```julia
        @info "Skipping: AppleAccelerate extension not loaded."
        return
    end
    if Threads.nthreads() < 2
```
As of Julia 1.12, Julia starts with 1 interactive thread and 1 default thread, so this check is unlikely to be meaningful. In particular, I believe the GitHub Actions runs do not exercise multithreading at all, even though nothing in the output reports it.
Looking for the last tests from the PSI branch that consumes this.
This PR is important to reduce issues with KLU allowing for multi entry