
Add KLUWrapper#296

Merged
jd-lara merged 46 commits into main from claude/klu-sparse-rhs-wrapper-jLAmX
May 13, 2026
Conversation

@jd-lara
Member

@jd-lara jd-lara commented Apr 28, 2026

This PR is important for reducing issues with KLU and allowing for multi-entry use.

claude and others added 8 commits April 25, 2026 17:54
Adds a small allocation-aware wrapper over libklu (provided via
SuiteSparse_jll) tailored to the access patterns of this package:

- KLULinSolveCache splits symbolic and numeric factorization, supports
  numeric_refactor! that reuses the analysis and the prior numeric struct,
  and is non-allocating after construction.
- solve!/tsolve! call libklu's native dense multi-RHS routines directly.
- solve_sparse!/solve_sparse provide a sparse-RHS path: B's columns are
  scattered into a small dense scratch in chunks instead of densifying the
  full N x M RHS up front. Has a skip_empty option for RHSs where most
  columns are structurally empty (Ward reduction, Woodbury kernel).
- Real (Float64) and complex (ComplexF64) paths share the same struct via
  type-dispatched ccall helpers (klu_l_* and klu_zl_*).

This replaces the use of KLU.jl elsewhere in the package. KLU.jl is removed
from Project.toml; SuiteSparse_jll is added in its place.

https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
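
The chunked sparse-RHS idea described in this commit message can be illustrated with a minimal sketch. It is not the PR's `solve_sparse!`: the helper name `chunked_sparse_solve` and the `block` keyword are hypothetical, and the UMFPACK factorization from the `SparseArrays` standard library stands in for the libklu cache so that the example runs on its own.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical helper (not the PR's solve_sparse!): solve A * X = B for a sparse B,
# scattering at most `block` columns of B into a reusable dense scratch per solve.
function chunked_sparse_solve(F, B::SparseMatrixCSC; block::Int = 2)
    n, m = size(B)
    X = zeros(eltype(B), n, m)
    scratch = zeros(eltype(B), n, block)
    rows, vals = rowvals(B), nonzeros(B)
    for start in 1:block:m
        stop = min(start + block - 1, m)
        fill!(scratch, zero(eltype(B)))
        for (k, j) in enumerate(start:stop), p in nzrange(B, j)
            scratch[rows[p], k] = vals[p]   # scatter column j of B into scratch column k
        end
        # Copying the slice keeps the sketch simple; a real wrapper would solve in place.
        X[:, start:stop] = F \ scratch[:, 1:(stop - start + 1)]
    end
    return X
end

A = sparse([4.0 1.0 0.0; 1.0 4.0 1.0; 0.0 1.0 4.0])
B = sparse([1, 3, 2], [1, 2, 5], [1.0, 2.0, 3.0], 3, 5)  # columns 3 and 4 are empty
F = lu(A)   # stand-in factorization (UMFPACK), assumed here instead of libklu
X = chunked_sparse_solve(F, B)
```

The dense working set is `n x block` instead of `n x m`, which is the point of scattering in chunks rather than densifying the whole right-hand side.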
Replaces KLU.KLUFactorization with the new KLULinSolveCache across the
matrix types and uses solve_sparse! for the multi-RHS, structurally
sparse code paths:

- ABA_Matrix.K is now KLULinSolveCache{Float64}; factorize/is_factorized
  updated accordingly. DC_ABA_Matrix_Factorized type alias updated.
- _calculate_PTDF_matrix_KLU uses solve_sparse! on BA[valid_ix, :],
  scattering the structurally-sparse RHS columns instead of densifying
  the full (buscount-nref) x linecount matrix.
- _calculate_LODF_matrix_KLU passes transpose(a)[valid_ix, :] directly
  to solve_sparse!, removing the original zeros(buscount, buscount)
  intermediate.
- VirtualPTDF.K, VirtualLODF.K, VirtualMODF.K all hold a
  KLULinSolveCache{Float64}; per-row solve! calls remain non-allocating.
- DC_vPTDF_Matrix is left unconstrained on K so AppleAccelerate
  factorizations still satisfy the alias.

https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
Ward reduction:
- The boundary-bus identity solve becomes a single solve_sparse! call on
  a sparse identity matrix instead of a per-bus loop with one RHS each.
- The y_eq computation switches to solve_sparse(...; skip_empty=true) on
  y_eb. Most external buses are not adjacent to any boundary bus, so the
  RHS columns are largely empty and the skip-empty short-circuit avoids
  redundant solves.

Tests:
- Real and complex round-trip and refactor coverage.
- solve_sparse! agrees with the dense-RHS path.
- skip_empty produces zero columns for empty RHS columns.
- solve_sparse! into a view writes only the targeted rows.

https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
Simplifications:
- Drop redundant n field from KLULinSolveCache; derive from colptr.
- Drop _decrement!/_increment! helpers in favour of broadcast `.+= 1` /
  `.-= 1`.
- Rename finalize! -> Base.finalize so it composes with stdlib instead of
  shadowing Base.finalize through the module's exports.
- Drop solve_w_refinement (no in-tree caller); add Base.\ on the cache
  for symmetry with KLU.KLUFactorization.
- Unify the two tsolve! methods via a _tsolve_call helper that ignores
  the conjugate flag on the real path.
- Drop block= and skip_empty= keywords from solve_sparse!. Always pack
  non-empty columns into a single dense scratch and dispatch one libklu
  multi-RHS solve. Empty RHS columns are zeroed in the output without a
  solve, in all cases.
- Throw stdlib exception types (SingularException, OutOfMemoryError,
  ArgumentError, OverflowError) from klu_throw to match the conventions
  used in SparseArrays.CHOLMOD and elsewhere in PNM.
- Refactor _calculate_PTDF_matrix_KLU: collapse the duplicated dist_slack
  branches into a single solve plus a slack-distribution post-step.
- Strip migration-narrative comments from the consumer call sites.
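
As a rough illustration of the simplified sparse-RHS behavior listed above (pack the non-empty columns, run one multi-RHS solve, zero the empty columns), here is a sketch under stated assumptions. The name `packed_sparse_solve!` is hypothetical, and the factorization comes from `SparseArrays.lu` rather than the KLU wrapper.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical helper: pack only the non-empty columns of B into one dense scratch,
# run a single multi-RHS solve, and leave empty columns as zeros in the output.
function packed_sparse_solve!(X::AbstractMatrix, F, B::SparseMatrixCSC)
    size(X) == size(B) || throw(DimensionMismatch("X and B must have the same size"))
    nonempty = [j for j in axes(B, 2) if !isempty(nzrange(B, j))]
    fill!(X, zero(eltype(X)))           # empty RHS columns stay zero, no solve needed
    isempty(nonempty) && return X
    packed = Matrix(B[:, nonempty])     # dense scratch holding only non-empty columns
    X[:, nonempty] = F \ packed         # one multi-RHS solve for all packed columns
    return X
end

A = sparse([4.0 1.0 0.0; 1.0 4.0 1.0; 0.0 1.0 4.0])
B = sparse([1, 3, 2], [1, 2, 5], [1.0, 2.0, 3.0], 3, 5)  # columns 3 and 4 are empty
X = fill(NaN, 3, 5)                     # stale contents must be overwritten
packed_sparse_solve!(X, lu(A), B)
```

Because the output is filled with zeros before the solve, stale values in `X` never survive in the columns that are skipped.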

New: KLULinSolvePool
- A small pool of independent KLULinSolveCache workers, each holding its
  own factorization of the same matrix. KLU's numeric struct and Common
  status field are mutated by klu_solve, so safe parallel use needs one
  factor copy per worker.
- API: KLULinSolvePool(A; nworkers), with_worker(f, pool) -> result,
  acquire!/release!, numeric_refactor!(pool, A), Base.finalize.
- Plumbing into VirtualMODF (per-worker scratch + cache locks) is staged
  for the next commit.

https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
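
The pool idea (one factor copy per worker, borrowed for the duration of a solve) can be sketched as follows. This is an assumption-laden illustration, not the PR's `KLULinSolvePool`: the type `FactorPool` is hypothetical, `with_worker` only mirrors the name of the API listed above, and `SparseArrays.lu` again stands in for the KLU cache.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical pool: each worker owns an independent factorization of the same matrix.
struct FactorPool{F}
    workers::Vector{F}
    idle::Channel{Int}     # indices of workers that are currently free
end

function FactorPool(A::SparseMatrixCSC; nworkers::Int = Threads.nthreads())
    workers = [lu(A) for _ in 1:nworkers]
    idle = Channel{Int}(nworkers)
    foreach(i -> put!(idle, i), 1:nworkers)
    return FactorPool(workers, idle)
end

# Borrow a worker, run f(factorization, worker_index), and always hand it back.
function with_worker(f, pool::FactorPool)
    i = take!(pool.idle)   # blocks until some worker is free
    try
        return f(pool.workers[i], i)
    finally
        put!(pool.idle, i)
    end
end

A = sparse([4.0 1.0; 1.0 3.0])
b = [1.0, 2.0]
pool = FactorPool(A; nworkers = 2)
results = Vector{Vector{Float64}}(undef, 4)
Threads.@threads for t in 1:4
    results[t] = with_worker((F, i) -> F \ b, pool)
end
```

The `try`/`finally` ensures a worker index is returned to the channel even if the solve throws, so a failed solve cannot shrink the pool.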
VirtualMODF.K is now a KLULinSolvePool{Float64} sized at construction time
to nworkers (defaults to Threads.nthreads()). Per-worker scratch buffers
(work_ba_col, temp_data) are stored as Vector{Vector{Float64}} indexed by
the worker handle returned from with_worker.

Woodbury kernel refactor:
- Split _compute_woodbury_factors / _apply_woodbury_correction into
  pure-data _impl functions that take cache + scratch + data arrays
  explicitly, plus mat-typed outer wrappers.
- VirtualPTDF wrapper uses its single shared scratch (existing single-cache
  behavior).
- VirtualMODF wrapper uses with_worker to acquire a per-worker cache and
  scratch.
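
For readers unfamiliar with the Woodbury correction that these functions are named after, the underlying identity is (A + U C V)^-1 b = A^-1 b - A^-1 U (C^-1 + V A^-1 U)^-1 V A^-1 b. The sketch below only demonstrates that identity with a reused base factorization. The function name `woodbury_solve` and its arguments are hypothetical, and how the PR's kernel organizes its caches and scratch buffers is not shown here.

```julia
using SparseArrays, LinearAlgebra

# Solve (A + U*C*V) x = b using only solves against a factorization F of A.
function woodbury_solve(F, U::Matrix, C::Matrix, V::Matrix, b::Vector)
    y = F \ b                  # base solution A^-1 b
    Z = F \ U                  # A^-1 U, one solve per column of U
    S = inv(C) + V * Z         # small k x k system
    return y - Z * (S \ (V * y))
end

A = sparse([4.0 1.0 0.0; 1.0 4.0 1.0; 0.0 1.0 4.0])
U = reshape([1.0, 0.0, -1.0], 3, 1)   # rank-one modification
C = fill(0.5, 1, 1)
V = permutedims(U)
b = [1.0, 2.0, 3.0]
x = woodbury_solve(lu(A), U, C, V, b)
```

The base matrix is factorized once, and each modification costs a few extra solves plus a small dense system.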

Cache locking:
- Add ReentrantLocks for woodbury_cache and row_caches.
- _get_woodbury_factors and _get_or_create_row_cache use double-checked
  locking so concurrent miss-then-fill is correct.
- clear_caches! / clear_all_caches! / Base.getindex acquire the locks.

Tests:
- New test/test_virtual_threaded.jl: uses Threads.@spawn for VirtualPTDF,
  VirtualLODF, and VirtualMODF. The VirtualMODF case spawns one task per
  (monitored, contingency) work item and validates parallel results
  match a serial reference. VirtualPTDF/VirtualLODF use a single-task
  @spawn since their K and scratch are still shared (not yet pool-backed);
  the comments call this out explicitly.
- Tests in test/test_klu_wrapper.jl exercise KLULinSolvePool: basic
  with_worker, concurrent solves via @threads on a 4-worker pool, and
  numeric_refactor!.

https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
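
The miss-then-fill pattern mentioned under "Cache locking" can be sketched as below. This is an illustrative variant, not the PR's code: `LockedCache` and `get_or_create!` are hypothetical names, and both checks are made while holding the lock because a plain `Dict` is assumed not to be safe to read while another task writes to it.

```julia
# Hypothetical cache type; names are illustrative and not taken from the PR.
struct LockedCache{K, V}
    data::Dict{K, V}
    lock::ReentrantLock
end
LockedCache{K, V}() where {K, V} = LockedCache{K, V}(Dict{K, V}(), ReentrantLock())

function get_or_create!(compute, cache::LockedCache{K, V}, key::K) where {K, V}
    # First check: return early on a hit; the lock is held only for the lookup.
    hit = lock(cache.lock) do
        get(cache.data, key, nothing)
    end
    hit === nothing || return hit
    value = compute(key)       # expensive work happens outside the lock
    # Second check: another task may have filled the entry in the meantime,
    # in which case get! returns the stored value and discards ours.
    return lock(cache.lock) do
        get!(cache.data, key, value)
    end
end

cache = LockedCache{Int, Vector{Float64}}()
tasks = [Threads.@spawn get_or_create!(k -> fill(Float64(k), 3), cache, 7) for _ in 1:8]
results = fetch.(tasks)
```

Several tasks may compute the value concurrently on a miss, but only the first insert is kept, so every caller ends up with the same cached object.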
@jd-lara jd-lara requested review from Copilot and luke-kiernan April 28, 2026 00:15
@github-actions
Contributor

github-actions Bot commented Apr 28, 2026

Performance Results

Precompile Time

| Main | This Branch | Delta |
| --- | --- | --- |
| 2.176 s | 2.213 s | +1.7% |

Execution Time

| Test | Main | This Branch | Delta |
| --- | --- | --- | --- |
| matpower_ACTIVSg2000_sys-Build PTDF First | 2.187 s | 1.807 s | -17.4% |
| matpower_ACTIVSg2000_sys-Build PTDF Second | 115.8 ms | 186.6 ms | +61.1% |
| matpower_ACTIVSg2000_sys-Build Ybus First | 15.0 ms | 14.5 ms | -3.4% |
| matpower_ACTIVSg2000_sys-Build Ybus Second | 13.7 ms | 13.1 ms | -4.8% |
| matpower_ACTIVSg2000_sys-Build LODF First | 164.8 ms | 528.8 ms | +220.9% |
| matpower_ACTIVSg2000_sys-Build LODF Second | 290.1 ms | 175.8 ms | -39.4% |
| matpower_ACTIVSg2000_sys-Build VirtualMODF First | 4.112 s | 4.554 s | +10.8% |
| matpower_ACTIVSg2000_sys-Build VirtualMODF Second | 203.5 ms | 677.0 ms | +232.6% |
| matpower_ACTIVSg2000_sys-VirtualMODF Query 10 rows | 484.6 ms | 493.9 ms | +1.9% |
| matpower_ACTIVSg2000_sys-Radial network reduction First | 455.2 ms | 444.2 ms | -2.4% |
| matpower_ACTIVSg2000_sys-Radial network reduction Second | 0.7 ms | 0.7 ms | +1.6% |
| matpower_ACTIVSg2000_sys-Degree two network reduction First | 1.739 s | 1.691 s | -2.8% |
| matpower_ACTIVSg2000_sys-Degree two network reduction Second | 1.1 ms | 1.1 ms | -5.5% |
| Base_Eastern_Interconnect_515GW-Build Ybus First | 3.53 s | 3.352 s | -5.0% |
| Base_Eastern_Interconnect_515GW-Build Ybus Second | 3.253 s | 3.263 s | +0.3% |
| Base_Eastern_Interconnect_515GW-Radial network reduction First | 42.4 ms | 123.4 ms | +190.7% |
| Base_Eastern_Interconnect_515GW-Radial network reduction Second | 44.7 ms | 36.7 ms | -17.9% |
| Base_Eastern_Interconnect_515GW-Degree two network reduction First | 362.3 ms | 358.0 ms | -1.2% |
| Base_Eastern_Interconnect_515GW-Degree two network reduction Second | 48.7 ms | 43.4 ms | -10.7% |

Contributor

Copilot AI left a comment


Pull request overview

This PR replaces the dependency on KLU.jl with an in-repo KLUWrapper built on SuiteSparse_jll and wires it through PTDF/LODF/MODF/Ward workflows, adding a worker-pool abstraction to support thread-safe parallel solves and new tests to validate correctness under concurrency.

Changes:

  • Introduce src/KLUWrapper/ with KLULinSolveCache (cached factorization) and KLULinSolvePool (thread-safe parallel solves) plus dense/sparse RHS solve helpers.
  • Refactor PTDF/LODF/Virtual* and Ward reduction codepaths to use klu_factorize, solve!, and solve_sparse!/solve_sparse instead of KLU.jl.
  • Add tests covering wrapper correctness, sparse RHS behavior, pool behavior, and threaded Virtual* access; bump package version and switch dependency to SuiteSparse_jll.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| test/test_virtual_threaded.jl | Adds multithreaded regression tests for VirtualPTDF/VirtualLODF (single-task) and VirtualMODF (pool-backed parallelism). |
| test/test_klu_wrapper.jl | Adds comprehensive unit tests for cache/pool factorization, dense + sparse RHS solves, refactor/reset paths, and allocation expectations. |
| src/woodbury_kernel.jl | Refactors Woodbury computation into pure-data impl functions and adds dispatchers for VirtualPTDF vs pool-backed VirtualMODF. |
| src/ward_reduction.jl | Switches Ward reduction solves to KLUWrapper and uses sparse-RHS solve for boundary columns. |
| src/virtual_ptdf_calculations.jl | Updates VirtualPTDF to accept KLULinSolveCache and routes solves through the wrapper abstraction. |
| src/virtual_modf_calculations.jl | Makes VirtualMODF pool-backed for parallel getindex, adds locks around caches, and threads Woodbury calls through with_worker. |
| src/virtual_lodf_calculations.jl | Migrates VirtualLODF factorization/solves to KLULinSolveCache and uses in-place solve!. |
| src/ptdf_calculations.jl | Replaces dense RHS KLU.solve! usage with solve_sparse! over selected BA rows. |
| src/lodf_calculations.jl | Replaces dense RHS KLU.solve! usage with solve_sparse! for incidence-transpose RHS and updates denom solve. |
| src/PowerflowMatrixTypes.jl | Updates type aliases to use KLULinSolveCache and relaxes VirtualPTDF’s factorization type parameter. |
| src/PowerNetworkMatrices.jl | Includes/imports KLUWrapper APIs into the main module. |
| src/KLUWrapper/solve_sparse_rhs.jl | Implements sparse RHS packing + block-chunked solve to bound working set. |
| src/KLUWrapper/solve_dense.jl | Implements in-place dense solve and transpose solve; adds \ for allocating solve. |
| src/KLUWrapper/pool.jl | Adds KLULinSolvePool with Channel-based worker acquisition and refactor/reset logic. |
| src/KLUWrapper/klu_jll_bindings.jl | Adds low-level ccall bindings for SuiteSparse_long KLU entry points and error mapping. |
| src/KLUWrapper/klu_cache.jl | Implements KLULinSolveCache lifecycle, symbolic/numeric refactor, pattern checks, and scratch management. |
| src/KLUWrapper/KLUWrapper.jl | Defines the KLUWrapper module and exports wrapper APIs. |
| src/BA_ABA_matrices.jl | Updates ABA factorization storage and constructors to use klu_factorize. |
| Project.toml | Bumps version and replaces KLU dependency with SuiteSparse_jll. |


Comment thread src/KLUWrapper/pool.jl Outdated
Comment thread src/KLUWrapper/pool.jl Outdated
Comment thread src/KLUWrapper/pool.jl Outdated
Collaborator

@luke-kiernan luke-kiernan left a comment


One comment, will look more later.

Comment thread src/KLUWrapper/klu_cache.jl Outdated
Collaborator

@luke-kiernan luke-kiernan left a comment


Looks good to me

@jd-lara
Member Author

jd-lara commented May 7, 2026

The final outcome of this PR is that we need a protection (c717843) to avoid the KLU issue reported in DrTimothyAldenDavis/SuiteSparse#1040.

After much testing, we found that we can't parallelize the solves even with multiple pool workers, but this at least resolves the issue with having a PTDF and MODF.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 24 out of 25 changed files in this pull request and generated 2 comments.

Comment thread src/KLUWrapper/KLUWrapper.jl Outdated
Comment thread src/KLUWrapper/klu_cache.jl Outdated
Collaborator

@luke-kiernan luke-kiernan left a comment


Looks good.

Comment thread src/virtual_ptdf_modification.jl
Collaborator

@luke-kiernan luke-kiernan left a comment


Two small nitpicks, but nothing major.

Will require companion PR in PF for compatibility.

Comment thread src/NetworkReductionData.jl Outdated
Comment thread src/BranchesParallel.jl Outdated

@josephmckinsey josephmckinsey left a comment


Given how essentially every single call to KLU is globally serial, which is where all the run-time is, I'd be extremely surprised if a parallel call was ever faster. I'd definitely bet on it being thread-safe. I expect that we can relax some things eventually with enough testing.

Most of these comments are probably not important, but the testing one and _recover_factorization are probably important.

Comment thread src/KLUWrapper/KLUWrapper.jl Outdated
Comment thread src/KLUWrapper/klu_cache.jl
Comment thread src/KLUWrapper/klu_jll_bindings.jl
Comment thread src/KLUWrapper/solve_sparse_rhs.jl
Comment thread src/KLUWrapper/solve_sparse_rhs.jl Outdated
Comment thread src/BranchesParallel.jl
Tuple{Dict{Tuple{Int, Int}, Int64}, Dict{Int64, Int64}},
<:LinearAlgebra.Factorization,
}
K,

Any reason we can't make KLULinSolveCache{Float64} a LinearAlgebra.Factorization like it is in KLU.jl?

Member Author


Do we need it?

Comment thread src/ptdf_calculations.jl Outdated
Comment thread src/solver_dispatch.jl Outdated
@info "Skipping: AppleAccelerate extension not loaded."
return
end
if Threads.nthreads() < 2

As of Julia 1.12, Julia starts with 1 interactive thread and 1 default thread. It is unlikely this check is meaningful. In particular, I believe the github actions do not exercise multi-threading at all despite having no info about it.

Member Author


addressed in c19b994

@jd-lara jd-lara requested a review from josephmckinsey May 13, 2026 19:07
Collaborator

@luke-kiernan luke-kiernan left a comment


Changes since my last review look fine.

edit: Joseph's comments make me realize that just looking at "changes since last review" has its issues.

@jd-lara
Member Author

jd-lara commented May 13, 2026

> Changes since my last review look fine

Looking for the last tests from the PSI branch that consumes this

@jd-lara jd-lara merged commit 843ecad into main May 13, 2026
8 checks passed

5 participants