Add KLUWrapper by jd-lara · Pull Request #296 · Sienna-Platform/PowerNetworkMatrices.jl

jd-lara · 2026-04-28T00:15:59Z

This PR is important for reducing issues with KLU when allowing for multiple entry.

Adds a small allocation-aware wrapper over libklu (provided via SuiteSparse_jll) tailored to the access patterns of this package:

- KLULinSolveCache splits symbolic and numeric factorization, supports numeric_refactor! that reuses the analysis and the prior numeric struct, and is non-allocating after construction.
- solve!/tsolve! call libklu's native dense multi-RHS routines directly.
- solve_sparse!/solve_sparse provide a sparse-RHS path: B's columns are scattered into a small dense scratch in chunks instead of densifying the full N x M RHS up front. Has a skip_empty option for RHSs where most columns are structurally empty (Ward reduction, Woodbury kernel).
- Real (Float64) and complex (ComplexF64) paths share the same struct via type-dispatched ccall helpers (klu_l_* and klu_zl_*).

This replaces the use of KLU.jl elsewhere in the package. KLU.jl is removed from Project.toml; SuiteSparse_jll is added in its place.

https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
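The split between symbolic analysis and numeric refactorization described above can be illustrated with Julia's standard library. This is only an analogy, not the KLUWrapper API: it uses UMFPACK's `lu` and `lu!` from SparseArrays as a stand-in for libklu, and the matrices and right-hand side are made up for illustration.

```julia
using SparseArrays, LinearAlgebra

# Stand-in for a cached factorization: UMFPACK's LU object from the stdlib.
A1 = sparse([4.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 2.0])
b = [1.0, 2.0, 3.0]

F = lu(A1)        # symbolic analysis plus numeric factorization
x1 = F \ b

# Same sparsity pattern, different numeric values.
A2 = 2.0 * A1

lu!(F, A2)        # numeric refactorization; reuses the symbolic analysis held in F
x2 = F \ b
```

Because `A2` is `A1` scaled by 2, the refactored solve should return half of the original solution, which gives a cheap sanity check that `F` was updated in place.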

Replaces KLU.KLUFactorization with the new KLULinSolveCache across the matrix types and uses solve_sparse! for the multi-RHS, structurally sparse code paths:

- ABA_Matrix.K is now KLULinSolveCache{Float64}; factorize/is_factorized updated accordingly. DC_ABA_Matrix_Factorized type alias updated.
- _calculate_PTDF_matrix_KLU uses solve_sparse! on BA[valid_ix, :], scattering the structurally-sparse RHS columns instead of densifying the full (buscount-nref) x linecount matrix.
- _calculate_LODF_matrix_KLU passes transpose(a)[valid_ix, :] directly to solve_sparse!, removing the original zeros(buscount, buscount) intermediate.
- VirtualPTDF.K, VirtualLODF.K, VirtualMODF.K all hold a KLULinSolveCache{Float64}; per-row solve! calls remain non-allocating.
- DC_vPTDF_Matrix is left unconstrained on K so AppleAccelerate factorizations still satisfy the alias.

https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
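The idea of solving against a structurally sparse right-hand side without densifying it can be sketched roughly as follows. The function name `solve_sparse_sketch` is hypothetical, the factorization is UMFPACK's `lu` from the stdlib rather than libklu, and the real `solve_sparse!` in this PR works in place and may differ in its details.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical helper, not the KLUWrapper API: solves A * X = B for sparse B,
# running the solver only on columns of B that have stored entries.
function solve_sparse_sketch(F, B::SparseMatrixCSC{Float64, Int})
    n, m = size(B)
    X = zeros(n, m)                      # empty RHS columns stay zero
    nonempty = [j for j in 1:m if !isempty(nzrange(B, j))]
    isempty(nonempty) && return X
    scratch = Matrix(B[:, nonempty])     # densify only the non-empty columns
    X[:, nonempty] = F \ scratch         # one multi-RHS solve
    return X
end

A = sparse([4.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 2.0])
F = lu(A)
B = sparse([1, 3], [1, 3], [1.0, 2.0], 3, 4)   # columns 2 and 4 are empty
X = solve_sparse_sketch(F, B)
```

The dense scratch here has as many columns as there are non-empty RHS columns, so the working set scales with the populated part of `B` rather than with its full width.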

Ward reduction:

- The boundary-bus identity solve becomes a single solve_sparse! call on a sparse identity matrix instead of a per-bus loop with one RHS each.
- The y_eq computation switches to solve_sparse(...; skip_empty=true) on y_eb. Most external buses are not adjacent to any boundary bus, so the RHS columns are largely empty and the skip-empty short-circuit avoids redundant solves.

Tests:

- Real and complex round-trip and refactor coverage.
- solve_sparse! agrees with the dense-RHS path.
- skip_empty produces zero columns for empty RHS columns.
- solve_sparse! into a view writes only the targeted rows.

https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
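For context, the textbook Ward equivalent eliminates the external buses through a Schur complement, Y_eq = Y_bb - Y_be * inv(Y_ee) * Y_eb, which needs exactly one factorization of the external block and one multi-RHS solve against Y_eb. The sketch below assumes that standard formula and a made-up 4-bus matrix; the package's own y_eq may be defined or assembled differently, and UMFPACK's `lu` stands in for the KLU cache.

```julia
using SparseArrays, LinearAlgebra

# Toy 4-bus admittance-like matrix (made-up values). Buses 1-2 are treated as
# boundary (b) and buses 3-4 as external (e).
Y = sparse([3.0 -1.0 -1.0 0.0;
            -1.0 3.0 0.0 -1.0;
            -1.0 0.0 2.5 -1.0;
            0.0 -1.0 -1.0 2.5])
b, e = [1, 2], [3, 4]

Y_bb, Y_be = Y[b, b], Y[b, e]
Y_eb, Y_ee = Y[e, b], Y[e, e]

# One factorization of the external block, then one solve over all boundary columns.
F = lu(Y_ee)
Z = F \ Matrix(Y_eb)               # Y_ee^{-1} * Y_eb
Y_eq = Matrix(Y_bb) - Y_be * Z     # Schur complement onto the boundary buses
```

A useful check on any such reduction is that the inverse of `Y_eq` equals the boundary block of the inverse of the full matrix.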

Simplifications:

- Drop redundant n field from KLULinSolveCache; derive from colptr.
- Drop _decrement!/_increment! helpers in favour of broadcast `.+= 1` / `.-= 1`.
- Rename finalize! -> Base.finalize so it composes with stdlib instead of shadowing Base.finalize through the module's exports.
- Drop solve_w_refinement (no in-tree caller); add Base.\ on the cache for symmetry with KLU.KLUFactorization.
- Unify the two tsolve! methods via a _tsolve_call helper that ignores the conjugate flag on the real path.
- Drop block= and skip_empty= keywords from solve_sparse!. Always pack non-empty columns into a single dense scratch and dispatch one libklu multi-RHS solve. Empty RHS columns are zeroed in the output without a solve, in all cases.
- Throw stdlib exception types (SingularException, OutOfMemoryError, ArgumentError, OverflowError) from klu_throw to match the conventions used in SparseArrays.CHOLMOD and elsewhere in PNM.
- Refactor _calculate_PTDF_matrix_KLU: collapse the duplicated dist_slack branches into a single solve plus a slack-distribution post-step.
- Strip migration-narrative comments from the consumer call sites.

New: KLULinSolvePool

- A small pool of independent KLULinSolveCache workers, each holding its own factorization of the same matrix. KLU's numeric struct and Common status field are mutated by klu_solve, so safe parallel use needs one factor copy per worker.
- API: KLULinSolvePool(A; nworkers), with_worker(f, pool) -> result, acquire!/release!, numeric_refactor!(pool, A), Base.finalize.
- Plumbing into VirtualMODF (per-worker scratch + cache locks) is staged for the next commit.

https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
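The acquire/run/release pattern behind a worker pool like the one described above can be sketched with a `Channel` of free worker indices. Everything here is hypothetical and for illustration only: `SketchPool` and `with_worker_sketch` are not the PR's types, UMFPACK's `lu` stands in for a KLU cache, and later commits in this PR removed the pool again.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical pool type for illustration; the PR's KLULinSolvePool may differ.
struct SketchPool{T}
    workers::Vector{T}
    idle::Channel{Int}    # indices of workers that are currently free
end

function SketchPool(make_worker::Function, nworkers::Int)
    workers = [make_worker() for _ in 1:nworkers]
    idle = Channel{Int}(nworkers)
    for i in 1:nworkers
        put!(idle, i)
    end
    return SketchPool(workers, idle)
end

# Borrow a worker, run f(worker, index), and always hand the worker back.
function with_worker_sketch(f, pool::SketchPool)
    i = take!(pool.idle)              # blocks until a worker is free
    try
        return f(pool.workers[i], i)
    finally
        put!(pool.idle, i)
    end
end

A = sparse([4.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 2.0])
B = [1.0 0.0 2.0 1.0; 0.0 1.0 2.0 1.0; 3.0 0.0 0.0 1.0]
pool = SketchPool(() -> lu(A), 2)     # each worker owns its own factorization
X = zeros(size(B))

@sync for j in axes(B, 2)
    Threads.@spawn begin
        with_worker_sketch(pool) do F, i
            X[:, j] = F \ B[:, j]     # each task writes a distinct column
        end
    end
end
```

The `try`/`finally` block is the important part of the design: a worker is returned to the channel even if the solve throws, so a failed task cannot permanently shrink the pool.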

@spawn

VirtualMODF.K is now a KLULinSolvePool{Float64} sized at construction time to nworkers (defaults to Threads.nthreads()). Per-worker scratch buffers (work_ba_col, temp_data) are stored as Vector{Vector{Float64}} indexed by the worker handle returned from with_worker.

Woodbury kernel refactor:

- Split _compute_woodbury_factors / _apply_woodbury_correction into pure-data _impl functions that take cache + scratch + data arrays explicitly, plus mat-typed outer wrappers.
- VirtualPTDF wrapper uses its single shared scratch (existing single-cache behavior).
- VirtualMODF wrapper uses with_worker to acquire a per-worker cache and scratch.

Cache locking:

- Add ReentrantLocks for woodbury_cache and row_caches.
- _get_woodbury_factors and _get_or_create_row_cache use double-checked locking so concurrent miss-then-fill is correct.
- clear_caches! / clear_all_caches! / Base.getindex acquire the locks.

Tests:

- New test/test_virtual_threaded.jl: uses Threads.@spawn for VirtualPTDF, VirtualLODF, and VirtualMODF. The VirtualMODF case spawns one task per (monitored, contingency) work item and validates parallel results match a serial reference. VirtualPTDF/VirtualLODF use a single-task @spawn since their K and scratch are still shared (not yet pool-backed); the comments call this out explicitly.
- Tests in test/test_klu_wrapper.jl exercise KLULinSolvePool: basic with_worker, concurrent solves via @threads on a 4-worker pool, and numeric_refactor!.

https://claude.ai/code/session_0128vLG5HzxrMTZYjE9oTGfd
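The miss-then-fill behavior mentioned under cache locking can be sketched as a check, compute, re-check sequence. This is a minimal illustration under assumptions: `RowCache` and `get_or_create_row!` are hypothetical names, and this variant takes the lock for both checks (a Julia `Dict` is not safe to read while another task mutates it), which may differ from what the PR does.

```julia
# Hypothetical cache for illustration; names are not taken from the PR.
struct RowCache
    rows::Dict{Int, Vector{Float64}}
    lock::ReentrantLock
end
RowCache() = RowCache(Dict{Int, Vector{Float64}}(), ReentrantLock())

function get_or_create_row!(compute::Function, cache::RowCache, key::Int)
    # First check, under the lock.
    hit = lock(cache.lock) do
        get(cache.rows, key, nothing)
    end
    hit !== nothing && return hit

    # Miss: compute without holding the lock so other keys are not blocked.
    row = compute(key)

    # Second check: another task may have filled the entry in the meantime.
    # get! keeps the existing entry if there is one, otherwise stores `row`.
    return lock(cache.lock) do
        get!(cache.rows, key, row)
    end
end

cache = RowCache()
calls = Threads.Atomic{Int}(0)
compute = k -> (Threads.atomic_add!(calls, 1); fill(Float64(k), 3))

tasks = map(1:8) do _
    Threads.@spawn get_or_create_row!(compute, cache, 7)
end
results = fetch.(tasks)
```

Several tasks may still compute the same row concurrently, but only the first one to re-acquire the lock stores its result, and every caller receives that stored vector.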

github-actions · 2026-04-28T00:25:08Z

Performance Results

Precompile Time

Main	This Branch	Delta
2.176 s	2.213 s	+1.7%

Execution Time

Test	Main	This Branch	Delta
matpower_ACTIVSg2000_sys-Build PTDF First	2.187 s	1.807 s	-17.4%
matpower_ACTIVSg2000_sys-Build PTDF Second	115.8 ms	186.6 ms	+61.1%
matpower_ACTIVSg2000_sys-Build Ybus First	15.0 ms	14.5 ms	-3.4%
matpower_ACTIVSg2000_sys-Build Ybus Second	13.7 ms	13.1 ms	-4.8%
matpower_ACTIVSg2000_sys-Build LODF First	164.8 ms	528.8 ms	+220.9%
matpower_ACTIVSg2000_sys-Build LODF Second	290.1 ms	175.8 ms	-39.4%
matpower_ACTIVSg2000_sys-Build VirtualMODF First	4.112 s	4.554 s	+10.8%
matpower_ACTIVSg2000_sys-Build VirtualMODF Second	203.5 ms	677.0 ms	+232.6%
matpower_ACTIVSg2000_sys-VirtualMODF Query 10 rows	484.6 ms	493.9 ms	+1.9%
matpower_ACTIVSg2000_sys-Radial network reduction First	455.2 ms	444.2 ms	-2.4%
matpower_ACTIVSg2000_sys-Radial network reduction Second	0.7 ms	0.7 ms	+1.6%
matpower_ACTIVSg2000_sys-Degree two network reduction First	1.739 s	1.691 s	-2.8%
matpower_ACTIVSg2000_sys-Degree two network reduction Second	1.1 ms	1.1 ms	-5.5%
Base_Eastern_Interconnect_515GW-Build Ybus First	3.53 s	3.352 s	-5.0%
Base_Eastern_Interconnect_515GW-Build Ybus Second	3.253 s	3.263 s	+0.3%
Base_Eastern_Interconnect_515GW-Radial network reduction First	42.4 ms	123.4 ms	+190.7%
Base_Eastern_Interconnect_515GW-Radial network reduction Second	44.7 ms	36.7 ms	-17.9%
Base_Eastern_Interconnect_515GW-Degree two network reduction First	362.3 ms	358.0 ms	-1.2%
Base_Eastern_Interconnect_515GW-Degree two network reduction Second	48.7 ms	43.4 ms	-10.7%

Copilot

Pull request overview

This PR replaces the dependency on KLU.jl with an in-repo KLUWrapper built on SuiteSparse_jll and wires it through PTDF/LODF/MODF/Ward workflows, adding a worker-pool abstraction to support thread-safe parallel solves and new tests to validate correctness under concurrency.

Changes:

Introduce src/KLUWrapper/ with KLULinSolveCache (cached factorization) and KLULinSolvePool (thread-safe parallel solves) plus dense/sparse RHS solve helpers.
Refactor PTDF/LODF/Virtual* and Ward reduction codepaths to use klu_factorize, solve!, and solve_sparse!/solve_sparse instead of KLU.jl.
Add tests covering wrapper correctness, sparse RHS behavior, pool behavior, and threaded Virtual* access; bump package version and switch dependency to SuiteSparse_jll.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.

File	Description
test/test_virtual_threaded.jl	Adds multithreaded regression tests for VirtualPTDF/VirtualLODF (single-task) and VirtualMODF (pool-backed parallelism).
test/test_klu_wrapper.jl	Adds comprehensive unit tests for cache/pool factorization, dense + sparse RHS solves, refactor/reset paths, and allocation expectations.
src/woodbury_kernel.jl	Refactors Woodbury computation into pure-data impl functions and adds dispatchers for VirtualPTDF vs pool-backed VirtualMODF.
src/ward_reduction.jl	Switches Ward reduction solves to `KLUWrapper` and uses sparse-RHS solve for boundary columns.
src/virtual_ptdf_calculations.jl	Updates VirtualPTDF to accept `KLULinSolveCache` and routes solves through the wrapper abstraction.
src/virtual_modf_calculations.jl	Makes VirtualMODF pool-backed for parallel `getindex`, adds locks around caches, and threads Woodbury calls through `with_worker`.
src/virtual_lodf_calculations.jl	Migrates VirtualLODF factorization/solves to `KLULinSolveCache` and uses in-place `solve!`.
src/ptdf_calculations.jl	Replaces dense RHS `KLU.solve!` usage with `solve_sparse!` over selected BA rows.
src/lodf_calculations.jl	Replaces dense RHS `KLU.solve!` usage with `solve_sparse!` for incidence-transpose RHS and updates denom solve.
src/PowerflowMatrixTypes.jl	Updates type aliases to use `KLULinSolveCache` and relaxes VirtualPTDF’s factorization type parameter.
src/PowerNetworkMatrices.jl	Includes/imports `KLUWrapper` APIs into the main module.
src/KLUWrapper/solve_sparse_rhs.jl	Implements sparse RHS packing + block-chunked solve to bound working set.
src/KLUWrapper/solve_dense.jl	Implements in-place dense solve and transpose solve; adds `\` for allocating solve.
src/KLUWrapper/pool.jl	Adds `KLULinSolvePool` with `Channel`-based worker acquisition and refactor/reset logic.
src/KLUWrapper/klu_jll_bindings.jl	Adds low-level `ccall` bindings for SuiteSparse_long KLU entry points and error mapping.
src/KLUWrapper/klu_cache.jl	Implements `KLULinSolveCache` lifecycle, symbolic/numeric refactor, pattern checks, and scratch management.
src/KLUWrapper/KLUWrapper.jl	Defines the `KLUWrapper` module and exports wrapper APIs.
src/BA_ABA_matrices.jl	Updates ABA factorization storage and constructors to use `klu_factorize`.
Project.toml	Bumps version and replaces `KLU` dependency with `SuiteSparse_jll`.

luke-kiernan

One comment, will look more later.

luke-kiernan

Looks good to me

jd-lara · 2026-05-07T15:06:27Z

The final outcome of this PR is that we need a protection (c717843) to avoid the KLU issue reported in DrTimothyAldenDavis/SuiteSparse#1040.

After much testing, we found that we can't parallelize the solves even with multiple pool workers. This at least resolves the issue with having a PTDF and a MODF.

Copilot

Pull request overview

Copilot reviewed 24 out of 25 changed files in this pull request and generated 2 comments.

luke-kiernan

Looks good.

luke-kiernan

Two small nitpicks, but nothing major.

Will require companion PR in PF for compatibility.

josephmckinsey

Given how essentially every single call to KLU is globally serial, which is where all the run-time is, I'd be extremely surprised if a parallel call was ever faster. I'd definitely bet on it being thread-safe. I expect that we can relax some things eventually with enough testing.

Most of these comments are probably not important, but the testing one and _recover_factorization are probably important.

josephmckinsey · 2026-05-13T16:09:31Z

    Tuple{Dict{Tuple{Int, Int}, Int64}, Dict{Int64, Int64}},
-    <:LinearAlgebra.Factorization,
-}
+    K,


Any reason we can't make KLULinSolveCache{Float64} a LinearAlgebra.Factorization like it is in KLU.jl?

Do we need it?

josephmckinsey · 2026-05-13T17:41:22Z

+        @info "Skipping: AppleAccelerate extension not loaded."
+        return
+    end
+    if Threads.nthreads() < 2


As of Julia 1.12, Julia starts with 1 interactive thread and 1 default thread, so it is unlikely this check is meaningful. In particular, I believe the GitHub Actions runs do not exercise multi-threading at all, despite there being no info message about it.

addressed in c19b994

luke-kiernan

Changes since my last review look fine.

edit: Joseph's comments make me realize that just looking at "changes since last review" has its issues.

jd-lara · 2026-05-13T19:40:03Z

Changes since my last review look fine

Looking for the last tests from the PSI branch that consumes this

claude and others added 8 commits April 25, 2026 17:54

make the changes usable

ab752d6

handle bad cases in the wrapper

5e685f5

update testing

af2af37

jd-lara requested review from Copilot and luke-kiernan April 28, 2026 00:15

Copilot started reviewing on behalf of jd-lara April 28, 2026 00:25

Copilot AI reviewed Apr 28, 2026

Comment thread src/KLUWrapper/pool.jl Outdated

Comment thread src/KLUWrapper/pool.jl Outdated

Comment thread src/KLUWrapper/pool.jl Outdated

luke-kiernan reviewed Apr 28, 2026

Comment thread src/KLUWrapper/klu_cache.jl Outdated

jd-lara added 16 commits April 27, 2026 22:03

improve pool safety

14b6644

extend usage to the other matrices

53c4c2b

fix the pool bug

22b9d06

fix performance degradation

a1f1c77

fix testing

60527c9

address luke's comment

fe4eef5

add more testing

76d41f0

improve the testing

dd8aa33

fix docs

903393d

add windows protection with gc

1424602

add more measurements in KLU

ee1dcb2

add more measurements

786209b

make windows serial

9698d65

do some clean up in the use of the solver

daf1781

extend use of the pool to other matrices

c45c096

make diagnostics optional

10c3d9b

jd-lara added 2 commits May 4, 2026 23:40

add testing as requested in the PR comments

0b5ef78

add lock on windows

c717843

luke-kiernan approved these changes May 5, 2026

jd-lara added 4 commits May 5, 2026 17:15

add a retry survival mechanism

f567e27

more improvements

43a0940

undo the addition of pools and keep the safeguards on KLU

512870f

remove AA from deps

febc439

jd-lara requested a review from Copilot May 7, 2026 15:05

Copilot started reviewing on behalf of jd-lara May 7, 2026 15:05

Copilot AI reviewed May 7, 2026

Comment thread src/KLUWrapper/KLUWrapper.jl Outdated

Comment thread src/KLUWrapper/klu_cache.jl Outdated

jd-lara added 2 commits May 7, 2026 11:35

simplify caches

11d66ad

use of isnothing clean ups

3325d75

luke-kiernan approved these changes May 7, 2026

Comment thread src/virtual_ptdf_modification.jl

jd-lara added 4 commits May 11, 2026 17:14

add mixed branch types

eac003a

add methods for mixed parallel types

a51ab82

add testing

4fa3a9b

add different methods for max calculation of the ratings

d5b95fb

jd-lara mentioned this pull request May 12, 2026

Enhance get_equivalent_rating function for BranchesParalell and add tests #297

Closed

jd-lara added 2 commits May 12, 2026 12:09

add missing methods for MixedBranchParallel

561b615

PR comments

9599145

luke-kiernan approved these changes May 13, 2026

Comment thread src/NetworkReductionData.jl Outdated

Comment thread src/BranchesParallel.jl Outdated

josephmckinsey requested changes May 13, 2026

jd-lara added 3 commits May 13, 2026 12:02

add new methods for IS changes

b46a349

bump PSY version

49300db

address PR comments

c19b994

jd-lara requested a review from josephmckinsey May 13, 2026 19:07

luke-kiernan approved these changes May 13, 2026

jd-lara merged commit 843ecad into main May 13, 2026
8 checks passed

5 participants

github-actions Bot commented Apr 28, 2026 •

edited

Loading

luke-kiernan left a comment •

edited

Loading