
add lazy compute operations #25

Open · wants to merge 21 commits into base: dev
Conversation

@rafaqz (Collaborator) commented Dec 2, 2024:

This PR defines the bones of lazy computation objects and a Problem wrapper, following SciML patterns where a Problem is passed to solve (although in retrospect we could also rename compute here to solve).

The idea is that instead of calling ConScape.jl functions directly, we define Operation objects with their parameters and chain multiple of these into a combined computation Problem. This object can then be computed efficiently, e.g. with a single generation of the GridRSP or just its columns, or even over multiple windowed grids (without explicitly coding the windowing loop and calling these functions).

On AbstractProblem we can call compute(p::Problem, g::Grid) to do the actual computation.

(There is also an idea to do assess(p::Problem, ::Grid) to get information about the computation for a given Grid, like total memory use, flops (if we need that and can calculate it) and memory requirements so we can preallocate it or book it on clusters - but this can be fleshed out later on)
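As a rough illustration of the pattern described above, here is a toy mockup. All type and function names are illustrative stand-ins, not the actual ConScape.jl API from this PR:

```julia
# Toy mockup of the lazy Operation/Problem/compute pattern (illustrative names only)
abstract type AbstractOperation end

struct ExpectedCost <: AbstractOperation
    theta::Float64                 # hypothetical parameter carried by the operation
end
struct ConnectedHabitat <: AbstractOperation end

struct Problem
    ops::Vector{AbstractOperation} # chained operations, stored lazily
end

# Dummy kernels; the real package would dispatch to its solvers here
run_op(op::ExpectedCost, grid) = op.theta .* grid
run_op(::ConnectedHabitat, grid) = sum(grid)

# compute walks the chain against a concrete grid
compute(p::Problem, grid) = [run_op(op, grid) for op in p.ops]

p = Problem([ExpectedCost(0.5), ConnectedHabitat()])
results = compute(p, ones(3, 3))
```

The point is only that parameters live in the Operation structs, and no work happens until compute is called with a concrete grid.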

I've added a WindowedProblem and TiledProblem to organise running operations over tiles and windows for very large data (this is still WIP).

@vboussange this might be ready for a look if you want to give some early feedback? Especially regarding anything I have missed for autodiff capacity and iterative solvers. It's built on LinearSolve.jl now, but still actually materializes a Z array because the methods need some work to allow them all to work column by column.

@codecov-commenter commented Dec 13, 2024:

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 0% with 89 lines in your changes missing coverage. Please review.

Please upload report for BASE (dev@4b878ad). Learn more about missing BASE report.

Files with missing lines    Patch %    Lines
src/problem.jl              0.00%      36 Missing ⚠️
src/tiles.jl                0.00%      33 Missing ⚠️
src/gridrsp.jl              0.00%      18 Missing ⚠️
src/grid.jl                 0.00%       2 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@          Coverage Diff           @@
##             dev      #25   +/-   ##
======================================
  Coverage       ?   82.32%           
======================================
  Files          ?        8           
  Lines          ?      764           
  Branches       ?        0           
======================================
  Hits           ?      629           
  Misses         ?      135           
  Partials       ?        0           


@rafaqz rafaqz changed the base branch from main to dev December 13, 2024 15:36
@vboussange vboussange marked this pull request as ready for review December 17, 2024 06:55
@vboussange (Collaborator) left a comment:

Hey there, I think this is going in the right direction. My major comment is about abstracting further away from the RSP framework, from which future developments may follow more naturally. I have other things in mind that I will post in separate issues.

Collaborator:

I would argue that memory use and flops are also properties of the prob::Problem. It would be easy to assess the expected number of flops for VectorSolve(), where we can do the calculation for one node and multiply by the number of targets, but it may be more difficult for MatrixSolve() without performing the operation.

@rafaqz (Collaborator, author):

Yes, that makes sense; memory can just be checked by running one column.

Flops may need us to get some values I'm not sure we have access to (but e.g. UMFPACK does track flops internally).

On src/gridrsp.jl (outdated):
@@ -305,6 +370,18 @@ function least_cost_kl_divergence(grsp::GridRSP, target::Tuple{Int,Int})
return reshape(div, grsp.g.nrows, grsp.g.ncols)
end


@kwdef struct ConnectedHabitat{CV,DT,DV} <: RSPOperation
Collaborator:

I think that this could be better abstracted: the "equivalent connected habitat" metric extends beyond the RSP framework. The calculation of connected habitat is simply qˢ .* (S*qᵗ), where S is a proximity matrix. A proximity matrix is derived from a distance matrix by applying a function distance_transformation, which usually involves the dispersal ability of the species considered.
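A minimal numeric sketch of that formula, with a toy exponential distance_transformation exp(-d/α). Here α is a hypothetical dispersal-ability scale, not an existing parameter name:

```julia
# Connected habitat as qˢ .* (S * qᵗ): toy distance matrix → proximity → habitat
D  = [0.0 1.0; 1.0 0.0]   # pairwise distances between two nodes
α  = 2.0                  # hypothetical dispersal-ability scale
S  = exp.(-D ./ α)        # proximity matrix via the distance transformation
qs = [1.0, 0.5]           # source qualities qˢ
qt = [0.8, 1.0]           # target qualities qᵗ
ch = qs .* (S * qt)       # equivalent connected habitat per source node
```

Any other distance (LCP, resistance, ...) could be dropped in by swapping how D is produced, which is the abstraction argued for here.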

It would be nice to have a structure like AbstractDistance, and from that we could define a RSPDistance (which boils down to the expected_cost) but also a LCPDistance, a ResistanceDistance and what not. Those could be an input to ConnectedHabitat, and would be blindly used to calculate the distance matrix.

More generally, a distance object should be provided to the prob::Problem, and be used to inform the specific operation requested. If no operation is requested, then compute could output an object of the sort ExplicitGrid (I feel like this is a bad name, we should reflect on this), which is essentially a graph with edge weights corresponding to the distance evaluation. This object would somewhat be a sort of RSPGrid, but more interpretable and generic.

This reflection brings me to a second point: the GridRSP object is too specific. Its primary goal is to checkpoint some calculations, but that should be the role of the Problem structure. GridRSP is only a buffer, of no interest to the user, and could be hidden in prob::Problem. Notice that this is very similar to the solution used in SciML and LinearSolve.jl, where init(prob) allows some caching, e.g. of the LU factors when doing an LU solve (https://docs.sciml.ai/SciMLBase/dev/interfaces/Init_Solve/ and https://docs.sciml.ai/LinearSolve/stable/tutorials/caching_interface/).
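For reference, the LinearSolve.jl init/solve caching pattern mentioned here looks roughly like this (a sketch using the LinearSolve.jl package; see the linked docs):

```julia
using LinearSolve, LinearAlgebra

A  = rand(4, 4) + 4I                    # diagonally dominant toy system
b1 = rand(4)
# init builds a cache; the LU factors are computed once and stored in it
cache = init(LinearProblem(A, b1), LUFactorization())
u1 = copy(solve!(cache).u)              # first solve

b2 = rand(4)
cache.b = b2                            # new right-hand side; factors are reused
u2 = solve!(cache).u                    # second solve without refactorizing
```

The same idea would let a prob::Problem carry its buffers internally instead of exposing GridRSP.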

@rafaqz (Collaborator, author) commented Dec 17, 2024:

Yes GridRSP is pretty much hidden now in this PR. You call compute on a Problem and Grid and GridRSP is internal.

I'm working towards preallocating the whole problem up front like LinearSolve does but most of the code is unfortunately not written in that style. The first step will be to separate out non-allocating ! methods for everything so the workspace arrays can be passed in.

@rafaqz (Collaborator, author):

Also good point about using more appropriate abstractions.

@rafaqz (Collaborator, author) commented Dec 17, 2024:

Had a good discussion with @ninbrm about refactoring distances and fixing the abstract type structure to better represent the range of problems. Theta really doesn't need to be a parameter, but the fact that distance functions are currently used like plain functions rather than constructed objects makes it awkward to put parameters in them. So we should change them to be types, like you describe.

I'll implement this new abstract type hierarchy in the next iteration of this PR and we can discuss from there.

Collaborator:

Great, looking forward to that!

@rafaqz (Collaborator, author) commented Dec 17, 2024:

Ok the new abstractions are kind of implemented now :)

It works in the landmarks.jmd example at least, but needs some more polish and thought. Eventually we can refactor some of the complexity away, but for now I'm trying to build it on top of the current methods without changing them too much, so the old tests still run and we can use them to check the new methods.

On src/problem.jl (outdated):
@show size(A) size(B) size(b)

# Define and initialise the linear problem
linprob = LinearProblem(A, b)
Collaborator:

See my comment above; this should appear in the construction of the Problem. We should additionally allow the user to provide a solver, possibly with some warnings or advice on suitable solvers.

@rafaqz (Collaborator, author):

For sure, this is just hacked in for now to get it working while I learn LinearSolve.

Collaborator:

Something that should also be addressed, and that I do not fully understand: in many of the methods in randomizedshortestpath.jl, there is a second linear solve with the use of landmarks

= (I - W)\((C .* W)*Z)

This may have consequences on how we want to construct a Problem.

@rafaqz (Collaborator, author):

Yeah I saw that too. These methods will need to be handled but I don't totally understand them yet either.

@rafaqz (Collaborator, author) commented Jan 4, 2025:

For the LinearSolve part, there is now a LinearSolver object that takes the args and keywords for solve and init, so you can use any solver from LinearSolve.jl in ConScape.

One open question is whether a single solver should be used for all RSP solves as well as for the Z matrix, or whether to supply a different solver for different components, as they don't necessarily have the same properties.

@vboussange (Collaborator) left a comment:

Looks good! Did you reflect on how to treat the \ in randomizedshortestpaths.jl? This may be an important thing to consider, to make sure that the proposed architecture and solver modes are also used there.

Collaborator:

Why not have something like

    (d::LeastCostDistance)(args...; kwargs...) = least_cost_distance(args...; kwargs...)

@rafaqz (Collaborator, author) commented Jan 4, 2025:

I've allowed passing solver modes to \ in rsp, but playing around I found some situations where LinearSolve solvers were very fast on Z but stalled in RSP, so maybe we want different solvers per problem?

We also want to factor out some of the solves as the same thing is being done in multiple places if you want multiple kinds of betweenness metrics in the same run.

diagvalue::DV=nothing
end

@kwdef struct ConnectedHabitat{DV} <: GraphMeasure
Collaborator:

I think that this should be a topological measure

@@ -0,0 +1,38 @@
# New type-based interface
# Easier to add parameters to these
abstract type ConnectivityMeasure end
@vboussange (Collaborator) commented Dec 18, 2024:

Can you describe the difference between these? I feel like the type system should be as close as possible to the mathematical classification of these measures.

I suggest that we could follow the classification of e.g. Graphs.jl or that of networkx: https://networkx.org/documentation/stable/reference/algorithms/index.html

i.e., DistanceMeasure, ConnectivityMeasure, CentralityMeasure, etc...
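A sketch of what that classification could look like as Julia types. All names here are placeholders for discussion, not existing ConScape.jl types:

```julia
# Hypothetical measure taxonomy following the Graphs.jl / networkx classification
abstract type AbstractMeasure end
abstract type DistanceMeasure     <: AbstractMeasure end
abstract type ConnectivityMeasure <: AbstractMeasure end
abstract type CentralityMeasure   <: AbstractMeasure end

struct RSPDistance <: DistanceMeasure
    theta::Float64            # RSP inverse temperature, carried by the type
end
struct LCPDistance <: DistanceMeasure end

# connected habitat as a (weighted closeness) centrality over some distance
struct ConnectedHabitat{D<:DistanceMeasure} <: CentralityMeasure
    distance::D
end
```

Any DistanceMeasure could then be passed blindly into a CentralityMeasure, matching the abstraction proposed earlier in the thread.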

Member:

Sounds like a good idea. The GraphMeasures that we discussed would then be CentralityMeasures (connected_habitat is a weighted closeness centrality, and the betweennesses are betweenness centralities).

@rafaqz (Collaborator, author):

Let's do a round or two of documenting and reorganising/renaming the type structure when the differences get clearer; the objects here can be seen as placeholders for better names.

Collaborator:

Since you are refactoring this, it would make sense to already think of #29. Maybe add a field connected_components that is a vector of vectors, where each inner vector holds the nodes of the associated connected component.

On src/problem.jl (outdated):
function compute(m::VectorSolve, p::AbstractProblem, g::Grid)
# Compute Z column by column
_, targetnodes = _targetidx_and_nodes(g)
function solve(m::VectorSolve, cm::FundamentalMeasure, p::AbstractProblem, g::Grid)
Collaborator:

Excellent, this looks good! But I think that this should be a DirectVectorMode, and we likely need to implement something similar for IndirectVectorMode, where instead of the init(linprob), we calculate a preconditioner, and then dispatch the calculations over the threads. This mode will make the solve free from any matrix materialization. It would be awesome to also have in the future an IndirectMatrixMode, this would work great on GPUs.

Collaborator:

Actually, why not call those XxVectorMode and XxBatchMode?


@rafaqz (Collaborator, author):

I've refactored this so that VectorSolver just uses UMFPACK directly, column by column, and LinearSolver lets you use anything from LinearSolve.jl by passing the args and keywords you would pass to init and solve.

@rafaqz (Collaborator, author) commented Jan 4, 2025:

The preconditioner part will be done automatically by LinearSolve.jl in init if you pass a precs argument in your solver definition, as far as I can tell.

@rafaqz (Collaborator, author):

But we really need that batch mode added to LinearSolve.jl to push this much further; currently I'm running init and then deepcopy on it for each thread. There are surely more efficient ways where only the mutated parts are copied rather than the whole thing.
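The per-thread deepcopy workaround described here looks roughly like the following sketch (assuming LinearSolve.jl and its caching interface; the batching strategy itself is the hack being discussed, not an official pattern):

```julia
using LinearSolve, LinearAlgebra

A = rand(8, 8) + 8I            # toy system matrix
B = rand(8, 3)                 # one right-hand side per column
cache = init(LinearProblem(A, B[:, 1]), LUFactorization())
Z = similar(B)
Threads.@threads for j in axes(B, 2)
    c = deepcopy(cache)        # copies the whole cache, factors included
    c.b = B[:, j]
    Z[:, j] = solve!(c).u
end
```

The waste is visible here: the factorization inside the cache is duplicated per thread even though it is never mutated, which is what a proper batch mode in LinearSolve.jl would avoid.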

Collaborator:

How about having a folder connectivity_measures and then files that contain specialised methods belonging to a certain framework, such as e.g. the RSP framework but also possibly to other frameworks?

@rafaqz (Collaborator, author):

I think we should reorganise all the code like that, but in this PR I'm trying to add new things without completely reorganising the old ones, so that changes to the old code diff properly.

@rafaqz (Collaborator, author):

So for now it's just a file with struct definitions, and the methods are left in their current place.


4 participants