Use DifferentiationInterface for AD in Implicit Solvers #2567

jClugstor · 2025-01-02T20:36:45Z

Checklist

Appropriate tests were added
Any code changes were done in a way that does not break public API
All documentation related to code changes were updated
The new code follows the contributor guidelines, in particular the SciML Style Guide and COLPRAC.
Any new documentation only uses public API

Additional context

This is at a point where we can do stuff like this:

using OrdinaryDiffEqCore
using OrdinaryDiffEqSDIRK
using ADTypes
using EnzymeCore

function lorenz!(du, u, p, t)
    du[1] = 10.0 * (u[2] - u[1])
    du[2] = u[1] * (28.0 - u[3]) - u[2]
    du[3] = u[1] * u[2] - (8 / 3) * u[3]
end

u0 = [1.0; 0.0; 0.0]
tspan = (0.0, 100.0)
prob = ODEProblem{true, SciMLBase.NoSpecialize}(lorenz!, u0, tspan)
sol = solve(prob, ImplicitEuler(autodiff=AutoEnzyme(function_annotation = EnzymeCore.Const)))

and it actually uses sparsity detection and greedy Jacobian coloring plus Enzyme to compute the Jacobians.
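For readers less familiar with the technique, here is a stdlib-only sketch of what a sparsity pattern plus greedy column coloring buys. Columns that never share a nonzero row get the same color, and one function evaluation per color recovers all of their Jacobian entries. This is only an illustration under stated assumptions: it uses forward finite differences instead of Enzyme, the tridiagonal test function `stencil!` is hypothetical, and it is not how SparseMatrixColorings.jl or DifferentiationInterface implement coloring or decompression.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical test function whose Jacobian is tridiagonal.
function stencil!(du, u)
    n = length(u)
    for i in 1:n
        left = i > 1 ? u[i-1] : zero(eltype(u))
        right = i < n ? u[i+1] : zero(eltype(u))
        du[i] = left - 2u[i] + right + u[i]^2
    end
    return du
end

# Greedy coloring of the columns: two columns conflict when they have a
# nonzero in a common row, so they must receive different colors.
function greedy_column_coloring(S::SparseMatrixCSC{Bool})
    n = size(S, 2)
    colors = zeros(Int, n)
    for j in 1:n
        rows_j = rowvals(S)[nzrange(S, j)]
        forbidden = Set{Int}()
        for k in 1:(j - 1)
            rows_k = rowvals(S)[nzrange(S, k)]
            isempty(intersect(rows_j, rows_k)) || push!(forbidden, colors[k])
        end
        c = 1
        while c in forbidden
            c += 1
        end
        colors[j] = c
    end
    return colors
end

# Compressed forward-difference Jacobian: perturb all columns of one color at
# once, then scatter the differences back using the sparsity pattern.
function colored_fd_jacobian(f!, u, S::SparseMatrixCSC{Bool}, colors; h = sqrt(eps()))
    n = length(u)
    J = zeros(n, n)
    fu = f!(similar(u), u)
    fp = similar(u)
    for c in 1:maximum(colors)
        up = [u[j] + (colors[j] == c ? h : 0.0) for j in 1:n]
        f!(fp, up)
        for j in 1:n
            colors[j] == c || continue
            for r in rowvals(S)[nzrange(S, j)]
                J[r, j] = (fp[r] - fu[r]) / h   # no other column of color c touches row r
            end
        end
    end
    return J
end

n = 10
S = spdiagm(-1 => fill(true, n - 1), 0 => fill(true, n), 1 => fill(true, n - 1))
colors = greedy_column_coloring(S)   # 3 colors, independent of n
u = collect(range(0.1, 1.0; length = n))
J = colored_fd_jacobian(stencil!, u, S, colors)
```

With a tridiagonal pattern the Jacobian costs 3 extra function calls instead of `n`, which is the whole point of combining detection with coloring.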

Some things I'm unsure about:

The current behavior is to use Jacobian coloring and SparseDiffTools by default. In order to keep that up, we have to wrap any ADType given in an AutoSparse unless it's already an AutoSparse. This does change the ADType that the user entered to be wrapped in an AutoSparse, which feels weird to me. Maybe there should be an option to just directly use the ADType entered, but by default we wrap it into an AutoSparse? I'm not sure.
The biggest issue is that the sparsity detectors work with DI through operator overloading (both TracerSparsityDetector and SymbolicsSparsityDetector do), which is a problem when using AutoSpecialization because of the FunctionWrappers. The solution I found was to unwrap the function in the preparation process. I'm not sure what performance implications this will have, but I don't think it should matter much, since preparation should run only once.
There are still pieces in here that use raw SparseDiffTools (build_J_W) that I haven't yet looked into converting to DI.
I may need to fix some of the versions.
There are some places that are getting sparse things where it's not expected.
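The wrap-by-default question in the first point can be made concrete with a small dispatch sketch. The types below are hypothetical stand-ins (in the real code the wrapper would be `ADTypes.AutoSparse`); the idea is that wrapping is idempotent and that an opt-out keyword leaves the user's ADType untouched.

```julia
# Hypothetical stand-ins for dense and sparse AD backend types.
abstract type AbstractBackend end

struct DenseAD <: AbstractBackend
    name::Symbol
end

struct SparseAD{B<:AbstractBackend} <: AbstractBackend
    dense::B
end

# Wrap dense backends by default, never double-wrap, and allow an opt-out.
maybe_sparsify(ad::AbstractBackend; wrap::Bool = true) = wrap ? SparseAD(ad) : ad
maybe_sparsify(ad::SparseAD; wrap::Bool = true) = ad

wrapped = maybe_sparsify(DenseAD(:enzyme))
already = maybe_sparsify(SparseAD(DenseAD(:forwarddiff)))
untouched = maybe_sparsify(DenseAD(:enzyme); wrap = false)
```

Because the more specific `SparseAD` method wins, an already-sparse backend passes through unchanged regardless of the keyword.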

jClugstor · 2025-01-02T23:07:43Z

In order for this to be completely done, we'll need a DI equivalent of the SparseDiffTools JacVec operator that is stored in the caches for the W operators. I think I could just wrap the DI pushforward in an operator, but it would be better to have a long-term solution. In a recent Julia Slack thread (https://julialang.slack.com/archives/C6G240ENA/p1735254065747829) there were a couple of solutions.

Is a good way to do this to make an extension in SciMLOperators for DifferentiationInterface that will have something like a DI_pushforward operator that basically wraps up the pushforward function in a FunctionOperator?

@ChrisRackauckas @oscardssmith @gdalle Any thoughts?
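As a rough picture of what such an operator has to do, here is a stdlib-only sketch of a matrix-free Jacobian-vector operator: it caches f(u) at the current point and applies J(u)v with one extra function call. The type name `FDJacVec`, the helper `update_point!`, and the step-size rule are hypothetical; a DI-based version would call a pushforward instead of taking a finite difference.

```julia
using LinearAlgebra

# Hypothetical matrix-free Jacobian operator built on forward differences.
struct FDJacVec{F, T}
    f!::F
    u::Vector{T}     # current linearization point
    fu::Vector{T}    # cached f(u)
    buf::Vector{T}   # work buffer for u + h*v
    out::Vector{T}   # work buffer for f(u + h*v)
end

function FDJacVec(f!, u::Vector{T}) where {T}
    fu = similar(u)
    f!(fu, u)
    return FDJacVec(f!, copy(u), fu, similar(u), similar(u))
end

# Move the linearization point (a role similar to update_coefficients! in SciMLOperators).
function update_point!(J::FDJacVec, u)
    copyto!(J.u, u)
    J.f!(J.fu, J.u)
    return J
end

Base.size(J::FDJacVec) = (length(J.u), length(J.u))

function LinearAlgebra.mul!(y::AbstractVector, J::FDJacVec, v::AbstractVector)
    nv = norm(v)
    iszero(nv) && return fill!(y, zero(eltype(y)))
    h = sqrt(eps(eltype(J.u))) * max(one(nv), norm(J.u)) / nv
    @. J.buf = J.u + h * v
    J.f!(J.out, J.buf)
    @. y = (J.out - J.fu) / h
    return y
end

Base.:*(J::FDJacVec, v::AbstractVector) = mul!(similar(J.u), J, v)

function lorenz!(du, u)
    du[1] = 10.0 * (u[2] - u[1])
    du[2] = u[1] * (28.0 - u[3]) - u[2]
    du[3] = u[1] * u[2] - (8 / 3) * u[3]
    return du
end

J = FDJacVec(lorenz!, [1.0, 2.0, 3.0])
Jv = J * [1.0, 0.0, 0.0]   # approximately the first column of the Lorenz Jacobian
```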

ChrisRackauckas · 2025-01-03T02:16:05Z

Is a good way to do this to make an extension in SciMLOperators for DifferentiationInterface that will have something like a DI_pushforward operator that basically wraps up the pushforward function in a FunctionOperator?

Yes

ChrisRackauckas · 2025-01-03T02:16:18Z

@avik-pal might already have one?

gdalle · 2025-01-03T10:16:09Z

Awesome work @jClugstor, thanks! Ping me when this is ready for a first round of DI-specific review.

This is at a point where we can do stuff like this [...] and it actually uses sparsity detection and greedy jacobian coloring plus Enzyme to compute the Jacobians.

Just to be clear, this wasn't possible before? So is this the first time that Enzyme can be used out-of-the-box to solve ODEs?

The current behavior is to use Jacobian coloring and SparseDiffTools by default. In order to keep that up, we have to wrap any ADType given in an AutoSparse unless it's already an AutoSparse. This does change the ADType that the user entered to be wrapped in an AutoSparse, which feels weird to me. Maybe there should be an option to just directly use the ADType entered, but by default we wrap it into an AutoSparse? I'm not sure.

Another option, which requires a bit more work (and is probably not worth it), would be to make SparseDiffTools compatible with the sparsity API of ADTypes v1. I think it might allow a more seamless upgrade. See e.g. JuliaDiff/SparseDiffTools.jl#298 for the detection aspect; there should be a similar issue for the coloring aspect.

Speaking of SparseDiffTools, it still has an edge over DI when combined with FiniteDiff. The PR JuliaDiff/FiniteDiff.jl#191 could fix that, maybe @oscardssmith would be willing to take another look?

The biggest issue is that the way the sparsity detectors work with DI is by using operator overloading (both TracerSparsityDetector and SymbolicsSparsityDetector do), but that's an issue when using AutoSpecialization, because of the FunctionWrappers. The solution I found was to just unwrap the function in the preparation process. I'm not sure what performance implication this will have, but I don't think it should do much, since the preparation should be run just once.

Agreed, preparation is a one-time cost so I don't think we should worry too much (at least in the prototype stage).
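To see why the FunctionWrappers get in the way, here is a toy version of operator-overloading sparsity detection (stdlib only). The `Tracer` number type, the `Float64Only` wrapper (standing in for a wrapper compiled for a single Float64 signature), and `unwrap` are hypothetical, not SparseConnectivityTracer or FunctionWrappers internals. The point is only that the tracer flows through the plain function but not through a wrapper that pins the argument types, which is why unwrapping during one-time preparation helps.

```julia
# Toy sparsity tracer: each value records which inputs it depends on.
struct Tracer <: Real
    deps::Set{Int}
end
Base.:+(a::Tracer, b::Tracer) = Tracer(union(a.deps, b.deps))
Base.:-(a::Tracer, b::Tracer) = Tracer(union(a.deps, b.deps))
Base.:*(a::Tracer, b::Tracer) = Tracer(union(a.deps, b.deps))
Base.:*(::Real, b::Tracer) = b   # multiplying by a constant adds no dependencies
Base.:-(::Real, b::Tracer) = b   # neither does subtracting from a constant

function lorenz!(du, u)
    du[1] = 10.0 * (u[2] - u[1])
    du[2] = u[1] * (28.0 - u[3]) - u[2]
    du[3] = u[1] * u[2] - (8 / 3) * u[3]
    return du
end

# Stand-in for a wrapper compiled for Vector{Float64} arguments only.
struct Float64Only{F}
    f::F
end
(w::Float64Only)(du::Vector{Float64}, u::Vector{Float64}) = w.f(du, u)

unwrap(w::Float64Only) = w.f
unwrap(f) = f

# Run f! once on tracers and read off the Jacobian sparsity pattern.
function trace_pattern(f!, n)
    u = [Tracer(Set([i])) for i in 1:n]
    du = Vector{Tracer}(undef, n)
    f!(du, u)
    return [j in du[i].deps for i in 1:n, j in 1:n]
end

wrapped = Float64Only(lorenz!)
pattern = trace_pattern(unwrap(wrapped), 3)   # works: the tracers reach lorenz! directly
```

Calling `trace_pattern(wrapped, 3)` instead throws a `MethodError`, because the wrapper only accepts `Vector{Float64}`.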

There are some places that are getting sparse things where it's not expected.

What do you mean by unexpected sparse things? SparseMatrixCSC instead of Matrix? Can you give an example?

In order for this to be completely done we'll need a DI equivalent for the SparseDiffTools JacVec operator that will be stored in the caches, for the W operators. I think I could just wrap the DI pushforward in an operator, but better to do a long term solution. In a recent Julia slack thread (https://julialang.slack.com/archives/C6G240ENA/p1735254065747829) there were a couple of solutions.
Is a good way to do this to make an extension in SciMLOperators for DifferentiationInterface that will have something like a DI_pushforward operator that basically wraps up the pushforward function in a FunctionOperator?

We may also want to involve @oschulz and his AutoDiffOperators package to avoid duplication of efforts?

As a side note, DifferentiationInterface only has two dependencies: ADTypes and LinearAlgebra. For packages that use it extensively, I think it's reasonable to make it a full dep instead of a weakdep.

gdalle · 2025-01-03T13:55:55Z

lib/OrdinaryDiffEqDifferentiation/Project.toml

@@ -7,6 +7,8 @@ version = "1.3.0"
 ADTypes = "47edcb42-4c32-4615-8424-f2b9edc5f35b"
 ArrayInterface = "4fba245c-0d91-5ea0-9b3e-6abc04ee57a9"
 DiffEqBase = "2b5f629d-d688-5b77-993f-72d75c75574e"
+DifferentiationInterface = "a0c0ee7d-e4b9-4e03-894e-1c5f64a51d63"
+Enzyme = "7da242da-08ed-463a-9acd-ee780be4f1d9"


Does Enzyme need to become a dependency? This adds significant install overhead, but if AutoEnzyme is to be the new default AD, then it makes sense.

jClugstor · 2025-01-03

Yeah, it probably doesn't need to be a dependency unless we're committing to having it be the default.

gdalle · 2025-01-03T13:56:30Z

lib/OrdinaryDiffEqDifferentiation/Project.toml

@@ -25,6 +29,7 @@ ADTypes = "1.11"
 ArrayInterface = "7"
 DiffEqBase = "6"
 DiffEqDevTools = "2.44.4"
+DifferentiationInterface = "0.6.23"


Suggested change

-DifferentiationInterface = "0.6.23"
+DifferentiationInterface = "0.6.28"

gdalle · 2025-01-21

The other deps are also missing compat bounds?

Suggested change

-DifferentiationInterface = "0.6.23"
+DifferentiationInterface = "0.6.31"

gdalle · 2025-01-03T13:57:49Z

lib/OrdinaryDiffEqDifferentiation/src/alg_utils.jl

-            alg, autodiff = AutoForwardDiff(chunksize = cs))
+function prepare_ADType(alg::AutoFiniteDiff, prob, u0, p, standardtag)
+    # If the autodiff alg is AutoFiniteDiff, prob.f.f isa FunctionWrappersWrapper,
+    # and fdtype is complex, fdtype needs to change to something not complex


Note that DI does not explicitly support complex numbers yet. What I mean by that is that we forward things to the backend as much as possible, so if the backend does support complex numbers then it will probably work, but there are no tests or hard API guarantees on that. See JuliaDiff/DifferentiationInterface.jl#646 for the discussion.

Also note that some differentiation operators are not defined unambiguously for complex numbers (e.g. the derivative for complex input).

wsmoses · 2025-01-04

Enzyme has explicit variants of its modes for complex numbers, which it would probably be wise to wrap here as well (by default, Enzyme will instead throw an error warning about the ambiguity if a function returns a complex number): https://enzyme.mit.edu/julia/stable/api/#EnzymeCore.ReverseHolomorphic . @gdalle I'm not sure DI supports this yet, so perhaps that means you may need to call Enzyme.jacobian / autodiff directly in that case.

gdalle · 2025-01-04

@jClugstor can you maybe specify where we will encounter complex numbers by filling in the following table?

                            derivative   jacobian
complex inputs possible     yes / no     yes / no
complex outputs possible    yes / no     yes / no

When there are both complex inputs and complex outputs, that's where we run into trouble because we cannot represent derivatives as a single scalar. In that case, the differentiation operators are not clearly defined (the Jacobian matrix is basically twice as big as it should be) so we would need to figure out what convention the ODE solvers need (see https://discourse.julialang.org/t/taking-complex-autodiff-seriously-in-chainrules/39317).

@wsmoses I understand your concern, but I find it encouraging that DI actually allowed Enzyme to be used here for the first time (or at least so I've been told). This makes me think that the right approach is to handle complex numbers properly in DI instead of introducing a special case for Enzyme?

wsmoses · 2025-01-04

Sure, adding proper complex number support to DI would be great, but a three-line change here to use in-spec complex support, when there are already overloads for other ADTypes, feels reasonable?

e.g. something like

function jacobian(f, x::AbstractArray{<:Complex}, integrator::WhatevertheTypeIs{<:AutoEnzyme})
    Enzyme.jacobian(ReverseHolomorphic, f, x)
end

From the discussion in JuliaDiff/DifferentiationInterface.jl#646, I think DI complex support is a much thornier issue. In particular, various tools have different conventions (e.g. JAX and PyTorch pick different conjugates of what is propagated). So either DI needs to make a choice and shim/force all tools to use it (definitely doable), and then user code must be converted to that convention (e.g. a separate shim on the user side). For example, suppose DI picked a different conjugate from ForwardDiff.jl. DI could write its shim for ForwardDiff once, which is reasonable. But suppose one was defining a custom rule within ForwardDiff and the code called DI somewhere: now that user code needs to conditionally apply a different shim to conjugate, which feels kind of nasty to put everywhere (in contrast to a self-consistent assumption). I suppose the other alternative is for DI to not pick a convention, but that again prevents users from relying on it, since it's not possible to know whether they get the correct value for them, and worse, they won't know when they need to do a conversion.

Thus, if complex support is desired, a three-line patch where things are explicitly supported seems okay (at least until the DI story is figured out).

gdalle · 2025-01-04

I agree that for now, this change seems to do the job (although it raises the question of consistency with the other backends that are handled via DI). But what will happen if the function in question is not holomorphic? That's the thorniest part of the problem, and that's why I wanted to inquire a bit more about what kind of functions we can expect. Perhaps @jClugstor or @ChrisRackauckas can tell us more?

In any case, I have started a discussion on Discourse to figure out the right conventions: https://discourse.julialang.org/t/choosing-a-convention-for-complex-numbers-in-differentiationinterface/124433

Also note that the Enzyme-specific fix only handles dense Jacobians, not sparse Jacobians (which are one of the main reasons to use DI in the first place).

jClugstor · 2025-01-07

Sorry, I can't really tell you much about complex number support, other than that previously only ForwardDiff or FiniteDiff were used, so when someone used an implicit solver on a complex problem, their conventions were used, I guess. I also wanted to note that the code this comment is on just makes sure that the FiniteDiff fdtype isn't complex if the function is a FunctionWrapper; it has nothing to do with complex numbers flowing through the solver in general.

gdalle · 2025-01-21

The latest release of DI inches closer to support for complex numbers. I read a little about conventions for non-holomorphic differentiation and it was a mess, so as a starting point DI assumes that the function is holomorphic. If you want e.g. a Jacobian, it is pretty much the only convention that makes sense anyway, otherwise you end up with a $2n \times 2n$ matrix. I have added complex holomorphic test scenarios, and right now FiniteDiff works on them. I'll test Enzyme soon enough.
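The 2n × 2n point can be checked numerically on a scalar function (n = 1). The stdlib-only sketch below (function names are hypothetical) builds the real 2×2 Jacobian of z ↦ f(z) with central differences and tests whether it has the Cauchy-Riemann form [a -b; b a]. Only in that case does it collapse to a single complex derivative a + ib, which is the holomorphic convention DI starts from; for `conj` it does not.

```julia
# Real 2x2 Jacobian of a scalar complex map, viewing z = x + iy as (x, y).
function real_jacobian(f, z; h = 1e-6)
    dfdx = (f(z + h) - f(z - h)) / (2h)             # derivative along the real axis
    dfdy = (f(z + im * h) - f(z - im * h)) / (2h)   # derivative along the imaginary axis
    return [real(dfdx) real(dfdy);
            imag(dfdx) imag(dfdy)]
end

# Holomorphic (at this point) iff the block has the form [a -b; b a];
# then it is represented by the single complex number a + ib.
function as_complex_derivative(Jr; atol = 1e-6)
    a, b = Jr[1, 1], Jr[2, 1]
    ok = isapprox(Jr[2, 2], a; atol) && isapprox(Jr[1, 2], -b; atol)
    return ok ? complex(a, b) : nothing
end

square(z) = z^2
d_square = as_complex_derivative(real_jacobian(square, 1.0 + 2.0im))   # ≈ 2z = 2 + 4im
d_conj = as_complex_derivative(real_jacobian(conj, 1.0 + 2.0im))       # nothing: not holomorphic
```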

avik-pal · 2025-01-03T14:39:18Z

@avik-pal might already have one?

Add a dispatch to https://github.com/SciML/NonlinearSolve.jl/blob/master/lib/SciMLJacobianOperators/src/SciMLJacobianOperators.jl#L115

jClugstor · 2025-01-03T20:13:20Z

@gdalle

Just to be clear, this wasn't possible before? So is this the first time that Enzyme can be used out-of-the-box to solve ODEs?

As far as I know this is the first time Enzyme has been used for the implicit solvers yes.

jClugstor · 2025-01-03T20:30:10Z

@avik-pal I noticed that the constructors for your JacobianOperator take a NonlinearProblem but don't use it much. Would it make sense to create a constructor JacobianOperator(f::AbstractSciMLFunction, ...) that does essentially the same thing and put it in SciMLOperators?

avik-pal · 2025-01-03T23:51:08Z

@avik-pal I noticed that the constructors for your JacobianOperator take a NonlinearProblem but don't use it much. Would it make sense to create a constructor JacobianOperator(f::AbstractSciMLFunction, ...) that does essentially the same thing and put it in SciMLOperators?

The prepare_jvp and prepare_vjp functions assume a 2-arg/3-arg function for oop/iip respectively, which won't hold for OrdinaryDiffEq.
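One way around the argument-count mismatch is to close over `p` and `t` before handing the function to the AD layer, so that it only sees a plain `f!(du, u)`. A stdlib-only sketch under that assumption; the names `freeze_pt` and `fd_jacobian` are hypothetical, and forward differences stand in for whatever backend DI would use.

```julia
# ODE-style in-place right-hand side: f!(du, u, p, t).
function lorenz!(du, u, p, t)
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
    return nothing
end

# Fix p and t so the result has the two-argument in-place form f!(du, u).
freeze_pt(f!, p, t) = (du, u) -> f!(du, u, p, t)

# Dense forward-difference Jacobian of a two-argument in-place function.
function fd_jacobian(f!, u; h = 1e-7)
    n = length(u)
    f0 = zeros(n)
    fh = zeros(n)
    f!(f0, u)
    J = zeros(n, n)
    for j in 1:n
        up = copy(u)
        up[j] += h
        f!(fh, up)
        J[:, j] .= (fh .- f0) ./ h
    end
    return J
end

p = [10.0, 28.0, 8 / 3]
J = fd_jacobian(freeze_pt(lorenz!, p, 0.0), [1.0, 2.0, 3.0])
```

Since `t` changes at every step, a real integrator would either rebuild the closure or keep `t` in a mutable field that the closure reads; the sketch does not address that trade-off.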

gdalle · 2025-01-29T17:23:21Z

Within DI, the function prepare!_jacobian(f!, y, prep, backend, x) allows an existing preparation output prep to be resized and adapted to the new input x. By default, it re-runs the preparation from scratch (which, in the sparse case, includes sparsity detection). But we can override it at will if we have faster ways to resize. So far I have only done it for ForwardDiff, but FiniteDiff will soon follow (as soon as FiniteDiff.jl#191 is done). Here's how it looks:

https://github.com/JuliaDiff/DifferentiationInterface.jl/blob/5dfd7adec430c71a63f527a61962d5e5567e6702/DifferentiationInterface/ext/DifferentiationInterfaceForwardDiffExt/twoarg.jl#L372-L389

Can you take it out for a spin, and tell me whether anything else is missing from DI in your opinion?
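The idea behind resizing a preparation result can be illustrated without DI: keep the buffers a Jacobian computation needs in one object, and when the state length changes, resize them in place instead of rebuilding from scratch. Everything below (`FDJacPrep`, `prepare_fdjac`, `reprepare!`, `fd_jacobian!`) is a hypothetical stdlib-only sketch, not DI's `prepare!_jacobian`.

```julia
# Hypothetical preparation object: work buffers sized for one input length.
mutable struct FDJacPrep
    n::Int
    f0::Vector{Float64}
    fh::Vector{Float64}
    up::Vector{Float64}
    resizes::Int   # how many times the buffers had to change size
end

prepare_fdjac(n::Int) = FDJacPrep(n, zeros(n), zeros(n), zeros(n), 0)

# Adapt an existing preparation to a new input instead of preparing from scratch.
function reprepare!(prep::FDJacPrep, x::AbstractVector)
    n = length(x)
    if n != prep.n
        resize!(prep.f0, n)
        resize!(prep.fh, n)
        resize!(prep.up, n)
        prep.n = n
        prep.resizes += 1
    end
    return prep
end

# Dense forward-difference Jacobian that only uses the prepared buffers.
function fd_jacobian!(J, f!, prep::FDJacPrep, x; h = 1e-7)
    length(x) == prep.n || throw(DimensionMismatch("call reprepare! after resizing the input"))
    f!(prep.f0, x)
    for j in eachindex(x)
        copyto!(prep.up, x)
        prep.up[j] += h
        f!(prep.fh, prep.up)
        J[:, j] .= (prep.fh .- prep.f0) ./ h
    end
    return J
end

square!(y, x) = (y .= x .^ 2)

prep = prepare_fdjac(3)
J3 = fd_jacobian!(zeros(3, 3), square!, prep, [1.0, 2.0, 3.0])
x4 = [1.0, 2.0, 3.0, 4.0]
J4 = fd_jacobian!(zeros(4, 4), square!, reprepare!(prep, x4), x4)
```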

jClugstor · 2025-01-30T20:30:49Z

I just finished up fixing the resizing, looks like it's working.

Up next, in-place DAEs aren't working with the tracing sparsity detection, due to the way the DAEResidualJacobianWrapper is set up.

gdalle · 2025-01-30T20:47:29Z

That's probably an issue for SparseConnectivityTracer to solve. But note that in the meantime, you can still use other methods for sparsity detection, like the good old Symbolics. Just switch the sparsity_detector argument in AutoSparse to Symbolics.SymbolicsSparsityDetector() and voilà!

jClugstor · 2025-01-30T21:14:49Z

I believe any detector based on operator overloading will fail: the DAEResidualJacobianWrapper caches are built with Float64, and evaluating the function wrapped in them uses those caches.
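A minimal stdlib-only reproduction of that failure mode, using complex-step differentiation as a stand-in for any non-Float64 number type (tracers, duals): a wrapper whose buffer is a `Vector{Float64}` cannot store the complex intermediate values, while a variant that allocates a buffer matching the input element type works. The types `FixedCacheResidual` and `FlexibleCacheResidual` are hypothetical and are not the actual DAEResidualJacobianWrapper; packages such as PreallocationTools.jl address the same problem for ForwardDiff dual numbers.

```julia
# Wrapper whose work buffer was allocated once with Float64 entries.
struct FixedCacheResidual{F}
    f::F
    cache::Vector{Float64}
end
function (r::FixedCacheResidual)(u)
    r.cache .= 2 .* u            # InexactError if u holds complex numbers with nonzero imaginary part
    return r.f(r.cache)
end

# Variant that reuses the buffer only when the element type matches.
struct FlexibleCacheResidual{F}
    f::F
    cache::Vector{Float64}
end
function (r::FlexibleCacheResidual)(u)
    buf = eltype(u) === Float64 ? r.cache : similar(u)
    buf .= 2 .* u
    return r.f(buf)
end

# Complex-step derivative of a scalar function g at a real point x.
complex_step(g, x; h = 1e-20) = imag(g(complex(x, h))) / h

cube_sum(v) = sum(v .^ 3)
fixed = FixedCacheResidual(cube_sum, zeros(1))
flexible = FlexibleCacheResidual(cube_sum, zeros(1))

d = complex_step(x -> flexible([x]), 1.5)   # d/dx (2x)^3 = 24x^2 = 54 at x = 1.5
```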

gdalle · 2025-01-30T21:31:55Z

What did you use before?

If you know the pattern in advance, there is ADTypes.KnownJacobianSparsityDetector
If the problem is reasonably sized, there's always DI.DenseSparsityDetector
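A dense detector is conceptually simple: compute one full Jacobian and keep the entries whose magnitude exceeds a tolerance. The stdlib-only sketch below (hypothetical helper names, forward differences instead of a real AD backend) also shows the main caveat. At a special point such as the `u0 = [1.0; 0.0; 0.0]` from the opening example, the Lorenz entry ∂du[3]/∂u[1] = u[2] happens to be zero, so a single evaluation there misses it, while a generic point recovers the full pattern.

```julia
function lorenz!(du, u)
    du[1] = 10.0 * (u[2] - u[1])
    du[2] = u[1] * (28.0 - u[3]) - u[2]
    du[3] = u[1] * u[2] - (8 / 3) * u[3]
    return du
end

# Dense forward-difference Jacobian at a single point.
function dense_fd_jacobian(f!, x; h = 1e-7)
    n = length(x)
    f0 = f!(zeros(n), x)
    J = zeros(n, n)
    for j in 1:n
        xp = copy(x)
        xp[j] += h
        J[:, j] .= (f!(zeros(n), xp) .- f0) ./ h
    end
    return J
end

# Keep the entries that are numerically nonzero at this particular point.
detect_pattern(f!, x; atol = 1e-5) = abs.(dense_fd_jacobian(f!, x)) .> atol

pattern_special = detect_pattern(lorenz!, [1.0, 0.0, 0.0])   # misses entry (3, 1)
pattern_generic = detect_pattern(lorenz!, [1.0, 2.0, 3.0])
```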

jClugstor · 2025-01-30T22:00:54Z

Before I believe you had to supply a sparsity pattern to the ODEFunction manually, and then this function would find the colors or use the supplied colorvec, and pass it to the ForwardColorJacCache. So there wasn't any automatic sparsity detection.

gdalle · 2025-01-30T22:06:07Z

Okay, then it's easy to at least keep the current behavior using the KnownJacobianSparsityDetector.

jClugstor · 2025-01-30T22:09:59Z

Yes, I have it so if the user supplies a sparsity pattern KnownJacobianSparsityDetector is used, plus if a colorvec is supplied it uses that in a ConstantColoringAlgorithm.
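Since a user-supplied `colorvec` is used as-is, a quick consistency check against the sparsity pattern could catch mistakes early. A column coloring is valid for Jacobian compression when no row has nonzeros in two columns of the same color. The helper below is a hypothetical stdlib-only sketch, not part of SparseMatrixColorings or OrdinaryDiffEq.

```julia
# True when no row of `pattern` has nonzeros in two columns of the same color.
function is_valid_colorvec(pattern::AbstractMatrix{Bool}, colorvec::AbstractVector{<:Integer})
    size(pattern, 2) == length(colorvec) || return false
    for i in axes(pattern, 1)
        seen = Set{Int}()                  # colors already used in row i
        for j in axes(pattern, 2)
            pattern[i, j] || continue
            colorvec[j] in seen && return false
            push!(seen, colorvec[j])
        end
    end
    return true
end

n = 5
tridiag = [abs(i - j) <= 1 for i in 1:n, j in 1:n]
good = is_valid_colorvec(tridiag, [1, 2, 3, 1, 2])
bad = is_valid_colorvec(tridiag, [1, 2, 1, 2, 1])   # columns 1 and 3 both touch row 2
```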

gdalle · 2025-01-30T22:20:31Z

Seems like you got this covered

gdalle · 2025-01-31T09:28:19Z

As of the latest version of DI, FiniteDiff's sparse Jacobians use a new JVPCache (courtesy of @oscardssmith) and should therefore be non-allocating.

jClugstor · 2025-02-03T13:58:24Z

I think I'm going to split this PR into two parts, now that I better understand the previous behavior. Most of the existing problems stem from applying automatic sparsity detection by default, whereas the previous behavior only used matrix coloring when a sparsity pattern was supplied to the ODEFunction.

So I think I'll split them up like this:

Make sure the previous behavior is preserved while using DI for all of the AD. This is basically done: every AD operation in there uses DI at this point, and we can check whether a sparsity pattern and colorvec are supplied and then use the DI mechanisms to take advantage of them. Also, because I already made a couple of changes to get the sparsity detectors to work throughout, if the user supplies an AutoSparse ADType with automatic detection, it should already work in many cases.
Allow for automatically applying sparsity and sparsity detection to the ADTypes, when asked for. This might require an addition to the API, maybe just a keyword argument? I'm thinking it would go something like this: if the user asks for automatic sparsity detection and supplied a dense ADType, we try to build the Jacobian configs with automatic sparsity detection using TracerSparsityDetector; if that errors, we try SymbolicsSparsityDetector, and so on. This will need some rearranging to make sure that the alg with the correct ADType gets supplied to the integrator.
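The fallback idea in the second point could look roughly like this: try each detector in order, keep the first one that succeeds, and fall back to a dense pattern if none does. This is a hypothetical stdlib-only sketch; the two toy detectors stand in for an overloading-based detector and a dense one, and catching every exception is the bluntest possible policy (a real implementation would probably want to be more selective).

```julia
# Try detectors in order; return the name and pattern of the first one that works.
function first_working_pattern(detectors, f!, x)
    for (name, detect) in detectors
        try
            return name, detect(f!, x)
        catch err
            @debug "sparsity detector failed, trying the next one" name err
        end
    end
    n = length(x)
    return :dense, trues(n, n)   # give up on sparsity entirely
end

# Toy detector that always fails, like a tracer hitting a Float64-only cache.
failing_detector(f!, x) = error("cannot trace through this function")

# Toy detector based on one dense forward-difference Jacobian.
function dense_fd_detector(f!, x; h = 1e-7, atol = 1e-5)
    n = length(x)
    f0 = f!(zeros(n), x)
    pattern = falses(n, n)
    for j in 1:n
        xp = copy(x)
        xp[j] += h
        pattern[:, j] .= abs.((f!(zeros(n), xp) .- f0) ./ h) .> atol
    end
    return pattern
end

diag_square!(y, x) = (y .= x .^ 2)
detectors = [:tracer => failing_detector, :dense_fd => dense_fd_detector]
name, pattern = first_working_pattern(detectors, diag_square!, [1.0, 2.0, 3.0])
```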


jClugstor · 2025-02-07T04:19:55Z

@gdalle we'll need support for the (undocumented) dir keyword argument for FiniteDiff.jl. I think we'll just need to add it to the AutoFiniteDiff ADType, and then we should be able to plug it into finite_difference_derivative etc. in DI.
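For context, a direction argument for a one-sided difference flips the sign of the step, which matters when the function may only be evaluated on one side of the point; a time derivative taken at the end of the integration interval is presumably the motivation here, though that is an assumption. The helper below is a hypothetical stdlib-only illustration, not FiniteDiff.jl's implementation of `dir`.

```julia
# One-sided forward difference whose step direction is set by dir = +1 or -1.
function onesided_derivative(f, t::Float64; dir::Int = 1)
    dir in (-1, 1) || throw(ArgumentError("dir must be +1 or -1"))
    h = sqrt(eps(Float64)) * max(1.0, abs(t))
    δ = dir * h
    return (f(t + δ) - f(t)) / δ
end

# A function that is only defined on [0, 1], e.g. data on a finite time span.
g(t) = 0.0 <= t <= 1.0 ? t^2 : error("t = $t is outside [0, 1]")

backward = onesided_derivative(g, 1.0; dir = -1)   # steps into the domain: ≈ 2.0
```

With `dir = 1` the same call would evaluate `g` just past `t = 1` and throw.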

gdalle · 2025-02-07T07:40:16Z

This might be a good opportunity to document it in FiniteDiff as well? Along with other args or kwargs we want to use?

jClugstor added 21 commits January 2, 2025 15:18

import DI

f7f8dc2

switch calc_t_derivative

0e1c976

derivative wrappers

73eb575

change the derivatives in rosenbrock steps

29f05d6

try to fix tags

e8f89b0

add DI to deps

4a07327

update compat to DI patch

f0890b3

move tag wrangling to prepare_alg

e780017

move the tag

d7a7700

make sure calc_tderivative sees alg

de5660d

add fallback for scalar x

46207c5

get rid of println

8036a80

add prepare_ADType dispatches for FiniteDiff and ForwardDiff

1a544a5

add Enzyme, SparseConnectivityTracer, SparsematrixColorings

22d6c70

imports

2544042

sparse attempts at sparsity

408d28f

fix the tagging for ForwardDiff

ca10808

need ADTypes specifier

75c8333

unwrapped_f in preparation

9a48833

fix Rosenbrock time derivative

183aa2a

another calc_tderivative

dde2c93

gdalle reviewed Jan 3, 2025


jClugstor added 4 commits January 30, 2025 14:32

fixing stats tests

46e3310

no prep for nonmutating jacobian

95e2100

cleanup

4ceb889

set up resizing

7476c34

jClugstor added 11 commits February 3, 2025 09:01

use densesparsity for DAEs

24c9e8f

fix DAEResidualwrappers for other ADTypes

7bd079b

get rid of default automatic sparsity detection

b829bea

update the jac_prototype if it exists

c2cf473

test check

def6420

make sure to convert t when calculating tgrad with AutoForwardDiff

a9ffa3f

add update_coefficients! function for StatefulJacobian operators, fixes matrix free jacobian evaluation

32ebf6c

add the FirstAutodiff errors back

ff5ef64

no conversion in oop calc_tderivative

1068d0c

scalar "jacobian" should also throw FirstAutodiffJacError if applicable

7744028

use DI.derivative instead

1a8b3fe

This was referenced Feb 6, 2025

Interpolating Adjoint possible test fix SciML/SciMLSensitivity.jl#1162

Closed

Add dir field to AutoFiniteDiff SciML/ADTypes.jl#106

Open
