Sub-task A of the analysis_shared_state epic (tracker: PyAutoPrompt/z_features/analysis_shared_state.md).
This issue covers the autofit deliverable — the generic mechanism, the 1D Gaussian toy, and its fast tests.
The lensing datacube consumer is sub-task B, tracked separately and issued later
(PyAutoPrompt/autolens/datacube_shared_state_consumer.md).
Overview
FactorGraphModel fits N per-factor Analysis objects that share a model, summing
their likelihoods. When the factors share model parameters, a large fraction of each
factor's likelihood is identical work recomputed N times. This task gives PyAutoFit a
domain-agnostic mechanism for a factor to compute a per-evaluation shared object once
and have every factor reuse it.
The work is built toy-first: a 1D Gaussian shared-state example (the natural extension
of af.ex + scripts/features/graphical_models.py) is the development vehicle and proof,
paired with fast, deterministic tests in autofit_workspace_test. Proving the mechanism on
the toy here unblocks the motivating lensing consumer (ALMA datacube; sub-task B), where
~97% of the per-channel inversion-setup work is channel-invariant and currently rebuilt N
times. PyAutoFit ships only the generic, domain-agnostic protocol — no lensing logic.
Plan (autofit deliverable)
Phase 1 — PyAutoFit mechanism + toy af.ex pieces (library, Opus)
- Add opt-in Analysis.shared_state_from(instance) -> None (default None, sibling of modify_before_fit)
  and a defaulted shared= kwarg on log_likelihood_function. FactorGraphModel computes the shared
  object once from the lead factor before the loop and forwards it to each factor only when
  non-None → existing graphs byte-for-byte unchanged.
- Add a shared-aware example Analysis to autofit/example/analysis.py so the toy is reusable as
  af.ex by both the workspace tutorial and the fast tests.
- Unit tests in test_autofit/graphical/.
Phase 2 — autofit_workspace tutorial + docs (workspace, Opus prose)
- New scripts/features/shared_analysis_state.py extending graphical_models.py, framing the shared
  Gaussian model_data as the toy analog of the lensing mapper/L/F. Notebook regen, features/README.md
  entry, docs .rst registration.
Phase 3 — autofit_workspace_test fast-assert tests (test workspace, Sonnet)
- scripts/graphical/shared_state.py: counter proves shared_state_from runs once per eval (not N×),
  exact shared-vs-unshared likelihood equality, no-provider graph unchanged, tiny end-to-end
  DynestyStatic(nlive=50, maxcall=1000, maxiter=1000, number_of_cores=1).
- JAX variant under scripts/jax_assertions/: shared object threads through jax.jit as a pytree;
  jit/vmap likelihood matches numpy.
- Auto-discovered by run_all_scripts.sh (run under PYAUTO_TEST_MODE=1); added to smoke_tests.txt.
Follow-on — sub-task B (lensing), not in this issue: PyAutoLens datacube shared_state_from
(ray-trace + mapper + L + F) + shared-aware AnalysisInterferometer + per-channel fallback;
autolens_workspace datacube scripts; autolens_workspace_test datacube assert; autolens_profiling
re-measure (~17× on the inversion-setup block for a 34-channel cube). Issued later via
/start_dev autolens/datacube_shared_state_consumer.md.
Detailed implementation plan
Affected Repositories (this issue)
- rhayes777/PyAutoFit (primary: mechanism + af.ex toy + unit tests)
- Jammy2211/autofit_workspace (tutorial + docs)
- Jammy2211/autofit_workspace_test (fast-assert tests)
Work Classification
Both (library first: PyAutoFit → autofit_workspace + autofit_workspace_test)
Branch Survey
| Repository | Current Branch | Dirty? |
| --- | --- | --- |
| ./PyAutoFit | main | clean |
| ./autofit_workspace | (survey at /start_library) | - |
| ./autofit_workspace_test | (survey at /start_library) | - |
Suggested branch: feature/analysis-shared-state
Worktree root: ~/Code/PyAutoLabs-wt/analysis-shared-state/ (created later by /start_library)
Design decisions (locked)
- Q1 who computes it: opt-in Analysis.shared_state_from(instance) -> None (default None).
  FactorGraphModel calls it on the lead factor (first factor returning non-None; all None → no sharing).
- Q2 how factors receive it: defaulted kwarg log_likelihood_function(self, instance, shared=None)
  (for af.ex, (self, instance, shared=None, xp=np)). Forwarded only when non-None → existing graphs unchanged.
- Q3 JAX: shared object is a normal pytree of traced arrays, recomputed inside the jitted region each
  eval, no Python-side memoisation on the instance (avoids closure cache-busting).
- Q4 correctness: PyAutoFit trusts the provider + documents the contract; the lensing consumer (sub-task B)
  owns the channel-invariance precondition and falls back to per-channel compute when it fails.
- Toy code home: the shared-aware example Analysis lives in PyAutoFit autofit/example/analysis.py
  (af.ex), reused by both the workspace tutorial and the test script: one implementation, no duplication.
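Decisions Q1 and Q2 can be sketched in plain Python. This is a minimal sketch under stated assumptions, not PyAutoFit code: `Analysis`, `LegacyAnalysis`, `SharingAnalysis` and `graph_log_likelihood` are hypothetical stand-ins, instances are plain floats, and the hook is assumed to be called with the lead factor's own instance.

```python
class Analysis:
    """Hypothetical stand-in for the base class; not PyAutoFit's Analysis."""

    def shared_state_from(self, instance):
        # Opt-in hook: the default provides nothing, so sharing stays off.
        return None


class LegacyAnalysis(Analysis):
    """Pre-existing analysis whose signature has no `shared` kwarg."""

    def log_likelihood_function(self, instance):
        return -0.5 * instance ** 2


class SharingAnalysis(Analysis):
    """Opts in: builds the shared object and consumes it when supplied."""

    calls = 0  # counts how often the shared object is built

    def shared_state_from(self, instance):
        SharingAnalysis.calls += 1
        return instance ** 2

    def log_likelihood_function(self, instance, shared=None):
        if shared is None:  # single-analysis path: rebuild locally
            shared = instance ** 2
        return -0.5 * shared


def graph_log_likelihood(factors, instances):
    # Lead factor = first factor whose hook returns something other than None.
    shared = None
    for factor, instance_ in zip(factors, instances):
        shared = factor.shared_state_from(instance_)
        if shared is not None:
            break

    log_likelihood = 0.0
    for factor, instance_ in zip(factors, instances):
        if shared is None:
            # No provider: call exactly as today, so old signatures still work.
            log_likelihood += factor.log_likelihood_function(instance_)
        else:
            log_likelihood += factor.log_likelihood_function(instance_, shared=shared)
    return log_likelihood
```

With three `SharingAnalysis` factors the hook runs once per evaluation, and a graph made only of `LegacyAnalysis` factors is never passed `shared=`. The sketch does not handle a mixed graph, where a legacy signature would receive the kwarg.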
The 1D Gaussian toy (development vehicle)
- N Analysis objects each fit a 1D dataset, sharing one Gaussian component (shared priors), each
  adding its own per-dataset profile.
- shared_state_from(instance) computes the shared component's model_data array once; each factor's
  log_likelihood_function(instance, shared=...) reuses it instead of rebuilding it.
- Pedagogical parallel (stated in the tutorial): the shared Gaussian model_data ↔ the lensing
  mapper + L + curvature F, the expensive block currently rebuilt N times.
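The toy can be illustrated with NumPy alone. The sketch below is an assumption-laden illustration rather than the af.ex implementation: `gaussian`, `ToyAnalysis` and the dict-based instances are hypothetical, every dataset is assumed to share the same xvalues, and the likelihood keeps only the chi-squared term.

```python
import numpy as np


def gaussian(xvalues, centre, normalization, sigma):
    """Normalised 1D Gaussian profile evaluated on xvalues."""
    return (normalization / (sigma * np.sqrt(2.0 * np.pi))) * np.exp(
        -0.5 * ((xvalues - centre) / sigma) ** 2
    )


class ToyAnalysis:
    """Hypothetical toy: one dataset = shared Gaussian + its own Gaussian."""

    def __init__(self, data, noise_map, xvalues):
        self.data, self.noise_map, self.xvalues = data, noise_map, xvalues

    def shared_state_from(self, instance):
        # The shared component's model_data, built once per evaluation.
        return gaussian(self.xvalues, **instance["shared"])

    def log_likelihood_function(self, instance, shared=None):
        if shared is None:  # fallback: rebuild the shared component locally
            shared = self.shared_state_from(instance)
        model_data = shared + gaussian(self.xvalues, **instance["own"])
        chi_squared = np.sum(((self.data - model_data) / self.noise_map) ** 2)
        return -0.5 * chi_squared


xvalues = np.arange(100.0)
shared_params = {"centre": 50.0, "normalization": 25.0, "sigma": 10.0}
instances = [
    {"shared": shared_params, "own": {"centre": c, "normalization": 5.0, "sigma": 3.0}}
    for c in (20.0, 50.0, 80.0)
]
analyses = [
    ToyAnalysis(data=np.zeros(100), noise_map=np.ones(100), xvalues=xvalues)
    for _ in instances
]

# Build the shared model_data once, then compare against the per-factor rebuild.
shared = analyses[0].shared_state_from(instances[0])
with_sharing = sum(
    a.log_likelihood_function(i, shared=shared) for a, i in zip(analyses, instances)
)
without_sharing = sum(a.log_likelihood_function(i) for a, i in zip(analyses, instances))
```

Because both paths run the same operations on the same inputs, the two sums should agree exactly, which is the kind of deterministic equality the fast tests assert.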
Implementation Steps (Phase 1 — PyAutoFit)
- autofit/non_linear/analysis/analysis.py: add shared_state_from(self, instance) returning None
  (docstring as the per-evaluation, cross-factor sibling of modify_before_fit); base
  log_likelihood_function signature → (self, instance, shared=None).
- autofit/graphical/declarative/collection.py: in FactorGraphModel.log_likelihood_function, compute
  shared from the lead factor before the loop; forward shared= only when non-None; else call as today.
- autofit/graphical/declarative/factor/analysis.py: AnalysisFactor.log_likelihood_function(self, instance, shared=None)
  forwards shared to the wrapped analysis (only when non-None); EPAnalysisFactor unaffected.
- autofit/graphical/declarative/factor/{prior,hierarchical}.py: add defaulted shared=None pass-through.
- autofit/example/analysis.py: shared-aware example Analysis. shared_state_from computes the shared
  component's model_data once; log_likelihood_function(self, instance, shared=None, xp=np) reuses it, with
  the shared is None fallback rebuilding it (single-analysis path unchanged).
- test_autofit/graphical/: 3-factor mock graph; counter proves shared_state_from runs once per eval (not
  N×); summed likelihood correct and exactly equal to the per-factor path; no-provider graph unchanged.
Key Files
- autofit/graphical/declarative/collection.py: the per-factor sum loop
- autofit/graphical/declarative/factor/analysis.py: AnalysisFactor / EPAnalysisFactor precedent
- autofit/non_linear/analysis/analysis.py: Analysis base hook + modify_before_fit sibling
- autofit/graphical/declarative/factor/{prior,hierarchical}.py: kwarg pass-through
- autofit/example/analysis.py: shared-aware af.ex toy Analysis
- test_autofit/graphical/: new unit tests
- autofit_workspace: scripts/features/shared_analysis_state.py + features/README.md + docs .rst
- autofit_workspace_test: scripts/graphical/shared_state.py, scripts/jax_assertions/…, smoke_tests.txt
Test conventions to follow (autofit_workspace_test)
- "Fast" = real samplers with hard caps (nlive=50, maxcall=1000, maxiter=1000, number_of_cores=1), NOT mocks.
- Scripts auto-discovered by run_all_scripts.sh under PYAUTO_TEST_MODE=1; optionally listed in smoke_tests.txt.
- Deterministic assertions (counter ==, exact likelihood equality) over stochastic-posterior tolerances.
Original Prompt
Cross-Analysis shared-state mechanism for FactorGraphModel
A large new PyAutoFit feature: let the per-factor Analysis objects in a
FactorGraphModel share per-evaluation, model-dependent precomputed state
across each other, so that work which is identical for every factor at a given
point in parameter space is computed once and reused by all factors —
instead of every factor recomputing it independently.
Primary repo: @PyAutoFit (the mechanism). Consumer/proof: @PyAutoLens +
@autolens_workspace (the ALMA datacube likelihood that motivates it).
Hard constraint (read first)
PyAutoFit must not depend on PyAutoArray / PyAutoGalaxy / PyAutoLens (see
PyAutoFit/CLAUDE.md — "PyAutoFit does NOT depend on..."). Therefore:
- The mechanism PyAutoFit ships must be completely domain-agnostic: it knows
nothing about lensing, inversions, mappers, or visibilities. It only knows
"factors may want to compute a shared object once per evaluation and have all
factors see it."
- All lensing-specific logic (what to share, how to build the mapper once, how
each channel consumes it) lives in PyAutoLens (the AnalysisInterferometer
side) and is wired up in autolens_workspace datacube scripts.
So the deliverable is a generic shared-state protocol in PyAutoFit plus a
lensing consumer in PyAutoLens that proves it on the datacube.
Existing hooks to build on (do not reinvent)
PyAutoFit already has two precedents for injecting state into an Analysis
around a fit — study both before designing:
- EPAnalysisFactor (autofit/graphical/declarative/factor/analysis.py:257+)
  attaches a per-iteration _cavity_mean_field onto its wrapped Analysis
  immediately before optimisation, so the user's log_likelihood_function can
  read shared cross-factor messages. This is exactly the shape of mechanism we
  want (state computed at the graph level and attached to each factor's
  Analysis), except EP attaches it once per EP outer-iteration, whereas the
  datacube needs it recomputed once per likelihood evaluation (the lens
  parameters change every sample).
- Analysis.modify_before_fit (autofit/non_linear/analysis/analysis.py:320)
  is the existing per-Analysis pre-fit hook. It is per-analysis and runs once
  before sampling, so it cannot host per-evaluation shared state, but its
  docstring ("alter the Analysis in ways that can speed up the fitting") is
  the precedent for "precompute-then-reuse" and the new hook should read as its
  per-evaluation, cross-factor sibling.
Design (to refine in the issue, present options)
The core need: at each call to FactorGraphModel.log_likelihood_function(instance),
before the per-factor loop, optionally compute a shared object from the
instance, then make it available to every factor's log_likelihood_function.
Sketch of the target loop in collection.py:
def log_likelihood_function(self, instance):
    # None unless a shared-state provider is set
    shared = self.compute_shared(instance)
    log_likelihood = 0
    for model_factor, instance_ in zip(self.model_factors, instance):
        log_likelihood += model_factor.log_likelihood_function(instance_, shared=shared)
    return log_likelihood
Design questions the issue must resolve (present 2-3 concrete options, pick one):
- Who computes the shared object? Options:
  - A shared_state_provider callable/object set on the FactorGraphModel
    (domain-agnostic: it takes the instance, returns an opaque object).
  - A designated "lead" factor whose Analysis exposes a
    compute_shared(instance) method; remaining factors receive its output.
  - A new optional Analysis.shared_state_from(instance) protocol method
    (default returns None) so any Analysis can opt in.
  Favour whichever keeps PyAutoFit domain-blind and makes the lensing side a
  thin consumer.
- How does a factor receive it? Options:
  - New optional kwarg log_likelihood_function(self, instance, shared=None)
    with a default so every existing Analysis keeps working unchanged
    (back-compat is mandatory: hundreds of Analyses exist).
  - Attribute injection like EPAnalysisFactor (analysis._shared_state = ...)
    set/cleared around the loop.
  The kwarg is cleaner and JIT-friendlier; the attribute path matches the EP
  precedent. Decide explicitly and justify.
- JAX / pytree correctness. The datacube path is JIT-compiled
  (use_jax=True, register_model pytrees). The shared object will contain
  traced arrays (mapper triplets, mapping matrix, curvature). It must:
  - be threadable through jax.jit as a normal pytree (no Python-side caching
    that cache-busts; see feedback_jax_closure_cache_busts);
  - be recomputed inside the jitted region each eval (it depends on the traced
    lens parameters), not memoised across evals on the instance;
  - not break the single-factor / non-cube path (shared is None → identical
    behaviour and identical numbers).
- Correctness + ordering. The shared object is only valid when the relevant
  parameters really are shared across factors. The mechanism must not silently
  produce wrong likelihoods if a user wires up factors whose "shared" inputs
  actually differ. Decide whether to (a) trust the provider, (b) assert
  structural equality of the relevant sub-instance across factors, or (c)
  document the contract and leave it to the consumer. Note the physical caveat
  from alma_datacube.md: sharing is only valid when uv_wavelengths and
  noise_map are ~channel-invariant (narrow-emission-line regime); outside it,
  the consumer must fall back to per-factor compute.
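For the "Correctness + ordering" question, option (b) might look like the sketch below. It illustrates one of the listed options only; `shared_inputs_match`, the `"shared"` key and the dict-based instances are hypothetical and do not exist in PyAutoFit.

```python
import numpy as np


def shared_inputs_match(instances, key="shared", rtol=0.0, atol=0.0):
    """Return True when every factor's shared sub-instance has equal values.

    Hypothetical helper: `instances` is a list of dicts, and `key` names the
    sub-dict of parameters that the shared object is computed from.
    """
    reference = instances[0][key]
    for instance in instances[1:]:
        candidate = instance[key]
        if candidate.keys() != reference.keys():
            return False
        for name, value in reference.items():
            if not np.allclose(candidate[name], value, rtol=rtol, atol=atol):
                return False
    return True


params = {"centre": (0.0, 0.0), "sigma": 1.6}
consistent = [{"shared": dict(params)}, {"shared": dict(params)}]
inconsistent = [{"shared": dict(params)}, {"shared": {**params, "sigma": 1.7}}]
```

A Python-level check like this needs concrete values, so it would run outside a jax.jit-traced likelihood rather than inside it; that cost is one argument for trusting the provider and documenting the contract instead.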
Why this is needed (the motivating problem)
The ALMA datacube likelihood (autolens_workspace#120 and its roadmap, all
shipped: see complete.md "datacube roadmap") fits an N-channel spectral cube
as N independent AnalysisInterferometer objects sharing one lens model,
wired together with af.FactorGraphModel. The FactorGraph routes the shared
lens parameters to every per-channel AnalysisInterferometer.log_likelihood_function
and sums the results:
# autofit/graphical/declarative/collection.py:89-107
def log_likelihood_function(self, instance):
    log_likelihood = 0
    for model_factor, instance_ in zip(self.model_factors, instance):
        log_likelihood += model_factor.log_likelihood_function(instance_)
    return log_likelihood
AnalysisFactor just forwards to the wrapped analysis:
# autofit/graphical/declarative/factor/analysis.py:253-254
def log_likelihood_function(self, instance):
    return self.analysis.log_likelihood_function(instance)
The problem: because the lens model is shared across all channels, a large
fraction of each channel's likelihood is identical work. Profiling
(autolens_profiling/likelihood_breakdown/datacube/delaunay.py, results in
autolens_profiling/likelihood_runtime/OPTIMIZATION_NOTES.md) shows that for a
34-channel cube the step-by-step CPU cost is ~170-205 s/eval, of which:
- ~78% is the per-channel "inversion setup": ray-tracing the shared lens
  model, then building the source-plane mapper (Delaunay triangulation,
  neighbours, pixel weights) and the mapping matrix L;
- ~17-19% is the curvature matrix F = Lᵀ W̃ L;
- only ~5% (data vector D, NNLS reconstruction, log-evidence) is genuinely
  per-channel (it depends on each channel's distinct visibilities).
In the sparse / w̃ inversion route that production actually uses (this is the
important subtlety — see PyAutoPrompt/issued/alma_datacube.md and the
investigation note in complete.md about the transformer-free per-likelihood
path), the expensive NUFFT is precomputed once at dataset load, so the
shareable per-eval work is the traced grids + Delaunay mapper + mapping matrix
L + curvature F — all pure functions of the shared lens model + shared source
mesh, currently rebuilt N times. The data vector and reconstruction are the only
irreducibly per-channel parts.
This is "Aris's deferred shared-Lᵀ W̃ L optimisation" (autolens_workspace#120).
A decomposition of the dominant inversion-setup step
(autolens_profiling/likelihood_breakdown/datacube/inversion_setup_decompose.py,
SMA / CPU, sparse route) confirms Lᵀ W̃ L is exactly the right thing to
share and sizes the win:
| inversion-setup sub-step | per-call | shareable? |
| --- | --- | --- |
| ray-trace | ~0.001 s | ✅ invariant |
| Delaunay mapper + mapping matrix L | ~0.19 s | ✅ invariant |
| curvature F = Lᵀ W̃ L | ~1.57 s | ✅ invariant |
| data vector D = Lᵀ·dirty_image | ~0.06 s | ❌ per-channel |
So ~97% of the per-channel inversion work is channel-invariant, and the
curvature F alone is ~86% of it — not the mapper as an earlier hypothesis
assumed. Sharing the invariant block collapses the per-channel inversion total
from N × ~1.81 s to ~1.81 s + (N-1) × ~0.06 s — roughly a 17× reduction on
the inversion-setup block for a 34-channel cube (≈60 s → ≈3.5 s). The
remaining per-channel cost is just the Lᵀ·dirty_image matmul + NNLS + log-ev.
(Absolute seconds are SMA-scale on a contended laptop CPU and provisional — the
ratios are the robust deliverable; re-measure at ALMA scale on a quiet A100 to
pin the cube-level number. The old shared_lwl_savings_estimate ≈ 17% field in
the breakdown JSON under-counts because it credits F against the full ~170 s
cube rather than against the inversion-setup block F actually dominates.)
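The fractions and the speedup can be re-derived from the rounded per-call figures in the decomposition table. The snippet is a worked-arithmetic sketch only: the dictionary keys are shorthand labels, and the inputs are the provisional SMA-scale timings quoted above, not new measurements.

```python
# Per-call timings (seconds) copied from the decomposition table.
per_call = {
    "ray-trace": 0.001,
    "mapper + L": 0.19,
    "curvature F": 1.57,
    "data vector D": 0.06,
}
invariant = ("ray-trace", "mapper + L", "curvature F")

n_channels = 34
total = sum(per_call.values())
shared_once = sum(per_call[name] for name in invariant)
per_channel = per_call["data vector D"]

# Share of the per-channel inversion work that does not depend on the channel.
invariant_fraction = shared_once / total
curvature_fraction = per_call["curvature F"] / total

# Unshared: every channel rebuilds everything. Shared: one invariant build
# plus one data-vector product per channel.
unshared_cost = n_channels * total
shared_cost = shared_once + n_channels * per_channel
speedup = unshared_cost / shared_cost

print(f"invariant fraction: {invariant_fraction:.1%}")
print(f"curvature share:    {curvature_fraction:.1%}")
print(f"unshared: {unshared_cost:.1f} s, shared: {shared_cost:.1f} s, speedup: {speedup:.1f}x")
```

With these rounded inputs the invariant share comes out near 97%, the curvature share near 86%, and the speedup at roughly 16×. That is in line with the ~17× quoted above; the small gap plausibly comes from rounding in the per-call figures.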
The blocker is purely architectural, and it lives in PyAutoFit, not in
PyAutoLens. As the design note in PyAutoPrompt/autoarray/datacube.md states:
"The problem here is the analysis list API does not currently share
information across likelihood functions or analysis objects. We therefore
either need to make a DataCube data class, Inversion object and add bespoke
source code, or we need to have AnalysisCombined objects be able to share
information in their likelihood functions."
This prompt is the second, general option: give FactorGraphModel a way for
its factors to share per-evaluation state. The bespoke-DataCube-class option is
explicitly not what we want — it would solve only lensing cubes and bake
domain logic into a one-off path.
Plan
Phase 1 — PyAutoFit: the generic mechanism
- Add the shared-state protocol (chosen option from Design Q1/Q2) to
  FactorGraphModel (collection.py) and AnalysisFactor
  (declarative/factor/analysis.py), with a default that is a no-op so all
  existing graphs are byte-for-byte unchanged.
- Add the opt-in surface to Analysis (non_linear/analysis/analysis.py):
  default shared_state_from(instance) -> None (or equivalent), mirroring the
  modify_before_fit precedent.
- Thread shared= through log_likelihood_function signatures with a defaulted
  kwarg; keep EPAnalysisFactor working.
- Unit tests in test_autofit/graphical/: a 3-factor mock graph where the
  shared object is a counter proving compute_shared runs once per eval (not
  N times), the sum is correct, and a graph with no provider is unchanged.
Phase 2 — PyAutoLens: the datacube consumer
- On the interferometer datacube path, implement the lensing-specific
compute_shared: ray-trace the shared lens model once, build the Delaunay
mapper + mapping matrix L (and, where uv/noise are channel-invariant, the
curvature F) once, and hand it to every channel's
AnalysisInterferometer.log_likelihood_function to consume in place of its own
rebuild.
- Per-channel work that remains: data vector D (channel visibilities),
  NNLS reconstruction, log-evidence.
- Fall back to the current per-channel path when the shared-invariance precondition
doesn't hold.
Phase 3 — autolens_workspace + profiling
- Update the datacube modeling/likelihood scripts to opt into the shared path.
- Re-run autolens_profiling/likelihood_breakdown/datacube/delaunay.py (which now
  carries the inversion-setup sub-decomposition as a permanent step) and record
  the new cube cost. Per the decomposition above, ~97% of the per-channel
  inversion work is shareable, so the inversion-setup block should drop ~17× for
  a 34-channel cube (≈60 s → ≈3.5 s); the cube total drops from ~170 s toward the
  per-channel residual (data-vector matmul + NNLS + log-ev, a few seconds) plus
  one shared mapper+L+F build. Compare against the
  inversion_setup_decompose_*.json artifact for the channel-invariant/variant
  split that sets the ceiling.
Critical files
PyAutoFit (modify):
- autofit/graphical/declarative/collection.py: FactorGraphModel.log_likelihood_function, the per-factor sum loop
- autofit/graphical/declarative/factor/analysis.py: AnalysisFactor.log_likelihood_function, and the EPAnalysisFactor precedent
- autofit/non_linear/analysis/analysis.py: Analysis base, new opt-in protocol method, modify_before_fit sibling
- test_autofit/graphical/: new tests
PyAutoFit (reference, do not modify):
- EPAnalysisFactor (declarative/factor/analysis.py:257+): the attach-state-to-analysis precedent
- autofit/non_linear/analysis/model_analysis.py, visualize.py: other Analysis wrappers that must keep working
PyAutoLens (consumer, Phase 2):
- the AnalysisInterferometer likelihood path + interferometer Inversion/mapper construction
- autolens_workspace/scripts/interferometer/features/datacube/{likelihood_function,modeling,delaunay}.py
Profiling (Phase 3):
- autolens_profiling/likelihood_breakdown/datacube/delaunay.py
- autolens_profiling/likelihood_runtime/OPTIMIZATION_NOTES.md
Out of scope
- A bespoke DataCube data class / cube-specific Inversion (the rejected option).
- The dense-route variant (production uses sparse; dense is not the target).
- Generalising shared state to arbitrary cross-factor gradients — likelihood
value only for now.
Cross-references
- autolens_workspace#120: Aris's shared-Lᵀ W̃ L optimisation, the origin
- PyAutoPrompt/autoarray/datacube.md: the "analysis list API does not share
  information" problem statement
- PyAutoPrompt/issued/alma_datacube.md: Aris's Slack design + the
  channel-invariance caveat (lines 24, 30, 34, 53, 207)
- complete.md datacube roadmap entries (Phases 1-4, all shipped)
- the paired decomposition note in autolens_profiling splitting the 78%
  "inversion setup" block into mapper vs mapping-matrix vs data-vector, which
  quantifies the real ceiling of this optimisation
Overview
FactorGraphModelfits N per-factorAnalysisobjects that share a model, summingtheir likelihoods. When the factors share model parameters, a large fraction of each
factor's likelihood is identical work recomputed N times. This task gives PyAutoFit a
domain-agnostic mechanism for a factor to compute a per-evaluation shared object once
and have every factor reuse it.
The work is built toy-first: a 1D Gaussian shared-state example (the natural extension
of
af.ex+scripts/features/graphical_models.py) is the development vehicle and proof,paired with fast, deterministic tests in
autofit_workspace_test. Proving the mechanism onthe toy here unblocks the motivating lensing consumer (ALMA datacube; sub-task B), where
~97% of the per-channel inversion-setup work is channel-invariant and currently rebuilt N
times. PyAutoFit ships only the generic, domain-agnostic protocol — no lensing logic.
Plan (autofit deliverable)
Phase 1 — PyAutoFit mechanism + toy
af.expieces (library, Opus)Analysis.shared_state_from(instance) -> None(defaultNone, sibling ofmodify_before_fit)and a defaulted
shared=kwarg onlog_likelihood_function.FactorGraphModelcomputes the shared object once from the lead factor before the loop and forwardsit to each factor only when non-
None→ existing graphs byte-for-byte unchanged.Analysistoautofit/example/analysis.pyso the toy is reusable asaf.exby both the workspace tutorial and the fast tests.test_autofit/graphical/.Phase 2 — autofit_workspace tutorial + docs (workspace, Opus prose)
scripts/features/shared_analysis_state.pyextendinggraphical_models.py, framing the sharedGaussian
model_dataas the toy analog of the lensing mapper/L/F. Notebook regen,features/README.mdentry, docs
.rstregistration.Phase 3 — autofit_workspace_test fast-assert tests (test workspace, Sonnet)
scripts/graphical/shared_state.py: counter provesshared_state_fromruns once per eval (not N×),exact shared-vs-unshared likelihood equality, no-provider graph unchanged, tiny end-to-end
DynestyStatic(nlive=50, maxcall=1000, maxiter=1000, number_of_cores=1).scripts/jax_assertions/: shared object threads throughjax.jitas a pytree;jit/vmap likelihood matches numpy.
run_all_scripts.sh(run underPYAUTO_TEST_MODE=1); added tosmoke_tests.txt.Follow-on — sub-task B (lensing), not in this issue: PyAutoLens datacube
shared_state_from(ray-trace + mapper + L + F) +
shared-awareAnalysisInterferometer+ per-channel fallback;autolens_workspace datacube scripts; autolens_workspace_test datacube assert; autolens_profiling
re-measure (~17× on the inversion-setup block for a 34-channel cube). Issued later via
/start_dev autolens/datacube_shared_state_consumer.md.Detailed implementation plan
Affected Repositories (this issue)
af.extoy + unit tests)Work Classification
Both (library first: PyAutoFit → autofit_workspace + autofit_workspace_test)
Branch Survey
Suggested branch:
feature/analysis-shared-stateWorktree root:
~/Code/PyAutoLabs-wt/analysis-shared-state/(created later by/start_library)Design decisions (locked)
Analysis.shared_state_from(instance) -> None(defaultNone).FactorGraphModelcalls it on the lead factor (first factor returning non-None; allNone→ no sharing).log_likelihood_function(self, instance, shared=None)(for
af.ex,(self, instance, shared=None, xp=np)). Forwarded only when non-None→ existing graphs unchanged.eval, no Python-side memoisation on the instance (avoids closure cache-busting).
owns the channel-invariance precondition and falls back to per-channel compute when it fails.
Analysislives in PyAutoFitautofit/example/analysis.py(
af.ex), reused by both the workspace tutorial and the test script — one implementation, no duplication.The 1D Gaussian toy (development vehicle)
Analysisobjects each fit a 1D dataset, sharing one Gaussian component (shared priors), eachadding its own per-dataset profile.
shared_state_from(instance)computes the shared component'smodel_dataarray once; each factor'slog_likelihood_function(instance, shared=...)reuses it instead of rebuilding it.model_data↔ the lensingmapper + L + curvature
F— the expensive block currently rebuilt N times.Implementation Steps (Phase 1 — PyAutoFit)
autofit/non_linear/analysis/analysis.py: addshared_state_from(self, instance)returningNone(docstring as the per-evaluation, cross-factor sibling of
modify_before_fit); baselog_likelihood_functionsignature →(self, instance, shared=None).autofit/graphical/declarative/collection.py: inFactorGraphModel.log_likelihood_function, computesharedfrom the lead factor before the loop; forwardshared=only when non-None; else call as today.autofit/graphical/declarative/factor/analysis.py:AnalysisFactor.log_likelihood_function(self, instance, shared=None)forwardssharedto the wrapped analysis (only when non-None);EPAnalysisFactorunaffected.autofit/graphical/declarative/factor/{prior,hierarchical}.py: add defaultedshared=Nonepass-through.autofit/example/analysis.py: shared-aware exampleAnalysis—shared_state_fromcomputes the sharedcomponent's
model_dataonce;log_likelihood_function(self, instance, shared=None, xp=np)reuses it, withthe
shared is Nonefallback rebuilding it (single-analysis path unchanged).test_autofit/graphical/: 3-factor mock graph; counter provesshared_state_fromruns once per eval (notN×); summed likelihood correct and exactly equal to the per-factor path; no-provider graph unchanged.
Key Files
autofit/graphical/declarative/collection.py— the per-factor sum loopautofit/graphical/declarative/factor/analysis.py— AnalysisFactor / EPAnalysisFactor precedentautofit/non_linear/analysis/analysis.py— Analysis base hook + modify_before_fit siblingautofit/graphical/declarative/factor/{prior,hierarchical}.py— kwarg pass-throughautofit/example/analysis.py— shared-awareaf.extoy Analysistest_autofit/graphical/— new unit testsscripts/features/shared_analysis_state.py+features/README.md+ docs.rstscripts/graphical/shared_state.py,scripts/jax_assertions/…,smoke_tests.txtTest conventions to follow (autofit_workspace_test)
nlive=50, maxcall=1000, maxiter=1000,number_of_cores=1), NOT mocks.run_all_scripts.shunderPYAUTO_TEST_MODE=1; optionally listed insmoke_tests.txt.==, exact likelihood equality) over stochastic-posterior tolerances.Original Prompt
Click to expand starting prompt
Cross-
Analysisshared-state mechanism forFactorGraphModelA large new PyAutoFit feature: let the per-factor
Analysisobjects in aFactorGraphModelshare per-evaluation, model-dependent precomputed stateacross each other, so that work which is identical for every factor at a given
point in parameter space is computed once and reused by all factors —
instead of every factor recomputing it independently.
Primary repo: @PyAutoFit (the mechanism). Consumer/proof: @PyAutoLens +
@autolens_workspace (the ALMA datacube likelihood that motivates it).
Hard constraint (read first)
PyAutoFit must not depend on PyAutoArray / PyAutoGalaxy / PyAutoLens (see
PyAutoFit/CLAUDE.md— "PyAutoFit does NOT depend on..."). Therefore:nothing about lensing, inversions, mappers, or visibilities. It only knows
"factors may want to compute a shared object once per evaluation and have all
factors see it."
each channel consumes it) lives in PyAutoLens (the
AnalysisInterferometerside) and is wired up in autolens_workspace datacube scripts.
So the deliverable is a generic shared-state protocol in PyAutoFit plus a
lensing consumer in PyAutoLens that proves it on the datacube.
Existing hooks to build on (do not reinvent)
PyAutoFit already has two precedents for injecting state into an
Analysisaround a fit — study both before designing:
EPAnalysisFactor(autofit/graphical/declarative/factor/analysis.py:257+)attaches a per-iteration
_cavity_mean_fieldonto its wrappedAnalysisimmediately before optimisation, so the user's
log_likelihood_functioncanread shared cross-factor messages. This is exactly the shape of mechanism we
want — state computed at the graph level and attached to each factor's
Analysis — except EP attaches it once per EP outer-iteration, whereas the
datacube needs it recomputed once per likelihood evaluation (the lens
parameters change every sample).
Analysis.modify_before_fit(autofit/non_linear/analysis/analysis.py:320)is the existing per-
Analysispre-fit hook. It is per-analysis and runs oncebefore sampling, so it cannot host per-evaluation shared state, but its
docstring ("alter the
Analysisin ways that can speed up the fitting") isthe precedent for "precompute-then-reuse" and the new hook should read as its
per-evaluation, cross-factor sibling.
Design (to refine in the issue, present options)
The core need: at each call to
FactorGraphModel.log_likelihood_function(instance),before the per-factor loop, optionally compute a shared object from the
instance, then make it available to every factor's
log_likelihood_function.Sketch of the target loop in
collection.py:Design questions the issue must resolve (present 2-3 concrete options, pick one):
Who computes the shared object? Options:
shared_state_providercallable/object set on theFactorGraphModel(domain-agnostic: it takes the instance, returns an opaque object).
compute_shared(instance)method; remaining factors receive its output.Analysis.shared_state_from(instance)protocol method(default returns
None) so any Analysis can opt in.Favour whichever keeps PyAutoFit domain-blind and makes the lensing side a
thin consumer.
How does a factor receive it? Options:
log_likelihood_function(self, instance, shared=None)with a default so every existing Analysis keeps working unchanged
(back-compat is mandatory — hundreds of Analyses exist).
EPAnalysisFactor(analysis._shared_state = ...)set/cleared around the loop.
The kwarg is cleaner and JIT-friendlier; the attribute path matches the EP
precedent. Decide explicitly and justify.
JAX / pytree correctness. The datacube path is JIT-compiled
(
use_jax=True,register_modelpytrees). The shared object will containtraced arrays (mapper triplets, mapping matrix, curvature). It must:
jax.jitas a normal pytree (no Python-side cachingthat cache-busts — see
feedback_jax_closure_cache_busts);lens parameters), not memoised across evals on the instance;
None→ identicalbehaviour and identical numbers).
Correctness + ordering. The shared object is only valid when the relevant
parameters really are shared across factors. The mechanism must not silently
produce wrong likelihoods if a user wires up factors whose "shared" inputs
actually differ. Decide whether to (a) trust the provider, (b) assert
structural equality of the relevant sub-instance across factors, or (c)
document the contract and leave it to the consumer. Note the physical caveat
from alma_datacube.md: sharing is only valid when uv_wavelengths and noise_map are
~channel-invariant (narrow-emission-line regime); outside it, the consumer must fall back to
per-factor compute.
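Option (b) above could be sketched as a guard that runs before the shared object is built. The function name, the dict-of-parameters representation and the error message are all hypothetical; a real check would more likely compare the model's structure than numeric values, and a value comparison like this one would have to run outside any jitted path.

```python
from typing import Dict, Iterable, Sequence


def assert_shared_inputs_equal(
    factor_parameters: Sequence[Dict[str, float]],
    shared_names: Iterable[str],
) -> None:
    """Raise if any parameter declared as shared differs between factors."""
    names = list(shared_names)
    reference = factor_parameters[0]
    for index, parameters in enumerate(factor_parameters[1:], start=1):
        for name in names:
            if parameters[name] != reference[name]:
                raise ValueError(
                    f"factor {index} has {name}={parameters[name]!r}, "
                    f"but factor 0 has {name}={reference[name]!r}; "
                    "the shared object would be invalid"
                )


# Factors may differ in their own parameters, but not in the shared ones.
assert_shared_inputs_equal(
    [{"centre": 1.0, "sigma": 2.0}, {"centre": 1.0, "sigma": 3.0}],
    shared_names=["centre"],
)
```

Options (a) and (c) need no code in PyAutoFit at all, which is the trade-off to weigh against the silent-wrong-likelihood risk.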
Why this is needed (the motivating problem)
The ALMA datacube likelihood (autolens_workspace#120 and its roadmap, all
shipped: see complete.md "datacube roadmap") fits an N-channel spectral cube as N independent
AnalysisInterferometer objects sharing one lens model, wired together with af.FactorGraphModel.
The FactorGraph routes the shared lens parameters to every per-channel
AnalysisInterferometer.log_likelihood_function and sums the results; AnalysisFactor just
forwards to the wrapped analysis.
The problem: because the lens model is shared across all channels, a large
fraction of each channel's likelihood is identical work. Profiling
(autolens_profiling/likelihood_breakdown/datacube/delaunay.py, results in
autolens_profiling/likelihood_runtime/OPTIMIZATION_NOTES.md) shows that for a 34-channel cube the
step-by-step CPU cost is ~170-205 s/eval, of which:
- ray-tracing the shared lens model, then building the source-plane mapper (Delaunay triangulation, neighbours, pixel weights) and the mapping matrix L;
- the curvature matrix F = Lᵀ W̃ L;
- only the final solve (data vector D, NNLS reconstruction, log-evidence) is genuinely per-channel (it depends on each channel's distinct visibilities).
In the sparse / w̃ inversion route that production actually uses (this is the
important subtlety; see PyAutoPrompt/issued/alma_datacube.md and the investigation note in
complete.md about the transformer-free per-likelihood path), the expensive NUFFT is precomputed
once at dataset load, so the
shareable per-eval work is the traced grids + Delaunay mapper + mapping matrix
L + curvature F — all pure functions of the shared lens model + shared source
mesh, currently rebuilt N times. The data vector and reconstruction are the only
irreducibly per-channel parts.
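The split described above can be illustrated with a toy, pure-Python sketch: the curvature F = Lᵀ W̃ L is built once, while the data vector D = Lᵀ·dirty_image is built per channel. The matrices, sizes and helper names here are illustrative assumptions only (a diagonal W stands in for W̃, and no NNLS solve is performed); this is not the autoarray implementation.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    b_t = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]


def curvature(L: Matrix, W: Matrix) -> Matrix:
    """F = L^T W L: depends only on the shared mapping matrix and weights."""
    return matmul(matmul(transpose(L), W), L)


def data_vector(L: Matrix, dirty_image: List[float]) -> List[float]:
    """D = L^T . dirty_image: depends on the channel's own data."""
    return [sum(l * d for l, d in zip(column, dirty_image)) for column in transpose(L)]


# Toy sizes: 3 image pixels, 2 source pixels, 2 channels.
L = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
W = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
dirty_images = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]

F_shared = curvature(L, W)  # built once per evaluation
D_per_channel = [data_vector(L, d) for d in dirty_images]  # built N times
```

Only D_per_channel grows with the number of channels; F_shared is the block the shared-state mechanism would hand to every factor.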
This is "Aris's deferred shared-Lᵀ W̃ L optimisation" (autolens_workspace#120).
A decomposition of the dominant inversion-setup step
(autolens_profiling/likelihood_breakdown/datacube/inversion_setup_decompose.py, SMA / CPU,
sparse route) confirms Lᵀ W̃ L is exactly the right thing to share and sizes the win:
[Table: inversion-setup decomposition; surviving row labels: Lᵀ W̃ L, Lᵀ·dirty_image]
So ~97% of the per-channel inversion work is channel-invariant, and the
curvature F alone is ~86% of it, not the mapper as an earlier hypothesis assumed. Sharing the
invariant block collapses the per-channel inversion total from N × ~1.81 s to
~1.81 s + (N-1) × ~0.06 s, roughly a 17× reduction on the inversion-setup block for a
34-channel cube (≈60 s → ≈3.5 s). The remaining per-channel cost is just the Lᵀ·dirty_image
matmul + NNLS + log-ev.
(Absolute seconds are SMA-scale on a contended laptop CPU and provisional; the ratios are the
robust deliverable. Re-measure at ALMA scale on a quiet A100 to pin the cube-level number. The old
shared_lwl_savings_estimate ≈ 17% field in the breakdown JSON under-counts because it credits F
against the full ~170 s cube rather than against the inversion-setup block F actually dominates.)
The blocker is purely architectural, and it lives in PyAutoFit, not in PyAutoLens. As the design
note in PyAutoPrompt/autoarray/datacube.md states:
This prompt is the second, general option: give FactorGraphModel a way for its factors to share
per-evaluation state. The bespoke-DataCube-class option is explicitly not what we want: it would
solve only lensing cubes and bake domain logic into a one-off path.
Plan
Phase 1 — PyAutoFit: the generic mechanism
- Add the shared-state mechanism to FactorGraphModel (collection.py) and AnalysisFactor (declarative/factor/analysis.py), with a default that is a no-op so all existing graphs are byte-for-byte unchanged.
- Add the opt-in protocol method to Analysis (non_linear/analysis/analysis.py): default shared_state_from(instance) -> None (or equivalent), mirroring the modify_before_fit precedent.
- Thread shared= through log_likelihood_function signatures with a defaulted kwarg; keep EPAnalysisFactor working.
- Unit tests in test_autofit/graphical/: a 3-factor mock graph where the shared object is a counter proving compute_shared runs once per eval (not N times), the sum is correct, and a graph with no provider is unchanged.
Phase 2 — PyAutoLens: the datacube consumer
- Implement compute_shared: ray-trace the shared lens model once, build the Delaunay mapper + mapping matrix L (and, where uv/noise are channel-invariant, the curvature F) once, and hand it to every channel's AnalysisInterferometer.log_likelihood_function to consume in place of its own rebuild.
- Keep per-channel only what is genuinely per-channel: data vector D (channel visibilities), NNLS reconstruction, log-evidence.
- Fall back to per-factor compute when channel-invariance doesn't hold.
Phase 3 — autolens_workspace + profiling
- Re-run autolens_profiling/likelihood_breakdown/datacube/delaunay.py (which now carries the inversion-setup sub-decomposition as a permanent step) and record the new cube cost. Per the decomposition above, ~97% of the per-channel inversion work is shareable, so the inversion-setup block should drop ~17× for a 34-channel cube (≈60 s → ≈3.5 s); the cube total drops from ~170 s toward the per-channel residual (data-vector matmul + NNLS + log-ev, a few seconds) plus one shared mapper+L+F build. Compare against the inversion_setup_decompose_*.json artifact for the channel-invariant/variant split that sets the ceiling.
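The expected drop can be sanity-checked from the rounded figures quoted in this document (~1.81 s per-channel inversion total, ~0.06 s genuinely per-channel, N = 34). The function and variable names below are illustrative, and the inputs are the provisional SMA-scale numbers, so only the ratio is meaningful.

```python
def inversion_setup_costs(n_channels: int, total_s: float, variant_s: float):
    """Return (unshared, shared) inversion-setup cost in seconds."""
    unshared = n_channels * total_s
    # One full build, then only the channel-variant part for the rest.
    shared = total_s + (n_channels - 1) * variant_s
    return unshared, shared


unshared, shared = inversion_setup_costs(n_channels=34, total_s=1.81, variant_s=0.06)
speedup = unshared / shared
invariant_fraction = (1.81 - 0.06) / 1.81

print(f"unshared: {unshared:.1f} s, shared: {shared:.1f} s")
print(f"speedup: {speedup:.1f}x, invariant fraction: {invariant_fraction:.0%}")
```

With these rounded inputs the block comes out at about 61.5 s unshared against about 3.8 s shared, a factor of roughly 16, in line with the quoted ≈60 s → ≈3.5 s and ~17× to within the rounding of the per-step timings. The invariant fraction evaluates to about 97%.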
Critical files
PyAutoFit (modify):
- autofit/graphical/declarative/collection.py: FactorGraphModel.log_likelihood_function, the per-factor sum loop
- autofit/graphical/declarative/factor/analysis.py: AnalysisFactor.log_likelihood_function, and the EPAnalysisFactor precedent
- autofit/non_linear/analysis/analysis.py: Analysis base, where the new opt-in protocol method goes (modify_before_fit sibling)
- test_autofit/graphical/: new tests
PyAutoFit (reference, do not modify):
- EPAnalysisFactor (declarative/factor/analysis.py:257+): the attach-state-to-analysis precedent
- autofit/non_linear/analysis/model_analysis.py, visualize.py: other Analysis wrappers that must keep working
PyAutoLens (consumer, Phase 2):
- AnalysisInterferometer likelihood path + interferometer Inversion / mapper construction
- autolens_workspace/scripts/interferometer/features/datacube/{likelihood_function,modeling,delaunay}.py
Profiling (Phase 3):
- autolens_profiling/likelihood_breakdown/datacube/delaunay.py
- autolens_profiling/likelihood_runtime/OPTIMIZATION_NOTES.md
Out of scope
- A DataCube data class / cube-specific Inversion (the rejected option).
- [...] value only for now.
Cross-references
- autolens_workspace#120: Aris's deferred shared-Lᵀ W̃ L optimisation, the origin
- PyAutoPrompt/autoarray/datacube.md: the "analysis list API does not share information" problem statement
- PyAutoPrompt/issued/alma_datacube.md: Aris's Slack design + the channel-invariance caveat (lines 24, 30, 34, 53, 207)
- complete.md datacube roadmap entries (Phases 1-4, all shipped)
- The autolens_profiling decomposition splitting the 78% "inversion setup" block into mapper vs mapping-matrix vs data-vector, which quantifies the real ceiling of this optimisation