TruncatedGaussianPrior.value_for is the #1 cProfile hotspot identified
by the graphical-ep-scale-up scoping pass (PR #17 on
autofit_workspace_developer): it accounts for 33% of total wall time
in a graphical-model joint Dynesty fit at N=10, and ~16% of an EP
toy run at N=10. The current code dispatches on xp (numpy vs JAX) but
on both paths goes through scipy.stats.norm.cdf / ppf, whose _distn_infrastructure wrapper overhead dwarfs the underlying erf
computation. This task replaces both paths with direct erf / erfinv
calls — mathematically identical to within ULPs.
Expected wall-time reduction: ~30% for graphical, ~17% for EP at
N=10. Numerics must match the pre-fix baseline to 1e-6 relative
tolerance — that's the merge gate.
Plan
Add a small helper (in autofit/mapper/prior/) that exposes a tail-safe truncated_normal_value_for(unit, mean, sigma, lower, upper, xp) built on xp.special.erf / erfinv / erfc / erfcinv.
Replace the body of TruncatedGaussianPrior.value_for (and the equivalent code in TruncatedNormalMessage if it routes through scipy.stats.norm too).
Add a numerical-equivalence test that asserts the new path matches scipy.stats.truncnorm.ppf to 1e-12 relative error (1e-9 in the deep tails) over a parameter grid that includes deep tails (a=10/b=20) and narrow brackets.
Re-run the autofit_workspace_developer/graphical/fit.py and ep/fit.py baselines at N=3/10/30. Posterior means, max log L, and sanity checks must match the pre-fix profiles/baseline.json to 1e-6 relative tolerance. Wall-time reduction is the speed gate (≥20% graphical, ≥10% EP at N=10).
All existing test_autofit/mapper/prior/ and test_autofit/non_linear/ tests pass unchanged.
Detailed implementation plan
Affected Repositories
PyAutoFit (primary)
autofit_workspace_developer (post-merge baseline refresh — separate PR)
Work Classification
Library (PyAutoFit). Workspace baseline refresh happens after the library lands.
Branch Survey
| Repository | Current Branch | Dirty? |
| --- | --- | --- |
| ./PyAutoFit | main | clean |
Suggested branch: feature/truncated-gaussian-fast-path
Worktree root: ~/Code/PyAutoLabs-wt/truncated-gaussian-fast-path/ (created later by /start_library)
Implementation Steps
Read autofit/mapper/prior/truncated_gaussian.py:136(value_for) and autofit/messages/truncated_normal.py. Decide whether the helper lives in a new private module (autofit/mapper/prior/_erf_helpers.py) or as a static method on TruncatedGaussianPrior. Lean private module so the EP scoping follow-up (paths.ep_mode flag) can also reuse it.
truncated_normal_value_for(unit, mean, sigma, lower, upper, xp):
a = (lower - mean) / sigma; b = (upper - mean) / sigma
Phi_a = 0.5 * (1 + erf(a / sqrt(2))), switching to 0.5 * erfc(-a / sqrt(2)) when a < -1 (or equivalently the tail-safe form)
Same for Phi_b
p = Phi_a + unit * (Phi_b - Phi_a)
x = sqrt(2) * erfinv(2*p - 1), switching to sqrt(2) * erfcinv(2*(1 - p)) when p > 0.9 (and the symmetric form when p < 0.1)
return mean + sigma * x
Replace value_for body in truncated_gaussian.py with a call to the helper. Keep the docstring; update it to reflect the new implementation reference.
Audit autofit/messages/truncated_normal.py for any scipy.stats.norm.cdf / ppf calls and route them through the helper too.
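Add test_autofit/mapper/prior/test_truncated_gaussian_erf.py (or extend the existing test) with a numerical-equivalence test against scipy.stats.truncnorm.ppf. Parameter grid:
unit ∈ {1e-9, 1e-6, 1e-3, 0.1, 0.3, 0.5, 0.7, 0.9, 1-1e-3, 1-1e-6, 1-1e-9}
(mean, sigma, lower, upper) ∈ {(0,1,-3,3), (0,1,-10,10), (0,1,-20,-10), (0,1,10,20), (5,2,0,inf), (0,1,-0.001,0.001)}
tolerance: 1e-12 relative for moderate cases; 1e-9 for a=10/b=20 deep tails.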
Run pytest test_autofit/mapper/prior/ and pytest test_autofit/messages/ — must pass unchanged.
From the autofit_workspace_developer checkout (attached to the same task worktree), regenerate the graphical/ep baselines at N=3/10/30 and diff against the committed profiles/baseline.json. Posterior values + max log L within 1e-6 rel; wall time meets the speed gate.
Ship the PyAutoFit PR. The workspace baseline refresh is a follow-up PR in autofit_workspace_developer.
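The equivalence check in the steps above could be driven by a small harness along these lines. This is a sketch, not the test that will land in PyAutoFit: the names `UNITS`, `CASES`, `worst_relative_error` and `failing_cases` are hypothetical, `value_for` stands for any callable with the proposed helper's signature minus `xp`, and the floor on the denominator (to avoid dividing by values near zero at the centre of a symmetric bracket) is an assumption that is not part of the plan.

```python
import numpy as np
from scipy import stats

# Unit-interval inputs from the plan's parameter grid.
UNITS = np.array([1e-9, 1e-6, 1e-3, 0.1, 0.3, 0.5, 0.7, 0.9,
                  1 - 1e-3, 1 - 1e-6, 1 - 1e-9])

# (mean, sigma, lower, upper, relative tolerance) from the plan:
# 1e-12 for moderate cases, 1e-9 for the deep-tail brackets.
CASES = [
    (0.0, 1.0, -3.0, 3.0, 1e-12),
    (0.0, 1.0, -10.0, 10.0, 1e-12),
    (0.0, 1.0, -20.0, -10.0, 1e-9),
    (0.0, 1.0, 10.0, 20.0, 1e-9),
    (5.0, 2.0, 0.0, np.inf, 1e-12),
    (0.0, 1.0, -0.001, 0.001, 1e-12),
]


def worst_relative_error(value_for, mean, sigma, lower, upper):
    """Largest relative deviation of value_for from scipy.stats.truncnorm.ppf."""
    a = (lower - mean) / sigma
    b = (upper - mean) / sigma
    ref = stats.truncnorm.ppf(UNITS, a, b, loc=mean, scale=sigma)
    got = np.asarray(value_for(UNITS, mean, sigma, lower, upper), dtype=float)
    # Assumption: floor the denominator so results near zero do not blow up.
    floor = min(sigma, upper - lower)
    return float(np.max(np.abs(got - ref) / np.maximum(np.abs(ref), floor)))


def failing_cases(value_for):
    """Return the grid rows whose worst error exceeds the row's tolerance."""
    failures = []
    for mean, sigma, lower, upper, rtol in CASES:
        err = worst_relative_error(value_for, mean, sigma, lower, upper)
        if not (err <= rtol):  # written this way so NaN also counts as a failure
            failures.append((mean, sigma, lower, upper, rtol))
    return failures
```

In a pytest module, `assert failing_cases(new_value_for) == []` would then express the gate, with `new_value_for` being whatever wrapper exposes the new code path.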
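Key Files
PyAutoFit/autofit/mapper/prior/truncated_gaussian.py:136 - value_for replaced.
PyAutoFit/autofit/mapper/prior/_erf_helpers.py (new) - shared helper.
PyAutoFit/autofit/messages/truncated_normal.py - verify + update if applicable.
PyAutoFit/test_autofit/mapper/prior/test_truncated_gaussian.py - extended with the equivalence test.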
Original Prompt
TruncatedGaussianPrior.value_for — direct-erf fast path
Replace the scipy.stats.norm.cdf / jax.scipy.stats.norm.cdf calls inside @PyAutoFit/autofit/mapper/prior/truncated_gaussian.py:136(value_for) with
direct erf / erfinv calls from scipy.special (numpy) and jax.scipy.special (JAX). The current implementation routes through scipy.stats._distn_infrastructure.cdf, which has substantial Python-side
wrapper overhead per call (arg validation, broadcasting setup, dispatch
chain) that dwarfs the actual erf computation.
This is the #1 cProfile hotspot identified by the graphical-ep-scale-up scoping pass — see PyAutoPrompt/graphical_ep/graphical_scoping.md and PyAutoPrompt/graphical_ep/ep_scoping.md.
Motivation
cProfile attribution at N=10 from autofit_workspace_developer/{graphical,ep}/profiles/N10_hotspots.txt:
| Package | Total | value_for cumtime | scipy.stats..cdf cumtime | % of total |
| --- | --- | --- | --- | --- |
| graphical | 60 s | 22.7 s | 19.5 s (184 200 calls) | 33% |
| ep | 724 s | not isolated | 116.4 s (1 599 610 calls) | 16% |
Almost all of value_for's cumtime is the scipy.stats wrapper — the actual erf math is fast. GaussianPrior.value_for (the non-truncated variant)
already uses the direct-erfinv approach on its JAX path
(gaussian.py:117); this prompt extends that pattern to the truncated
variant on both numpy and JAX.
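The per-call overhead described here can be checked locally with a micro-benchmark such as the one below. It is only a sketch: the function names are hypothetical, the scalar input and loop count are arbitrary, and absolute timings depend on the machine and the SciPy version, so no expected numbers are given.

```python
import timeit

import numpy as np
from scipy import special, stats

SQRT2 = np.sqrt(2.0)


def cdf_via_stats(z):
    # Goes through the scipy.stats distribution wrapper.
    return stats.norm.cdf(z)


def cdf_via_erfc(z):
    # Direct special-function call, same quantity.
    return 0.5 * special.erfc(-z / SQRT2)


z = 0.37
n_calls = 2000
t_stats = timeit.timeit(lambda: cdf_via_stats(z), number=n_calls)
t_erfc = timeit.timeit(lambda: cdf_via_erfc(z), number=n_calls)

print(f"stats.norm.cdf : {1e6 * t_stats / n_calls:8.2f} us per call")
print(f"special.erfc   : {1e6 * t_erfc / n_calls:8.2f} us per call")
```

Scalar calls are the relevant regime because, per the profile above, value_for is invoked a very large number of times with small inputs.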
Numerics-preserving identity
norm.cdf(z) ≡ 0.5 * (1 + erf(z / sqrt(2)))
            ≡ 0.5 * erfc(-z / sqrt(2))        (preferred when z << 0)
norm.ppf(p) ≡ sqrt(2) * erfinv(2*p - 1)
            ≡ -sqrt(2) * erfcinv(2*p)         (preferred when p close to 0)
            ≡ sqrt(2) * erfcinv(2*(1-p))      (preferred when p close to 1)
Both forms are mathematically equivalent to scipy.stats.norm.cdf/ppf
and produce the same float64 values to within ULPs. The erfc/erfcinv
forms are used when the argument is in the tail to avoid catastrophic
cancellation.
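As a quick numerical check of these identities, the sketch below compares the erfc/erfcinv forms against scipy.stats.norm on a handful of points, including tail values. The helper names `norm_cdf` and `norm_ppf` and the switch at p = 0.5 are illustrative choices, not taken from the PyAutoFit code.

```python
import numpy as np
from scipy import special, stats

SQRT2 = np.sqrt(2.0)


def norm_cdf(z):
    """Standard normal CDF via erfc, which stays accurate for z << 0."""
    return 0.5 * special.erfc(-np.asarray(z, dtype=float) / SQRT2)


def norm_ppf(p):
    """Standard normal quantile via erfcinv, picking the form suited to each tail."""
    p = np.asarray(p, dtype=float)
    from_lower_tail = -SQRT2 * special.erfcinv(2.0 * p)        # good for p near 0
    from_upper_tail = SQRT2 * special.erfcinv(2.0 * (1.0 - p))  # good for p near 1
    return np.where(p < 0.5, from_lower_tail, from_upper_tail)


z = np.array([-30.0, -8.0, -1.0, 0.0, 1.0, 8.0])
p = np.array([1e-12, 1e-3, 0.3, 0.5, 0.9, 1 - 1e-9])

print(np.max(np.abs(norm_cdf(z) / stats.norm.cdf(z) - 1.0)))  # relative CDF mismatch
print(np.max(np.abs(norm_ppf(p) - stats.norm.ppf(p))))        # absolute quantile mismatch
```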
For TruncatedGaussianPrior:
a = (lower_limit - mean) / sigma
b = (upper_limit - mean) / sigma
Phi_a = 0.5 * (1 + erf(a / sqrt(2))) # or erfc form if a << 0
Phi_b = 0.5 * (1 + erf(b / sqrt(2))) # or erfc form if b >> 0
p = Phi_a + unit * (Phi_b - Phi_a)
x = sqrt(2) * erfinv(2*p - 1) # or erfcinv forms if p near 0/1
return mean + sigma * x
Plan
Add a new helper module (or inline functions in the same file)
that exposes truncated_normal_value_for(unit, mean, sigma, lower, upper, xp)
built on xp.special.erf/erfinv (importing scipy.special for numpy
and jax.scipy.special for jax). Centralise the tail-safe erfc/erfcinv branching so both the prior and the message classes can call
into it.
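Replace the scipy.stats.norm calls in autofit/mapper/prior/truncated_gaussian.py:136(value_for) with the helper, along with the corresponding code path in autofit/messages/truncated_normal.py (if TruncatedNormalMessage
also uses scipy.stats.norm.cdf; verify, and reuse the helper if so).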
Numerical equivalence test in test_autofit/mapper/prior/test_truncated_gaussian.py: compare the
new value_for against scipy.stats.truncnorm.ppf (the
library-agnostic ground truth, not the old code path) on a grid of (unit, mean, sigma, lower, upper) including extreme truncations
(a=10/b=20, a=-20/b=-10, narrow [0.499, 0.501] bracket). Tolerance: 1e-12 relative error for moderate cases, 1e-9 in the deep tails.
Benchmark gate. Run autofit_workspace_developer/graphical/fit.py --total_datasets={3,10,30}
and ep/fit.py --total_datasets={3,10,30} from a clean checkout
and compare profiles/baseline.json against the pre-fix baseline
committed by issue model.results mislabelled parameters #16. Expected wall-time reduction:
Graphical: ~30% (prior transforms drop from 38% of total to <5%)
EP: ~17% (same fix, smaller share of total runtime)
Sanity-check max log L and recovered posteriors must match the
pre-fix baseline to within 1e-6 relative tolerance. This is the
correctness gate: the speed gain is only meaningful if numerics
don't drift.
Run the full PyAutoFit prior test suite: pytest test_autofit/mapper/prior/
and any tests under test_autofit/non_linear/ that exercise the
prior transform. They must all pass unchanged.
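Affected files
@PyAutoFit/autofit/mapper/prior/truncated_gaussian.py - value_for body.
@PyAutoFit/autofit/messages/truncated_normal.py - verify and update if it also uses scipy.stats.norm.cdf/ppf.
@PyAutoFit/autofit/mapper/prior/abstract.py or a new file in autofit/mapper/prior/_erf_helpers.py - the shared helper.
@PyAutoFit/test_autofit/mapper/prior/test_truncated_gaussian.py - numerical equivalence + extreme-truncation tests.
(Verification only, no changes expected) @PyAutoFit/autofit/mapper/prior/gaussian.py
- already uses the direct-erfinv pattern on JAX; cross-check the new
helper is consistent.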
Out of scope
Replacing scipy.special.erf with a hand-rolled rational approximation:
scipy already calls into the system libm erf, which is what we want.
Other priors (LogGaussianPrior, LogUniformPrior): they don't show
up in the cProfile data and are not on the scaling critical path. If
the helper extracts cleanly they can re-use it later — out of scope
for this issue.
JAX vs numpy parity audit beyond TruncatedGaussianPrior: separate
scope.
Success criteria
All existing PyAutoFit prior tests pass.
New numerical equivalence test passes at 1e-12 rel tolerance (1e-9 in the deep tails).
autofit_workspace_developer/graphical/fit.py --total_datasets={3,10,30}
posterior means and max log L match the pre-fix baseline.json to 1e-6 rel tolerance.
Graphical wall time at N=10 drops by ≥20% (target ~30%); EP wall time
at N=10 drops by ≥10% (target ~17%).
Updated baseline.json files are committed in a follow-up
autofit_workspace_developer PR after the library change merges.
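The correctness and speed gates in these criteria could be automated with something like the following sketch. The layout of baseline.json is not described in this issue, so the nested-dict structure, the key names in the example, and the function names `drifted_entries` and `wall_time_reduction` are all hypothetical; the numbers in the example are made up purely to exercise the functions and are not results.

```python
import math


def drifted_entries(old, new, rtol=1e-6, path=""):
    """Return (path, old, new) for each numeric leaf whose relative change exceeds rtol."""
    if isinstance(old, dict):
        out = []
        for key in old:
            out += drifted_entries(old[key], new[key], rtol, f"{path}/{key}")
        return out
    if isinstance(old, (list, tuple)):
        out = []
        for i, (o, n) in enumerate(zip(old, new)):
            out += drifted_entries(o, n, rtol, f"{path}[{i}]")
        return out
    if isinstance(old, (int, float)) and not isinstance(old, bool):
        ok = math.isclose(old, new, rel_tol=rtol, abs_tol=0.0)
        return [] if ok else [(path, old, new)]
    return []  # non-numeric leaves (strings, None) are ignored


def wall_time_reduction(old_seconds, new_seconds):
    """Fractional wall-time saving, e.g. 0.3 for a 30% reduction."""
    return 1.0 - new_seconds / old_seconds


# Made-up example values; in practice both dicts would come from json.load.
old = {"max_log_likelihood": -100.0, "posterior_means": [1.0, 2.0]}
new = {"max_log_likelihood": -100.0, "posterior_means": [1.0, 2.0 * (1 + 1e-9)]}

print(drifted_entries(old, new))        # empty list: drift is within 1e-6
print(wall_time_reduction(60.0, 42.0))  # compare against the 0.20 gate
```

Wall-time fields would need to be kept out of the dictionaries passed to `drifted_entries`, since those are expected to change.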