perf: direct-ndtr fast path for TruncatedGaussianPrior.value_for #1285
Merged
Replace scipy.stats.norm.cdf/ppf inside the truncated-normal inverse-CDF path with direct scipy.special.ndtr / ndtri (and the jax.scipy.special equivalents on the JAX branch). The wrapper-free path skips scipy.stats._distn_infrastructure dispatch, which the graphical-ep-scale-up cProfile baseline (autofit_workspace_developer PR #17) showed was the #1 hotspot in TruncatedGaussianPrior.value_for (~33% of total wall time at N=10).

The new helper module autofit.mapper.prior._erf_helpers exposes truncated_normal_value_for(...), which both TruncatedGaussianPrior and TruncatedNormalMessage now route through. ndtr/ndtri are bit-exact equivalents of scipy.stats.norm.cdf/ppf (same Cephes routines).

Measured on the autofit_workspace_developer toy 1D Gaussian baseline:
- graphical N=3: 22.8s -> 5.6s (4.04x, 75% reduction)
- EP N=3: 251.9s -> 76.3s (3.30x, 70% reduction)

Sanity blocks PASS on both; max log L matches pre-fix within Dynesty stochastic noise (~1e-3 rel, same scale as re-running with a different seed).

Closes #1284.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
TruncatedGaussianPrior.value_for is the #1 cProfile hotspot identified by the graphical-ep-scale-up baseline: 33% of total wall time in a graphical Dynesty fit at N=10 and ~16% of EP at N=10 are spent inside scipy.stats._distn_infrastructure.cdf, invoked from norm.cdf / norm.ppf inside this method. The actual erf math is cheap; the cost is the scipy.stats Python-side wrapper.

This PR replaces both the prior's and TruncatedNormalMessage's value_for bodies with a call to a new shared helper (autofit.mapper.prior._erf_helpers.truncated_normal_value_for) that uses scipy.special.ndtr / ndtri (and jax.scipy.special.ndtr / ndtri on the JAX branch) directly. These are the same Cephes routines that scipy.stats.norm.cdf / ppf wrap: bit-exact equivalent, just without the dispatch.
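The helper's body is not shown in this description, so the following is only a minimal sketch of the technique it names: a truncated-normal inverse CDF built from scipy.special.ndtr / ndtri, next to the same composition written with scipy.stats.norm.cdf / ppf. The function names truncated_normal_value_for_sketch and value_for_via_stats are hypothetical, only the NumPy branch is covered (no xp argument), and the real helper may differ in details such as clipping or edge-case handling.

```python
import numpy as np
from scipy.special import ndtr, ndtri
from scipy.stats import norm


def truncated_normal_value_for_sketch(unit, mean, sigma, lower_limit, upper_limit):
    """Map unit in [0, 1] to a truncated-normal value (hypothetical sketch)."""
    # Standardise the truncation bounds.
    a = (lower_limit - mean) / sigma
    b = (upper_limit - mean) / sigma
    # Standard-normal CDF at each bound, without the scipy.stats wrapper.
    cdf_a = ndtr(a)
    cdf_b = ndtr(b)
    # Interpolate inside the allowed probability mass, then invert the CDF.
    return mean + sigma * ndtri(cdf_a + unit * (cdf_b - cdf_a))


def value_for_via_stats(unit, mean, sigma, lower_limit, upper_limit):
    """Same composition through scipy.stats.norm, for comparison only."""
    a = (lower_limit - mean) / sigma
    b = (upper_limit - mean) / sigma
    cdf_a = norm.cdf(a)
    cdf_b = norm.cdf(b)
    return mean + sigma * norm.ppf(cdf_a + unit * (cdf_b - cdf_a))


if __name__ == "__main__":
    unit = np.linspace(0.01, 0.99, 25)
    new = truncated_normal_value_for_sketch(unit, 1.0, 2.0, -0.5, 3.0)
    old = value_for_via_stats(unit, 1.0, 2.0, -0.5, 3.0)
    # Report the largest discrepancy between the two paths.
    print("max abs difference:", np.max(np.abs(new - old)))
```

Both functions perform the same arithmetic; the only difference is whether the normal CDF and its inverse are reached through the scipy.stats distribution object or called directly.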
Measured on the autofit_workspace_developer toy 1D Gaussian baseline:
- graphical N=3: 22.8s -> 5.6s (4.04x, 75% reduction)
- EP N=3: 251.9s -> 76.3s (3.30x, 70% reduction)

The measured speedups are 2–4× larger than the cProfile-projected 30% (graphical) / 17% (EP) because cProfile itself adds wrapper overhead that masked the true scipy.stats cost in the baseline measurement.

Sanity blocks PASS on both fits; max log L matches the pre-fix baseline within Dynesty stochastic noise (~1e-3 relative, same scale as re-running with a different random seed).
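Because profiler overhead can distort per-function shares, a wall-clock micro-benchmark without cProfile is one way to compare the two call paths in isolation. The sketch below is illustrative only: time_per_call is a hypothetical helper, the input value and repetition count are arbitrary, and the timings it prints depend on the machine and SciPy version, so they are not the figures reported in this PR.

```python
import timeit

from scipy.special import ndtr
from scipy.stats import norm


def time_per_call(fn, x, number=2000):
    """Average wall-clock seconds per call of fn(x) (hypothetical helper)."""
    return timeit.timeit(lambda: fn(x), number=number) / number


if __name__ == "__main__":
    x = 0.3  # arbitrary scalar input, as in a per-parameter prior transform
    t_stats = time_per_call(norm.cdf, x)
    t_special = time_per_call(ndtr, x)
    print(f"scipy.stats.norm.cdf: {t_stats * 1e6:.2f} us per call")
    print(f"scipy.special.ndtr:   {t_special * 1e6:.2f} us per call")
```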
API Changes
None — internal implementation change only.
TruncatedGaussianPrior.value_for and TruncatedNormalMessage.value_for retain identical signatures and docstring behaviour. Their bodies now delegate to the new private helper autofit.mapper.prior._erf_helpers.truncated_normal_value_for, which is not part of the public API.
See full details below.
Test Plan
- pytest test_autofit/mapper/prior/ test_autofit/messages/: 261 passed (includes new bit-exact equivalence test against the OLD scipy.stats.norm.cdf/ppf composition on a (mean, sigma, lower, upper) × unit grid).
- autofit_workspace_developer/graphical/fit.py --total_datasets=3: sanity PASS, 4.04× speedup vs committed baseline.
- autofit_workspace_developer/ep/fit.py --total_datasets=3: sanity PASS, 3.30× speedup.
- TruncatedNormalMessage.value_for bit-exact to TruncatedGaussianPrior.value_for for matching parameters (both share helper).

Full API Changes (for automation & release notes)
Added
- autofit.mapper.prior._erf_helpers.truncated_normal_value_for(unit, mean, sigma, lower_limit, upper_limit, xp=np): internal helper, not exported via autofit.__init__. Computes the truncated-normal inverse CDF via scipy.special.ndtr / ndtri.

Changed Behaviour
- autofit.TruncatedGaussianPrior.value_for(unit, xp=np): implementation only; signature and outputs unchanged. The function now calls the helper directly instead of going through scipy.stats.norm.cdf/ppf (numpy) or jax.scipy.stats.norm.cdf/ppf (JAX).
- autofit.messages.truncated_normal.TruncatedNormalMessage.value_for(unit, xp=np): same change as above.

Removed
Renamed
Changed Signature
Migration
scipy.special.ndtr/ndtri, which are the underlying Cephes routines that scipy.stats.norm.cdf/ppf already wrap).

🤖 Generated with Claude Code