
perf: direct-ndtr fast path for TruncatedGaussianPrior.value_for #1285

Merged: Jammy2211 merged 1 commit into main from feature/truncated-gaussian-fast-path on May 20, 2026


@Jammy2211
Collaborator

Summary

TruncatedGaussianPrior.value_for is the #1 cProfile hotspot identified
by the graphical-ep-scale-up baseline (autofit_workspace_developer PR #17).
At N=10, scipy.stats._distn_infrastructure.cdf, invoked from norm.cdf /
norm.ppf inside this method, accounts for 33% of total wall time in a
graphical Dynesty fit and ~16% of wall time in an EP fit. The underlying
erf math is cheap; the cost is the Python-side scipy.stats wrapper.
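
The wrapper overhead is easy to observe in isolation. The micro-benchmark below is a sketch, not the benchmark behind this PR: it times scalar calls to scipy.stats.norm.cdf against scipy.special.ndtr with timeit, and the repetition count and input value are arbitrary choices.

```python
import timeit

from scipy.special import ndtr
from scipy.stats import norm

N_CALLS = 2000  # arbitrary repetition count, large enough to swamp timer noise
x = 0.3

# Same standard-normal CDF value either way ...
assert norm.cdf(x) == ndtr(x)

# ... but norm.cdf first runs scipy.stats argument checking, broadcasting and
# masking in Python before it reaches the same special function.
t_stats = timeit.timeit(lambda: norm.cdf(x), number=N_CALLS)
t_ndtr = timeit.timeit(lambda: ndtr(x), number=N_CALLS)

print(f"norm.cdf: {1e6 * t_stats / N_CALLS:.2f} us/call")
print(f"ndtr:     {1e6 * t_ndtr / N_CALLS:.2f} us/call")
print(f"ratio:    {t_stats / t_ndtr:.1f}x")
```

Absolute timings depend on the machine and SciPy version; the point is the ratio for scalar calls, which is the kind of call a per-sample prior transform tends to make.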

This PR replaces both the prior's and TruncatedNormalMessage's
value_for bodies with a call to a new shared helper
(autofit.mapper.prior._erf_helpers.truncated_normal_value_for) that
uses scipy.special.ndtr / ndtri (and jax.scipy.special.ndtr /
ndtri on the JAX branch) directly. The SciPy functions are the same Cephes
routines that scipy.stats.norm.cdf / ppf wrap, so the NumPy path is
bit-exact; only the dispatch layer is skipped.
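
A helper of this kind can be sketched as below. This is a hedged reconstruction, not the merged code: only the name and signature come from this PR (listed under Added), while the body, the `xp is np` backend switch and the lazy JAX import are assumptions. It standardises the truncation limits, maps the unit value linearly between the standard-normal CDF values at those limits using ndtr, and inverts with ndtri.

```python
import numpy as np


def truncated_normal_value_for(unit, mean, sigma, lower_limit, upper_limit, xp=np):
    """Map a unit-interval value to a truncated-normal value (sketch only)."""
    # Assumed dispatch: NumPy callers get SciPy's Cephes-backed special
    # functions; any other xp (e.g. jax.numpy) gets the JAX equivalents.
    if xp is np:
        from scipy.special import ndtr, ndtri
    else:
        from jax.scipy.special import ndtr, ndtri

    unit = xp.asarray(unit)
    # Standard-normal CDF at the standardised truncation limits.
    cdf_lower = ndtr((lower_limit - mean) / sigma)
    cdf_upper = ndtr((upper_limit - mean) / sigma)
    # Map the unit value linearly onto [cdf_lower, cdf_upper] ...
    q = cdf_lower + unit * (cdf_upper - cdf_lower)
    # ... then invert the standard-normal CDF and undo the standardisation.
    return mean + sigma * ndtri(q)


values = truncated_normal_value_for(
    np.array([0.0, 0.25, 0.5, 0.75, 1.0]),
    mean=0.0, sigma=1.0, lower_limit=-1.0, upper_limit=2.0,
)
```

Unit values of 0 and 1 land (up to rounding) on the truncation limits. Keeping the arithmetic in the same order as norm.cdf / norm.ppf (standardise, ndtr, rescale, ndtri, scale and shift) is what lets a fast path like this reproduce the wrapped calls exactly rather than only approximately, assuming the old code built the rescaled CDF value the same way.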

Measured on the autofit_workspace_developer toy 1D Gaussian baseline:

Workload        Pre-fix    Post-fix   Speedup
graphical N=3   22.8 s     5.6 s      4.04× (75% reduction)
EP N=3          251.9 s    76.3 s     3.30× (70% reduction)

The measured reductions (75% graphical, 70% EP) are roughly 2.5× and 4×
the cProfile-projected 30% (graphical) / 17% (EP), because cProfile itself
adds wrapper overhead that masked the true scipy.stats cost in the
baseline measurement.

Sanity blocks PASS on both fits; max log L matches the pre-fix baseline
within Dynesty stochastic noise (~1e-3 relative — same scale as
re-running with a different random seed).

API Changes

None — internal implementation change only. TruncatedGaussianPrior.value_for
and TruncatedNormalMessage.value_for retain identical signatures and
docstring behaviour. Their bodies now delegate to the new private helper
autofit.mapper.prior._erf_helpers.truncated_normal_value_for, which is
not part of the public API.
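
The delegation pattern can be illustrated with two toy classes that route value_for through one shared function. ToyPrior, ToyMessage and _shared_value_for are hypothetical stand-ins for TruncatedGaussianPrior, TruncatedNormalMessage and the private helper, and the helper body is the assumed ndtr / ndtri construction rather than the merged code. Because both classes call the same function with the same arguments, their outputs agree bit for bit.

```python
from dataclasses import dataclass

import numpy as np
from scipy.special import ndtr, ndtri


def _shared_value_for(unit, mean, sigma, lower_limit, upper_limit, xp=np):
    # Hypothetical shared helper; xp is accepted for signature parity, but
    # this sketch only implements the NumPy/SciPy branch.
    cdf_lower = ndtr((lower_limit - mean) / sigma)
    cdf_upper = ndtr((upper_limit - mean) / sigma)
    return mean + sigma * ndtri(cdf_lower + xp.asarray(unit) * (cdf_upper - cdf_lower))


@dataclass
class ToyPrior:
    mean: float
    sigma: float
    lower_limit: float
    upper_limit: float

    def value_for(self, unit, xp=np):
        # Public signature unchanged; the body only forwards to the helper.
        return _shared_value_for(unit, self.mean, self.sigma,
                                 self.lower_limit, self.upper_limit, xp=xp)


@dataclass
class ToyMessage:
    mean: float
    sigma: float
    lower_limit: float
    upper_limit: float

    def value_for(self, unit, xp=np):
        # Same delegation, so identical parameters give identical output.
        return _shared_value_for(unit, self.mean, self.sigma,
                                 self.lower_limit, self.upper_limit, xp=xp)


prior = ToyPrior(1.0, 2.0, 0.0, 5.0)
message = ToyMessage(1.0, 2.0, 0.0, 5.0)
units = np.linspace(0.0, 1.0, 11)
same = np.array_equal(prior.value_for(units), message.value_for(units))
```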

See full details below.

Test Plan

  • pytest test_autofit/mapper/prior/ test_autofit/messages/ — 261 passed (includes new bit-exact equivalence test against the OLD scipy.stats.norm.cdf/ppf composition on a (mean, sigma, lower, upper) × unit grid).
  • autofit_workspace_developer/graphical/fit.py --total_datasets=3 — sanity PASS, 4.04× speedup vs committed baseline.
  • autofit_workspace_developer/ep/fit.py --total_datasets=3 — sanity PASS, 3.30× speedup.
  • JAX parity test — JAX path matches NumPy path within 1e-9 relative.
  • TruncatedNormalMessage.value_for bit-exact to TruncatedGaussianPrior.value_for for matching parameters (both share helper).
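
A grid-based equivalence check in the spirit of the first Test Plan item might look like the following. It is an illustrative sketch rather than the test added in this PR: the function names and grid values are assumptions, and the "old" composition is written out by hand from scipy.stats.norm.cdf / ppf. The comparison uses exact equality (np.array_equal) rather than a tolerance, since both sides should reach the same ndtr / ndtri calls on identical inputs.

```python
import itertools

import numpy as np
from scipy.special import ndtr, ndtri
from scipy.stats import norm


def value_for_old(unit, mean, sigma, lower, upper):
    # Assumed pre-PR style: go through the scipy.stats.norm wrappers.
    cdf_lower = norm.cdf(lower, loc=mean, scale=sigma)
    cdf_upper = norm.cdf(upper, loc=mean, scale=sigma)
    return norm.ppf(cdf_lower + unit * (cdf_upper - cdf_lower), loc=mean, scale=sigma)


def value_for_fast(unit, mean, sigma, lower, upper):
    # Assumed post-PR style: call the special functions directly.
    cdf_lower = ndtr((lower - mean) / sigma)
    cdf_upper = ndtr((upper - mean) / sigma)
    return mean + sigma * ndtri(cdf_lower + unit * (cdf_upper - cdf_lower))


def count_mismatches():
    units = np.linspace(0.0, 1.0, 21)
    # Hypothetical (mean, sigma, lower, upper) grid, including infinite limits.
    grid = itertools.product(
        [-3.0, 0.0, 2.5],         # mean
        [0.1, 1.0, 4.0],          # sigma
        [-np.inf, -1.0, 0.0],     # lower limit
        [1.0, 5.0, np.inf],       # upper limit
    )
    mismatches = 0
    for mean, sigma, lower, upper in grid:
        old = value_for_old(units, mean, sigma, lower, upper)
        new = value_for_fast(units, mean, sigma, lower, upper)
        if not np.array_equal(old, new):
            mismatches += 1
    return mismatches
```

Including infinite limits exercises the boundary branches of norm.cdf / norm.ppf, where scipy.stats returns 0, 1 or ±inf without calling the special functions; ndtr / ndtri return the same values there.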

Full API Changes (for automation & release notes)

Added

  • autofit.mapper.prior._erf_helpers.truncated_normal_value_for(unit, mean, sigma, lower_limit, upper_limit, xp=np) — internal helper, not exported via autofit.__init__. Computes the truncated-normal inverse CDF via scipy.special.ndtr / ndtri.
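
For reference, the standard truncated-normal inverse CDF that a helper like this evaluates is shown below, with Φ = ndtr and Φ⁻¹ = ndtri. It is the textbook formula, stated as an assumption about what the helper computes rather than a transcription of its code; u is `unit`, μ is `mean`, σ is `sigma`, and a, b are `lower_limit`, `upper_limit`.

```latex
x = \mu + \sigma\,\Phi^{-1}\!\Big(\Phi(\alpha) + u\,\big[\Phi(\beta) - \Phi(\alpha)\big]\Big),
\qquad \alpha = \frac{a - \mu}{\sigma}, \quad \beta = \frac{b - \mu}{\sigma}
```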

Changed Behaviour

  • autofit.TruncatedGaussianPrior.value_for(unit, xp=np) — implementation only; signature and outputs unchanged. The function now calls the helper directly instead of going through scipy.stats.norm.cdf / ppf (numpy) or jax.scipy.stats.norm.cdf / ppf (JAX).
  • autofit.messages.truncated_normal.TruncatedNormalMessage.value_for(unit, xp=np) — same change as above.

Removed

  • None.

Renamed

  • None.

Changed Signature

  • None.

Migration

  • None required. The change is transparent to callers: they see identical numerical results (the new path uses scipy.special.ndtr / ndtri, which are the underlying Cephes routines that scipy.stats.norm.cdf / ppf already wrap).

🤖 Generated with Claude Code

Replace scipy.stats.norm.cdf/ppf inside the truncated-normal inverse-CDF
path with direct scipy.special.ndtr / ndtri (and the jax.scipy.special
equivalents on the JAX branch). The wrapper-free path skips
scipy.stats._distn_infrastructure dispatch -- which the
graphical-ep-scale-up cProfile baseline (autofit_workspace_developer
PR #17) showed was the #1 hotspot in TruncatedGaussianPrior.value_for
(~33% of total wall time at N=10).

The new helper module autofit.mapper.prior._erf_helpers exposes
truncated_normal_value_for(...) which both TruncatedGaussianPrior and
TruncatedNormalMessage now route through. ndtr/ndtri are bit-exact
equivalents of scipy.stats.norm.cdf/ppf (same Cephes routines).

Measured on the autofit_workspace_developer toy 1D Gaussian baseline:
  - graphical N=3:  22.8s -> 5.6s  (4.04x, 75% reduction)
  - EP N=3:        251.9s -> 76.3s (3.30x, 70% reduction)

Sanity blocks PASS on both; max log L matches pre-fix within Dynesty
stochastic noise (~1e-3 rel - same scale as re-running with a different
seed).

Closes #1284.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Jammy2211 added the pending-release label (PR queued for the next release build) on May 20, 2026.
Jammy2211 merged commit 8929df5 into main on May 20, 2026 (7 checks passed).
Jammy2211 deleted the feature/truncated-gaussian-fast-path branch on May 20, 2026 at 18:49.