feat: wire af.NSS chunk_size kwarg into build_nss (PyAutoFit#1303) #43

Merged: Jammy2211 merged 1 commit into main from feature/nss-chunked-vmap on May 29, 2026

@Jammy2211
Contributor

Summary

Wires the new chunk_size kwarg shipped by PyAutoFit#1303 into autolens_profiling/searches/build_nss. This reverts the earlier num_delete=min(default, probe) workaround, which shrank the sampler's per-iteration batch to dodge OOMs. With PyAutoFit#1303's chunked-vmap path, the sampler keeps its preferred num_delete=50, while chunk_size=vmap_batch_for_cell(...) caps the inner-vmap fan-out at the A100-probed budget.

Unblocks the A100 NSS Delaunay + pixelization cells that were OOMing at ~28 GB on every retry (jobs 322592/96/600/602/604).
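
To illustrate the idea in the summary, here is a minimal pure-Python sketch of chunked batch evaluation. It is not the PyAutoFit implementation: `chunked_map` and `make_toy_batch_likelihood` are hypothetical names, and a plain list stands in for the vmapped likelihood batch. The point is only that the sampler can still hand over 50 particles per iteration while the largest batch evaluated at once is `chunk_size`.

```python
from typing import Callable, List, Optional, Sequence


def chunked_map(
    batch_fn: Callable[[Sequence[float]], List[float]],
    particles: Sequence[float],
    chunk_size: Optional[int] = None,
) -> List[float]:
    """Apply batch_fn to particles in slices of at most chunk_size items."""
    if chunk_size is None:
        # No chunking: evaluate the whole batch in a single call.
        chunk_size = max(len(particles), 1)
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer or None")
    results: List[float] = []
    for start in range(0, len(particles), chunk_size):
        results.extend(batch_fn(particles[start:start + chunk_size]))
    return results


def make_toy_batch_likelihood(stats: dict) -> Callable[[Sequence[float]], List[float]]:
    """Toy batched log-likelihood that records the largest batch it has seen."""
    def batch_fn(batch: Sequence[float]) -> List[float]:
        stats["peak_batch"] = max(stats.get("peak_batch", 0), len(batch))
        return [-0.5 * x * x for x in batch]
    return batch_fn


particles = [0.1 * i for i in range(50)]  # num_delete = 50 particles per iteration

full_stats: dict = {}
full = chunked_map(make_toy_batch_likelihood(full_stats), particles)

chunked_stats: dict = {}
chunked = chunked_map(make_toy_batch_likelihood(chunked_stats), particles, chunk_size=8)

print(full_stats["peak_batch"], chunked_stats["peak_batch"], full == chunked)
```

The values returned are the same in both cases; only the size of the largest single evaluation changes, which is the quantity that drives peak memory in a vmapped kernel.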

Scripts Changed

  • searches/_samplers.py: build_nss(...) now passes chunk_size=vmap_batch_for_cell(...) to af.NSS, keeps num_delete=50 (default). Comment block explains the memory-budget split.
  • searches/_runner.py: _sampler_config_dict nss branch records chunk_size in the metric JSON and reverts the num_delete cap so the JSON shows what was actually constructed.
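
A rough sketch of the shape of these two changes, under stated assumptions: the real signature of vmap_batch_for_cell is not shown in this PR, so `vmap_batch_for_cell_stub`, its per-cell values, the cell names other than nss/imaging/mge, and the `"sampler"` key are all hypothetical. The sketch builds the keyword arguments and the recorded config without calling af.NSS itself.

```python
import json
from typing import Any, Dict

DEFAULT_NUM_DELETE = 50  # sampler default kept by this PR


def vmap_batch_for_cell_stub(cell: str) -> int:
    """Hypothetical stand-in for vmap_batch_for_cell(...); values are made up."""
    probed = {
        "nss/imaging/mge": 16,
        "nss/imaging/pixelization": 4,
        "nss/imaging/delaunay": 2,
    }
    return probed.get(cell, 16)


def nss_kwargs(cell: str) -> Dict[str, Any]:
    """Keyword arguments build_nss might forward: num_delete is no longer capped."""
    return {
        "num_delete": DEFAULT_NUM_DELETE,
        "chunk_size": vmap_batch_for_cell_stub(cell),
    }


def sampler_config_dict(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Record exactly what was constructed, so the metric JSON matches the run."""
    return {
        "sampler": "nss",
        "num_delete": kwargs["num_delete"],
        "chunk_size": kwargs["chunk_size"],
    }


kwargs = nss_kwargs("nss/imaging/delaunay")
config = sampler_config_dict(kwargs)
print(json.dumps(config, sort_keys=True))
```

Deriving the recorded config from the same kwargs dict that configures the sampler is one way to keep the JSON from drifting away from what was actually built.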

Upstream PR

PyAutoLabs/PyAutoFit#1303: feat(nss): chunk_size kwarg for inversion-heavy A100 likelihoods

Test Plan

  • 18 leaf scripts (9 Nautilus + 9 NSS) pass AUTOLENS_PROFILING_SMOKE=1 under the worktree-pinned PyAutoFit.
  • sweep.py --dry-run --only nss/imaging/mge dispatches correctly with the new sampler_config shape.
  • 5D Gaussian smoke confirms bit-identical log_Z between chunk_size=None and chunk_size=2 on the same seed.
  • Post-merge: resubmit A100 NSS pixelization + delaunay × HST × fp64; confirm completion (was OOMing as 322602 / 322604). Compare against Nautilus baselines: pixelization 46.5 ms/eval / 46 min (322603); delaunay 84.8 ms/eval / 45 min (322601).
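
The equivalence check in the third item can be sketched in miniature. The snippet below is only an illustration of the likelihood-evaluation step, not the NSS sampler or the actual smoke test: it evaluates a 5D standard Gaussian log-likelihood on seeded points with no chunking and with chunks of 2, then compares the results exactly. The names `log_likelihood` and `evaluate` are hypothetical. In a real vmapped GPU kernel, bit-identity may depend on the backend, so this shows the intent of the check rather than proving it.

```python
import math
import random
from typing import List, Optional, Sequence


def log_likelihood(theta: Sequence[float]) -> float:
    """Log-density of a standard Gaussian in len(theta) dimensions."""
    norm = -0.5 * len(theta) * math.log(2.0 * math.pi)
    return norm - 0.5 * sum(x * x for x in theta)


def evaluate(points: Sequence[Sequence[float]], chunk_size: Optional[int] = None) -> List[float]:
    """Evaluate all points, optionally in slices of at most chunk_size."""
    step = chunk_size if chunk_size is not None else max(len(points), 1)
    values: List[float] = []
    for start in range(0, len(points), step):
        values.extend(log_likelihood(p) for p in points[start:start + step])
    return values


rng = random.Random(0)  # fixed seed so both runs see the same points
points = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(50)]

unchunked = evaluate(points, chunk_size=None)
chunked = evaluate(points, chunk_size=2)

# Exact float comparison, mirroring the "bit-identical" requirement.
identical = unchunked == chunked
print(identical)
```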

🤖 Generated with Claude Code

Now that PyAutoFit#1303 ships an additive chunk_size knob on af.NSS,
switch build_nss to use it: keep num_delete at the SLaM default (50)
for convergence, and pass chunk_size=vmap_batch_for_cell(...) to cap
the inner-vmap fan-out at the A100-probed budget per cell.

Reverts the earlier num_delete=min(default, probe) Band-Aid, which
shrank the sampler's per-iteration batch to save memory at the cost
of convergence. With the new chunked path the sampler keeps its
preferred num_delete=50 while peak GPU memory becomes
chunk_size × per_particle_state instead of num_delete × ...
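
As a worked example of the memory scaling described above, the sketch below picks the largest chunk that fits a budget. Every number in it is hypothetical (a 600 MiB per-particle state and a 20 GiB budget), chosen only to make the arithmetic concrete; they are not measurements from the A100 probe, and `largest_chunk` is an illustrative helper, not part of PyAutoFit.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def peak_bytes(batch: int, per_particle_bytes: int) -> int:
    """Peak memory under the simple model: batch size times per-particle state."""
    return batch * per_particle_bytes


def largest_chunk(budget_bytes: int, per_particle_bytes: int, num_delete: int) -> int:
    """Largest chunk that fits the budget, never above num_delete or below 1."""
    return max(1, min(num_delete, budget_bytes // per_particle_bytes))


num_delete = 50
per_particle = 600 * MIB  # hypothetical per-particle state
budget = 20 * GIB         # hypothetical memory budget

unchunked_peak = peak_bytes(num_delete, per_particle)
chunk_size = largest_chunk(budget, per_particle, num_delete)
chunked_peak = peak_bytes(chunk_size, per_particle)

print(chunk_size, unchunked_peak / GIB, chunked_peak / GIB)
```

Under these assumed numbers the unchunked batch exceeds the budget while the chunked one fits, without changing num_delete.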

_runner.py:_sampler_config_dict records chunk_size in the metric
JSON so each run's sampler_config block faithfully reflects what
was constructed.

Unblocks the A100 NSS Delaunay + pixelization cells that were OOMing
at ~28 GB on every retry (jobs 322592/96/600/602/604 — see #1301
for full diagnosis).

Refs PyAutoFit#1301, PyAutoFit#1303

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Jammy2211 added the pending-release label (Queued for the next release) on May 29, 2026
@Jammy2211 merged commit b97c3e8 into main on May 29, 2026
1 check failed
@Jammy2211 deleted the feature/nss-chunked-vmap branch on May 29, 2026 at 07:57