feat: first-class af.Nautilus search profiling + A100 submit #29

Merged: Jammy2211 merged 1 commit into main from feature/first-class-search-profiling on May 28, 2026
@Jammy2211

Summary

  • Replaces raw nautilus.Sampler wrappers in searches/ with first-class af.Nautilus — every cell runs search.fit(model, analysis) end-to-end, so visualization, samples I/O, search.summary, and latent variables are profiled.
  • Sweep matrix: (sampler × dataset_class × model × instrument × hardware × precision) with a sampler registry (_samplers.py) ready for Dynesty/Emcee/BlackJAX as one-function additions.
  • _metrics.attach_viz_timer splits total_wall_s into sampler_wall_s + viz_wall_s by wrapping every analysis visualize hook (visualize, visualize_combined, visualize_before_fit, visualize_before_fit_combined) plus search.plot_results.
  • n_live matches the SLaM canonical phases — 200 for MGE/parametric/point-source (source_lp[1]), 150 for pixelization/Delaunay (source_pix[1]).
  • Datacube uses af.FactorGraphModel to combine N AnalysisInterferometer factors — mirrors autolens_workspace/scripts/multi/modeling.py. _DATACUBE_N_CHANNELS=4 by default.
  • sweep.py resumes by default (skips cells whose JSON exists); --force re-runs.
  • aggregate.py walks the 4-level (sampler/ds/model/instrument) tree and emits comparison.{json,png}.
  • hpc/batch_gpu/submit_imaging_mge_a100_hst_fp64 — first SLURM submit script in autolens_profiling/, modelled on the existing z_projects/profiling/hpc/batch_gpu/ submits.
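
The wall-time split described for _metrics.attach_viz_timer can be sketched as follows. This is a minimal illustration, not the code in this PR: the VizTimer class, the attach_viz_timer signature shown here, the split_wall helper, and the depth counter are all assumptions. Only the four hook names come from the summary. The sketch wraps whichever hooks exist on an analysis object and accumulates the time spent inside them, so sampler time is whatever remains of the total.

```python
import time
from functools import wraps

# Hook names are taken from the summary; everything else here is hypothetical.
VIZ_HOOKS = (
    "visualize",
    "visualize_combined",
    "visualize_before_fit",
    "visualize_before_fit_combined",
)


class VizTimer:
    """Accumulates wall-clock seconds spent inside wrapped hooks."""

    def __init__(self):
        self.viz_wall_s = 0.0
        self._depth = 0  # nesting depth of wrapped calls currently running

    def wrap(self, fn):
        @wraps(fn)
        def timed(*args, **kwargs):
            self._depth += 1
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                self._depth -= 1
                # Count only the outermost call so a hook that calls
                # another wrapped hook is not double-counted.
                if self._depth == 0:
                    self.viz_wall_s += time.perf_counter() - start

        return timed


def attach_viz_timer(analysis, hooks=VIZ_HOOKS):
    """Replace each existing hook on `analysis` with a timed wrapper."""
    timer = VizTimer()
    for name in hooks:
        fn = getattr(analysis, name, None)
        if callable(fn):
            setattr(analysis, name, timer.wrap(fn))
    return timer


def split_wall(total_wall_s, timer):
    """Split a measured total into sampler and visualization parts."""
    return {
        "total_wall_s": total_wall_s,
        "viz_wall_s": timer.viz_wall_s,
        "sampler_wall_s": total_wall_s - timer.viz_wall_s,
    }
```

With total_wall_s measured around search.fit(model, analysis), split_wall would give the two components. A call such as search.plot_results could presumably be wrapped the same way through timer.wrap.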

Profiling-specific design notes

  • number_of_cores=1 everywhere — measures per-eval end-to-end cost, not pool throughput.
  • force_x1_cpu=True and use_jax_vmap=True on JAX rows — mandatory because nautilus.Sampler forking corrupts JAX state.
  • force_pickle_overwrite=True + unique path_prefix per cell defeat the .completed-file resume that would silently return cached results across sweep iterations.

Out of scope

  • HPC sync tool for autolens_profiling (would mirror z_projects/profiling/hpc/sync).
  • number_of_cores > 1 pool-scaling sweep.
  • Production-realistic adapt-image regeneration mid-search (uses truth-derived lensed_source.fits).

Test plan

  • All 9 leaf scripts pass AUTOLENS_PROFILING_SMOKE=1.
  • sweep.py --dry-run --only nautilus/imaging/mge --instrument hst dispatches the matrix correctly.
  • aggregate.py discovers cells and handles an empty tree gracefully.
  • Modules _samplers, _metrics, _setup, and _runner all import cleanly.
  • Real A100 run via hpc/batch_gpu/submit_imaging_mge_a100_hst_fp64 (queued after merge).

🤖 Generated with Claude Code

Replaces the raw nautilus.Sampler wrappers in searches/ with a first-class
af.Nautilus profile that exercises the full PyAutoFit lifecycle:
visualization, samples I/O, search.summary, latent variables.

- Sweep matrix: (sampler × dataset_class × model × instrument × hardware ×
  precision). Sampler registry in _samplers.py is ready for Dynesty/Emcee/
  BlackJAX additions as one-function changes.
- Per-model n_live matches the SLaM canonical phases (200 for mge /
  point-source / parametric; 150 for pixelization / Delaunay / datacube).
- Datacube uses af.FactorGraphModel to combine N AnalysisInterferometer
  factors, mirroring autolens_workspace/scripts/multi/modeling.py.
- _metrics.attach_viz_timer wraps every visualize-family hook so the JSON
  splits total_wall_s into sampler_wall_s + viz_wall_s.
- force_pickle_overwrite=True + unique path_prefix per cell defeat the
  .completed-file resume that would otherwise return cached results
  across repeated sweep iterations.
- sweep.py: resume-by-default with --force override.
- aggregate.py: walks the 4-level (sampler/ds/model/instrument) tree and
  emits comparison.{json,png} per cell.
- hpc/batch_gpu/submit_imaging_mge_a100_hst_fp64: SLURM submit for the
  HST MGE fp64 cell on A100, modeled on the existing likelihood-profiling
  submits in z_projects/profiling/hpc/batch_gpu/.
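
One way to picture the sampler registry and the sweep matrix together is the sketch below. It is an assumption-laden illustration rather than the contents of _samplers.py or sweep.py: the register_sampler decorator, the SAMPLERS dict, the AXES values, and expand_cells are hypothetical names, and the factory returns a plain dict where a real one would presumably build an af.Nautilus search. The two n_live values are the ones quoted above for MGE and pixelization.

```python
from itertools import product

SAMPLERS = {}  # name -> factory function (hypothetical registry layout)


def register_sampler(name):
    """Decorator that adds a factory to the registry under `name`."""
    def decorator(factory):
        SAMPLERS[name] = factory
        return factory
    return decorator


@register_sampler("nautilus")
def make_nautilus(n_live):
    # Stand-in return value; a real factory would construct the search object.
    return {"sampler": "nautilus", "n_live": n_live}


# n_live per model, using the values quoted in the PR text.
N_LIVE = {"mge": 200, "pixelization": 150}

# Illustrative axis values only; the real sweep covers more of each.
AXES = {
    "sampler": ["nautilus"],
    "dataset_class": ["imaging"],
    "model": ["mge", "pixelization"],
    "instrument": ["hst"],
    "hardware": ["a100"],
    "precision": ["fp64"],
}


def expand_cells(axes):
    """Return one dict per point in the Cartesian product of the axes."""
    keys = list(axes)
    return [dict(zip(keys, combo)) for combo in product(*axes.values())]


if __name__ == "__main__":
    for cell in expand_cells(AXES):
        search = SAMPLERS[cell["sampler"]](N_LIVE[cell["model"]])
        print(cell, search)
```

Under this layout, adding another sampler would amount to one more decorated factory plus an entry in the "sampler" axis.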

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Jammy2211 merged commit e5b1118 into main on May 28, 2026
1 check failed
@Jammy2211 deleted the feature/first-class-search-profiling branch on May 28, 2026 at 09:47
Jammy2211 added a commit that referenced this pull request May 28, 2026
…me (#32)

Discovered while debugging the 3.5× run-2 vs run-1 speedup on the A100
HST MGE submit (job 322549 = 3m43s vs 322548 = 11m40s). Run 2 didn't
actually re-sample — it loaded the cached samples.csv + Nautilus pickle
left by run 1 and reported the same total_samples=65500 with a meaningless
time_per_eval_ms=2.82.

The resume gate is `.completed` (PyAutoFit/abstract_search.py:520-529),
not `force_pickle_overwrite` as the previous comment claimed.
`force_pickle_overwrite=True` only re-writes output pickles on an
existing resume; it does not bypass the gate.

For production (SLaM-style chained phases) the resume default is
correct. For profiling it produces phantom speedups whenever a prior
run completed sampling — even one that crashed in post-fit, as the
latent-crash in PR #29 showed.

- sweep.py: --keep-completed flag (default off). When off, removes
  output/searches/<sampler>/<ds>/<model>/<instrument>/<config>/ before
  each cell run, wiping .completed + Nautilus pickle + samples.csv.
- _samplers.py: correct the docstring claim about force_pickle_overwrite.
- README.md: rewrite the "force_pickle_overwrite defeats .completed"
  paragraph; document the sweep-level wipe as the actual mechanism.
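
The sweep-level wipe can be sketched as below. This is a hedged illustration of the behaviour described, not the code in sweep.py: the function names, the keep_completed parameter, and the dict-based cell description are assumptions. Only the directory layout is taken from the commit message, with "output" passed in as the root.

```python
import shutil
from pathlib import Path


def cell_output_dir(root, sampler, ds, model, instrument, config):
    """Build <root>/searches/<sampler>/<ds>/<model>/<instrument>/<config>."""
    return Path(root) / "searches" / sampler / ds / model / instrument / config


def prepare_cell(root, cell, keep_completed=False):
    """Remove a cell's previous output unless the caller asks to keep it.

    Deleting the directory removes the .completed marker along with the
    cached pickle and samples.csv, so the next fit has nothing to resume.
    """
    out = cell_output_dir(root, **cell)
    if not keep_completed and out.exists():
        shutil.rmtree(out)
    return out
```

Calling prepare_cell("output", cell) before each run would mirror the default described here, while keep_completed=True corresponds to passing --keep-completed.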

The honest A100 number from run 1's actual sampling window is ~6.6 ms/eval
(432 s for 65500 evals between Visualization warm-up complete and the
first Fit Running update), not the 2.82 ms in run 2's JSON.
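
The per-eval figures above follow from simple division, reproduced here as a check. The variable names are illustrative, the inputs are the numbers quoted in this message, and the last quantity is only the wall time that the 2.82 ms figure would correspond to for the same sample count.

```python
sampling_window_s = 432.0   # run 1: seconds spent actually sampling
total_samples = 65_500      # reported by both runs
reported_ms_per_eval = 2.82 # from run 2's JSON

# Per-eval cost over run 1's real sampling window, in milliseconds.
honest_ms_per_eval = sampling_window_s / total_samples * 1e3

# Wall time that run 2's figure corresponds to for the same sample count.
implied_wall_s = reported_ms_per_eval * total_samples / 1e3

print(f"run 1: {honest_ms_per_eval:.1f} ms/eval")
print(f"run 2 figure corresponds to {implied_wall_s:.1f} s for {total_samples} evals")
```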

Co-authored-by: Jammy2211 <JNightingale2211@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>