fix: sweep.py wipes search state by default to defeat .completed resume by Jammy2211 · Pull Request #32 · PyAutoLabs/autolens_profiling

Jammy2211 · 2026-05-28T13:27:17Z

Summary

Run 322549 came back 3.5× faster than 322548 on the same A100 cell. Looked like cache magic. It wasn't — run 2 didn't sample at all. It loaded the cached samples.csv + Nautilus pickle from run 1 and reported the same total_samples=65500 with a meaningless time_per_eval_ms=2.82.

PyAutoFit's resume gate is the .completed sentinel file (abstract_search.py:520):

if not self.paths.is_complete:
    result = self.start_resume_fit(...)
else:
    result = self.result_via_completed_fit(...)

force_pickle_overwrite=True only re-writes output pickles on the resume path; it doesn't bypass the gate. The previous _samplers.py comment claiming otherwise was wrong, and PR #29's README inherited the same mistake.
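The gate logic above can be illustrated with a toy model. This is a minimal sketch, not PyAutoFit code: `ToySearch`, `evals_this_call`, and `result.pickle` are hypothetical names, and the point at which the flag is consulted is an assumption made only for illustration. It shows that once a `.completed` sentinel exists, a second fit returns cached output without sampling, whatever the value of `force_pickle_overwrite`.

```python
import tempfile
from pathlib import Path


class ToySearch:
    """Toy stand-in for a search with a sentinel-file resume gate (hypothetical)."""

    def __init__(self, output_dir, force_pickle_overwrite=False):
        self.output_dir = Path(output_dir)
        self.force_pickle_overwrite = force_pickle_overwrite
        self.evals_this_call = 0

    @property
    def is_complete(self):
        # The gate looks only at the sentinel file, never at the flag.
        return (self.output_dir / ".completed").exists()

    def fit(self):
        self.evals_this_call = 0
        if not self.is_complete:
            # Sampling branch: pretend to evaluate the likelihood, then cache.
            self.output_dir.mkdir(parents=True, exist_ok=True)
            self.evals_this_call = 65500
            (self.output_dir / "samples.csv").write_text("cached samples\n")
            (self.output_dir / ".completed").touch()
            return "sampled"
        # Completed branch: no sampling happens here.
        # Assumption: the flag only decides whether output files are re-written.
        if self.force_pickle_overwrite:
            (self.output_dir / "result.pickle").write_text("rewritten\n")
        return "loaded-from-cache"


with tempfile.TemporaryDirectory() as tmp:
    cell = Path(tmp) / "cell"
    run1 = ToySearch(cell).fit()
    run2_search = ToySearch(cell, force_pickle_overwrite=True)
    run2 = run2_search.fit()
    run2_evals = run2_search.evals_this_call

print(run1, run2, run2_evals)  # sampled loaded-from-cache 0
```

Dividing run 2's wall time by the cached sample count is what makes the reported per-eval figure meaningless: no likelihood evaluations happened in that call.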

For production (SLaM chained phases) the resume default is correct. For profiling it produces phantom speedups whenever a prior attempt completed sampling — including the post-fit-latent-crash case that PR #30 just fixed, where run 1's sampling completed before the crash and left .completed behind.

Changes

sweep.py: add a --keep-completed flag (default off). When off, the sweep removes output/searches/<sampler>/<ds>/<model>/<instrument>/<config>/ before each cell run, wiping .completed, the Nautilus pickle, and the cached samples.csv. It logs one [clear-completed] removed ... line per cell so wipes are auditable.
_samplers.py: fix the force_pickle_overwrite docstring; clarify that it controls output-file re-writes, not the resume gate.
README.md: rewrite the corresponding paragraph; document the sweep-level wipe as the actual mechanism.
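The sweep-level wipe could look roughly like the sketch below. It is not the code in this PR: `wipe_search_state_sketch`, `build_parser`, and the `--dry-run` flag name are assumptions for illustration, and the actual signature of `_wipe_search_state(...)` may differ. Only the `--keep-completed` flag and the `[clear-completed]` log prefix come from the description above.

```python
import argparse
import shutil
import tempfile
from pathlib import Path


def build_parser():
    """Hypothetical parser; only --keep-completed is taken from the PR text."""
    parser = argparse.ArgumentParser(prog="sweep.py")
    parser.add_argument(
        "--keep-completed",
        action="store_true",
        help="keep existing search output so a completed fit is resumed",
    )
    # Assumed flag name; the PR mentions a dry-run mode but not how it is set.
    parser.add_argument("--dry-run", action="store_true")
    return parser


def wipe_search_state_sketch(search_dir, dry_run=False):
    """Remove a cell's search directory; return True only if it was deleted."""
    search_dir = Path(search_dir)
    if not search_dir.exists():
        return False  # nothing to wipe, so this is a no-op
    if dry_run:
        print(f"[clear-completed] (dry-run) would remove {search_dir}")
        return False
    shutil.rmtree(search_dir)  # drops the sentinel, pickles and cached samples
    print(f"[clear-completed] removed {search_dir}")
    return True


args = build_parser().parse_args([])
with tempfile.TemporaryDirectory() as tmp:
    cell_dir = Path(tmp) / "output" / "searches" / "cell"
    cell_dir.mkdir(parents=True)
    (cell_dir / ".completed").touch()
    dry = wipe_search_state_sketch(cell_dir, dry_run=True)
    still_there = cell_dir.exists()
    if not args.keep_completed:  # default: wipe before the cell runs
        real = wipe_search_state_sketch(cell_dir)
    gone = not cell_dir.exists()
    noop = wipe_search_state_sketch(cell_dir)
```

Removing the whole directory rather than only the sentinel avoids a half-cached state in which `.completed` is gone but an old pickle or samples.csv is still picked up.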

Honest run-1 number

From job 322548's log timestamps:

10:55:35 script start
10:56:01 Visualization warm-up complete → JIT + setup = 26s
10:56:01 → 11:03:13 Nautilus sampling, 65,500 evals → 432 s → 6.6 ms/eval (real)
11:03:13 → 11:07:12 two final perform_updates + latent crash → ~4 min

The 2.82 ms in run 2's JSON is (load + viz wall) / cached-sample-count — not a per-eval cost.
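The arithmetic behind these figures can be checked directly from the timestamps. The helper name `seconds_between` is hypothetical, and the last line only shows what wall time the 2.82 ms figure would imply if it is wall time divided by the cached sample count, as stated above.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def seconds_between(start, end):
    """Elapsed seconds between two same-day HH:MM:SS log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


evals = 65_500

setup_s = seconds_between("10:55:35", "10:56:01")     # JIT + setup
sampling_s = seconds_between("10:56:01", "11:03:13")  # Nautilus sampling window
real_ms_per_eval = 1000 * sampling_s / evals

# Wall time implied by run 2's reported 2.82 ms over the cached sample count.
implied_run2_wall_s = 2.82e-3 * evals

print(setup_s, sampling_s, round(real_ms_per_eval, 1), round(implied_run2_wall_s, 1))
# 26.0 432.0 6.6 184.7
```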

Test plan

sweep.py --help shows --keep-completed.
Dry-run prints [clear-completed] (dry-run) would remove ... when the dir exists.
Real wipe via _wipe_search_state(...) removes .completed + nested files.
No-op when the output dir doesn't exist.
Re-submit A100 with the wipe; confirm Nautilus iteration lines appear and per-eval cost lands around 6.6 ms.

🤖 Generated with Claude Code

Discovered while debugging the 3.5× run-2 vs run-1 speedup on the A100 HST MGE submit (job 322549 = 3m43s vs 322548 = 11m40s). Run 2 didn't actually re-sample; it loaded the cached samples.csv + Nautilus pickle left by run 1 and reported the same total_samples=65500 with a meaningless time_per_eval_ms=2.82.

The resume gate is `.completed` (PyAutoFit/abstract_search.py:520-529), not `force_pickle_overwrite` as the previous comment claimed. `force_pickle_overwrite=True` only re-writes output pickles on an existing resume; it does not bypass the gate.

For production (SLaM-style chained phases) the resume default is correct. For profiling it produces phantom speedups whenever a prior run completed sampling, even one that crashed in post-fit, as the latent-crash in PR #29 showed.

- sweep.py: --keep-completed flag (default off). When off, removes output/searches/<sampler>/<ds>/<model>/<instrument>/<config>/ before each cell run, wiping .completed + Nautilus pickle + samples.csv.
- _samplers.py: correct the docstring claim about force_pickle_overwrite.
- README.md: rewrite the "force_pickle_overwrite defeats .completed" paragraph; document the sweep-level wipe as the actual mechanism.

The honest A100 number from run 1's actual sampling window is ~6.6 ms/eval (432 s for 65500 evals between Visualization warm-up complete and the first Fit Running update), not the 2.82 ms in run 2's JSON.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Jammy2211 merged commit e5c2220 into main May 28, 2026
1 check failed

Jammy2211 deleted the fix/clear-completed-by-default branch May 28, 2026 13:27
