chore: stop tracking simulator-regenerated datasets #137

Merged
Jammy2211 merged 1 commit into main from chore/untrack-generated-datasets
May 8, 2026

@Jammy2211
Collaborator

Summary

  • Untrack ~220 files under dataset/ that are regenerated by simulators in scripts/{imaging,interferometer,multi,group,cluster,point_source,weak}/simulator.py and friends. They were perpetually dirty in git status (143 staged + many unstaged).
  • They were already covered by the dataset/ rule in .gitignore, but had been committed before the rule existed, so git kept tracking them.
  • Real observational data and simulator inputs are kept tracked (40 files total):
    • dataset/imaging/cosmos_web_ring/ — JWST COSMOS-Web; loaded by scripts/imaging/start_here.py, scripts/multi/start_here.py
    • dataset/imaging/slacs1430+4105/ — HST SLACS lens; used by scripts/guides/plot/examples/
    • dataset/group/102021990_NEG650312660474055399/ — real catalog object; loaded by scripts/group/start_here.py
    • dataset/interferometer/uv_wavelengths/sma.fits — synthetic SMA uv-coverage; read by scripts/multi/features/imaging_and_interferometer/simulator.py and several modeling/aggregator scripts
    • dataset/imaging/los_halos/los_halo_list.npy + los_sheet_values.npy — line-of-sight halo data; read by scripts/imaging/features/advanced/los_halos/simulator.py
    • dataset/.gitignore — the inner * + !.gitignore marker
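
The behaviour described above (a file committed before an ignore rule exists stays tracked until it is removed from the index) can be reproduced in a throwaway repository. This is a minimal sketch, not the exact commands used for this PR: the temporary repository, the file data.txt and the commit messages are all assumptions made for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repository; every path below is illustrative, not the project's.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"

# 1. Commit a generated file before any ignore rule exists.
mkdir -p dataset/imaging/simple
echo "v1" > dataset/imaging/simple/data.txt
git add dataset
git commit -q -m "commit generated data"

# 2. Add the ignore rule afterwards. The file is already in the index,
#    so git keeps tracking it and reports changes to it.
echo "dataset/" > .gitignore
git add .gitignore
git commit -q -m "ignore dataset/"
echo "v2" > dataset/imaging/simple/data.txt
dirty_before=$(git status --porcelain)   # non-empty: tracked file is modified

# 3. Remove the files from the index only; --cached leaves them on disk.
git rm -r -q --cached dataset
git commit -q -m "stop tracking generated data"

# 4. Regenerating the file no longer shows up in git status.
echo "v3" > dataset/imaging/simple/data.txt
dirty_after=$(git status --porcelain)    # empty: the ignore rule now applies

echo "before: [$dirty_before]"
echo "after:  [$dirty_after]"
```

Before step 3, git ls-files -ci --exclude-standard is one way to list files that are tracked despite matching an ignore rule, which can help in deciding what to untrack.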

.gitignore change

Replaces:

dataset/

with:

dataset/**
!dataset/.gitignore
!dataset/imaging/cosmos_web_ring/**
!dataset/imaging/slacs1430+4105/**
!dataset/group/102021990_NEG650312660474055399/**
!dataset/interferometer/uv_wavelengths/**
!dataset/imaging/los_halos/los_halo_list.npy
!dataset/imaging/los_halos/los_sheet_values.npy

This documents which datasets are deliberately tracked. To add new real data later, run git add -f dataset/path/to/data.fits and add a matching !dataset/... line to .gitignore. (The inner dataset/.gitignore still ignores everything by default, so -f is required for new files, even in the kept directories.)
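
To see why -f is needed, git check-ignore can report which rule decides the fate of a new file. The sketch below assumes a throwaway repository with a cut-down copy of the two .gitignore files described above; new_obs.fits is a hypothetical file name, and only one of the ! lines is reproduced.

```shell
#!/bin/sh
set -eu

# Throwaway repository mirroring the two-level .gitignore layout.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"

mkdir -p dataset/imaging/cosmos_web_ring
printf '%s\n' 'dataset/**' '!dataset/.gitignore' \
    '!dataset/imaging/cosmos_web_ring/**' > .gitignore
printf '%s\n' '*' '!.gitignore' > dataset/.gitignore

# Hypothetical new real-data file inside a kept directory.
new_file=dataset/imaging/cosmos_web_ring/new_obs.fits
echo "placeholder" > "$new_file"

# check-ignore exits 0 when the path is ignored and 1 when it is not;
# -v also prints the file, line and pattern that made the decision.
if rule=$(git check-ignore -v "$new_file"); then
    ignored=yes
else
    ignored=no
fi
echo "ignored=$ignored rule=$rule"

# A plain git add refuses ignored paths; -f overrides the ignore rules.
if git add "$new_file" 2>/dev/null; then
    plain_add=ok
else
    plain_add=refused
fi
git add -f "$new_file"
staged=$(git ls-files)
echo "plain_add=$plain_add staged=$staged"
```

Once a file has been force-added it is tracked like any other, so later edits to it appear in git status regardless of the ignore rules.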

Effect

  • No source-code changes. Working-tree files stay on disk.
  • git status will go silent for the simulator-output dataset tree.
  • Fresh clones still get all real-data dirs (cosmos_web_ring, slacs1430+4105, 102021990_NEG..., uv_wavelengths, los_halos npys) — start_here.py tutorials work out of the box.
  • Fresh clones lack simulator-output dirs (e.g. dataset/imaging/simple/); running python scripts/imaging/simulator.py regenerates them.

Test plan

  • CI green on the branch
  • After merge: clone fresh, confirm dataset/imaging/cosmos_web_ring/, slacs1430+4105/, 102021990_NEG.../, uv_wavelengths/sma.fits, and los_halos/*.npy are all present
  • Run python scripts/imaging/simulator.py, confirm dataset/imaging/simple/ regenerates and git status stays clean
  • Run python scripts/imaging/start_here.py (loads cosmos_web_ring), confirm it still finds the data
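
The fresh-clone check in the test plan could be scripted along these lines. This is a sketch only: the function name missing_kept_paths is hypothetical, and the path list is copied from the summary above rather than read from the repository.

```shell
#!/bin/sh

# Print every expected tracked path that is absent under the given clone root.
# Prints nothing when all kept paths are present.
missing_kept_paths() {
    root=$1
    for p in \
        dataset/.gitignore \
        dataset/imaging/cosmos_web_ring \
        "dataset/imaging/slacs1430+4105" \
        dataset/group/102021990_NEG650312660474055399 \
        dataset/interferometer/uv_wavelengths/sma.fits \
        dataset/imaging/los_halos/los_halo_list.npy \
        dataset/imaging/los_halos/los_sheet_values.npy
    do
        # -e accepts both files and directories.
        [ -e "$root/$p" ] || echo "$p"
    done
}
```

Calling missing_kept_paths with the path to a fresh clone lists anything that did not survive the untracking; an empty result means every kept path is present.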

🤖 Generated with Claude Code

Untrack the ~220 files under dataset/ that are regenerated by the
simulator scripts in scripts/{imaging,interferometer,multi,group,
cluster,point_source,weak}/simulator.py and friends. They were
already covered by the dataset/ rule in .gitignore but had been
committed before that rule existed, so git kept tracking them and
test runs left them perpetually dirty (currently 143 staged files
+ more unstaged).

Kept tracked (real observational data and simulator inputs):
- dataset/imaging/cosmos_web_ring/  (JWST COSMOS-Web; loaded by
  scripts/imaging/start_here.py and scripts/multi/start_here.py)
- dataset/imaging/slacs1430+4105/  (HST SLACS lens; used by plot
  guides under scripts/guides/plot/examples/)
- dataset/group/102021990_NEG650312660474055399/  (real catalog
  object; loaded by scripts/group/start_here.py)
- dataset/interferometer/uv_wavelengths/sma.fits  (synthetic SMA
  uv-coverage read by interferometer + multi/imaging_and_inter
  simulators and modeling guides)
- dataset/imaging/los_halos/los_halo_list.npy + los_sheet_values.npy
  (line-of-sight halo data read by los_halos/simulator.py)
- dataset/.gitignore  (the inner * + !.gitignore marker)

The dataset/ rule is rewritten as dataset/** plus !-pins for each
preserved path so the maintainer's intent is documented in
.gitignore. Note: the inner dataset/.gitignore (* + !.gitignore)
takes precedence for NEW files, so adding new real-data files in
the future requires `git add -f`. The !-pins are decorative for
new files but accurate for the currently-tracked set.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@Jammy2211 Jammy2211 merged commit b379719 into main May 8, 2026
5 checks passed
@Jammy2211 Jammy2211 deleted the chore/untrack-generated-datasets branch May 8, 2026 14:10