feat: add galaxy_table CSV helper for scaling_relation examples #392

@Jammy2211

Overview

The scaling_relation workspace examples shipped in autolens_workspace#141 hardcode their luminosity lists. For production users with 20+ companion galaxies this is unworkable — they want a CSV with y,x,luminosity,redshift? columns alongside the centre JSONs. This task adds a typed galaxy_table reader/writer to PyAutoGalaxy and migrates the four scaling_relation workspace scripts onto it.

Plan

  • Add autogalaxy/galaxy/galaxy_table.py with galaxy_table_from_csv / galaxy_table_to_csv wrapping autoconf.csvable. Returns a small GalaxyTable dataclass with centres: Grid2DIrregular, luminosities: list[float], redshifts: list[float] | None.
  • Re-export via autogalaxy/__init__.py (PyAutoLens picks it up transparently through its existing autogalaxy re-export chain).
  • Unit tests: round-trip, missing optional redshift column, partial-redshift rejection (mirrors autolens.point.dataset convention), extra columns ignored, row order preserved, empty CSV.
  • Update both scaling_relation simulators to emit CSVs alongside JSONs (additive — JSONs stay for backward compat).
  • Update both modeling scripts and modeling_for_luminosities.py to load luminosities from CSV.

Detailed implementation plan

Affected Repositories

  • PyAutoGalaxy (primary, library)
  • autolens_workspace (workspace consumer)

Work Classification

Both — library first via /start_library, workspace follow-up via /start_workspace.

Why PyAutoGalaxy

  • autoconf.csvable is intentionally generic row-I/O with no domain types and no autoarray dep — adding a galaxy schema there would push that dep down.
  • The autolens.point.dataset precedent (output_to_csv / list_from_csv for PointDataset) shows the pattern: typed CSV helpers live next to the typed object they describe. PointDataset is lensing-only, so the helper lives in autolens.
  • A galaxy-population table (centres + luminosities + redshifts) is NOT lensing-specific — it's a galaxy concept. The queued autogalaxy_extra_galaxies_audit.md follow-up wants the same helper. PyAutoGalaxy serves both autolens_workspace and autogalaxy_workspace from one home.

Branch Survey

Repository          Current Branch  Dirty?
PyAutoGalaxy        main            clean
autolens_workspace  main            dirty (formatter reflows from prior task — re-stash before workspace worktree)

Suggested branch: feature/scaling-relation-csv-loader
Worktree root: ~/Code/PyAutoLabs-wt/scaling-relation-csv-loader/

Implementation Steps

Library phase (PyAutoGalaxy)

  1. autogalaxy/galaxy/galaxy_table.py (new)

    • @dataclass GalaxyTable with centres: Grid2DIrregular, luminosities: List[float], redshifts: Optional[List[float]] = None
    • galaxy_table_from_csv(file_path) -> GalaxyTable — wraps autoconf.csvable.list_from_csv; partial-redshift handling mirrors _group_redshift in autolens/point/dataset.py:291
    • galaxy_table_to_csv(centres, luminosities, file_path, redshifts=None) — wraps autoconf.csvable.output_to_csv
  2. autogalaxy/__init__.py — add 3 imports for GalaxyTable, galaxy_table_from_csv, galaxy_table_to_csv. Verify al.galaxy_table_from_csv resolves via the existing autogalaxy → autolens re-export chain; if not, add explicit re-export in PyAutoLens/autolens/__init__.py.

  3. test_autogalaxy/galaxy/test_galaxy_table.py (new) — 8 tests covering round-trip with/without redshift, missing column, partial column, extra columns ignored, row order preserved, empty CSV, mismatched-length writer rejection.
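The library-phase steps above could look roughly like the sketch below. It is a stand-in under stated assumptions, not the planned implementation: it uses the standard-library csv module where the plan wraps autoconf.csvable, and plain (y, x) tuples where the plan returns a Grid2DIrregular, so that it runs without any PyAuto packages installed. The error messages and the exact partial-redshift rule (every row filled, or none) are also assumptions modelled on the description in this issue.

```python
import csv
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Assumption: (y, x) tuples stand in for the Grid2DIrregular named in the plan.
Centre = Tuple[float, float]


@dataclass
class GalaxyTable:
    centres: List[Centre]
    luminosities: List[float]
    redshifts: Optional[List[float]] = None


def galaxy_table_to_csv(
    centres: Sequence[Centre],
    luminosities: Sequence[float],
    file_path: str,
    redshifts: Optional[Sequence[float]] = None,
) -> None:
    """Write one row per galaxy; the redshift column is only emitted if given."""
    if len(centres) != len(luminosities):
        raise ValueError("centres and luminosities must have the same length")
    if redshifts is not None and len(redshifts) != len(centres):
        raise ValueError("redshifts must have one entry per galaxy")

    headers = ["y", "x", "luminosity"]
    if redshifts is not None:
        headers.append("redshift")

    with open(file_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        for i, ((y, x), luminosity) in enumerate(zip(centres, luminosities)):
            row = [y, x, luminosity]
            if redshifts is not None:
                row.append(redshifts[i])
            writer.writerow(row)


def galaxy_table_from_csv(file_path: str) -> GalaxyTable:
    """Read a galaxy table; extra columns are ignored, row order is kept."""
    with open(file_path, newline="") as f:
        rows = list(csv.DictReader(f))

    centres = [(float(r["y"]), float(r["x"])) for r in rows]
    luminosities = [float(r["luminosity"]) for r in rows]

    # A missing column and an entirely blank column both mean "no redshifts".
    raw = [(r.get("redshift") or "").strip() for r in rows]
    filled = [value for value in raw if value]
    if not filled:
        redshifts = None
    elif len(filled) != len(raw):
        # Partial-redshift rejection: mixing filled and blank rows is an error.
        raise ValueError("redshift must be given for every row or for none")
    else:
        redshifts = [float(value) for value in raw]

    return GalaxyTable(centres, luminosities, redshifts)
```

Rejecting partially filled redshift columns keeps the return type simple: callers check redshifts against None once instead of handling per-row gaps.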

Workspace phase (autolens_workspace)

  1. scripts/imaging/features/scaling_relation/simulator.py — add al.galaxy_table_to_csv(...) calls writing extra_galaxies.csv and scaling_galaxies.csv alongside the existing centre JSONs.

  2. scripts/imaging/features/scaling_relation/modeling.py — replace hardcoded relational_extras_luminosity_list with al.galaxy_table_from_csv(...). Update prose.

  3. scripts/group/features/scaling_relation/simulator.py — same as workspace step 1 (the imaging simulator), for the group dataset.

  4. scripts/group/features/scaling_relation/modeling.py — same as workspace step 2 (the imaging modeling script), for scaling_galaxies_luminosity_list.

  5. scripts/group/features/scaling_relation/modeling_for_luminosities.py — switch the output from scaling_galaxies_luminosities.json to scaling_galaxies.csv using al.galaxy_table_to_csv, since CSV is the format the downstream modeling.py now consumes.

Key Files

  • PyAutoGalaxy/autogalaxy/galaxy/galaxy_table.py — new module
  • PyAutoGalaxy/autogalaxy/__init__.py — add 3 exports
  • PyAutoGalaxy/test_autogalaxy/galaxy/test_galaxy_table.py — new tests
  • autolens_workspace/scripts/imaging/features/scaling_relation/{simulator,modeling}.py — CSV emit + load
  • autolens_workspace/scripts/group/features/scaling_relation/{simulator,modeling,modeling_for_luminosities}.py — CSV emit + load

Reference reads (no edits)

  • PyAutoLens/autolens/point/dataset.py — typed-CSV-helper precedent
  • PyAutoConf/autoconf/csvable.py — generic CSV I/O wrapped by the new helper

Out of scope

  • autoconf / autoarray / autolens source changes (PyAutoLens auto-picks-up via autogalaxy chain; only add explicit re-export if discovered necessary during implementation)
  • Sharing infrastructure with autolens.point.dataset — the schemas differ enough that coupling would be premature
  • CSV migration for other workspace features (extra_galaxies, group/modeling.py, etc.)
  • autogalaxy_workspace changes — the queued autogalaxy_extra_galaxies_audit.md follow-up will consume the new helper as its own task
  • Notebook regeneration — separate /generate_and_merge run after merge

Original Prompt

Extend scaling_relation examples with CSV loading via autoconf.csvable

Background

The two scaling_relation examples shipped by feature/scaling-relation-update (issue autolens_workspace#141) currently use hardcoded Python lists for the extra-galaxy / scaling-galaxy luminosities:

extra_galaxies_luminosity_list = [0.9, 0.9]
scaling_galaxies_luminosity_list = [0.45, 0.45]

The centres are loaded from JSON files written by the simulators (extra_galaxies_centres.json, scaling_galaxies_centres.json), but the luminosities are not. That is fine for a tutorial of fixed length, but production users with 20+ galaxies want a single file describing the full population.

autoconf already provides a generic CSV reader/writer at autoconf/csvable.py:

  • output_to_csv(rows, file_path, headers=None) — write list-of-dicts or list-of-sequences
  • list_from_csv(file_path) — read as ordered list-of-dicts
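For readers unfamiliar with this style of row I/O, the sketch below shows a standard-library equivalent of the list-of-dicts round trip. The names write_rows and read_rows are hypothetical and chosen to avoid implying anything about how autoconf.csvable is implemented; only the general shape (rows in, ordered rows out) is taken from the description above.

```python
import csv
from typing import Dict, List


def write_rows(rows: List[Dict[str, object]], file_path: str) -> None:
    """Write a list of dicts as CSV, taking the header from the first row."""
    headers = list(rows[0].keys()) if rows else []
    with open(file_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=headers)
        writer.writeheader()
        writer.writerows(rows)


def read_rows(file_path: str) -> List[Dict[str, str]]:
    """Read CSV rows back in file order; every value comes back as a string."""
    with open(file_path, newline="") as f:
        return [dict(row) for row in csv.DictReader(f)]
```

Because a generic reader like this returns strings, converting columns to floats and grid objects is exactly the job left to a typed schema layer.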

There is no autolens/autogalaxy-specific schema layer for galaxy populations yet; autolens.point.dataset provides one, but only for its own format. We probably want a similar (very thin) schema layer for galaxy populations.

Goal

Both imaging/features/scaling_relation/modeling.py and group/features/scaling_relation/modeling.py should be able to drop a CSV like this in the dataset folder:

y,x,luminosity,redshift
3.5,2.5,0.9,0.5
-4.4,-5.0,0.9,0.5

…and load centres + luminosities (+ optional redshifts) in one call. The hardcoded lists in the modeling scripts stay as a fallback path for the tutorial flow but the CSV path is shown alongside.

Proposed work

  1. Decide where the schema layer lives. Two options:

    • (a) Extend autoconf/csvable.py with a thin typed reader (e.g. galaxy_table_from_csv) returning (centres: Grid2DIrregular, luminosities: list[float], redshifts: list[float] | None). Pro: every workspace gets it. Con: autoconf does not depend on autoarray, so the Grid2DIrregular return type would push it down a dependency edge.
    • (b) Keep the schema layer in autolens (or autoarray). Add e.g. al.util.galaxy_table_from_csv that wraps autoconf.csvable.list_from_csv and produces the typed outputs. Probably the right call.
  2. Extend both simulators to also write extra_galaxies.csv and/or scaling_galaxies.csv next to the centre JSONs. The imaging example (scripts/imaging/features/scaling_relation/...) borrows dataset/group/simple, so update scripts/group/simulator.py; the new scripts/group/features/scaling_relation/simulator.py writes its own scaling_relation dataset. Keep the JSON files for backward compatibility; the CSV is additive.

  3. Update both modeling scripts to load from the CSV instead of (or alongside) the JSON + hardcoded list. Show the CSV path as the primary modern flow; document the JSON-fallback inline for users on older datasets.

  4. Tests. test_autoconf/test_csvable.py already covers the generic round-trip. Add tests for the new schema layer (in whichever repo owns it), exercising:

    • missing optional column (redshift absent → None)
    • extra columns (silently ignored)
    • row order preserved
    • empty file → empty population
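The four cases listed above can be written as small pytest-style functions. The sketch below is illustrative only: load_population is a hypothetical minimal reader built on the standard-library csv module so that the tests run on their own, and it parses CSV text directly rather than a file path. The real tests would target whichever helper the schema layer ends up exposing.

```python
import csv
import io
from typing import List, Optional, Tuple


def load_population(
    text: str,
) -> Tuple[List[Tuple[float, float]], List[float], Optional[List[float]]]:
    """Hypothetical minimal reader: returns (centres, luminosities, redshifts)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    centres = [(float(r["y"]), float(r["x"])) for r in rows]
    luminosities = [float(r["luminosity"]) for r in rows]
    has_redshift = bool(rows) and "redshift" in rows[0]
    redshifts = [float(r["redshift"]) for r in rows] if has_redshift else None
    return centres, luminosities, redshifts


def test_missing_redshift_column_gives_none():
    _, _, redshifts = load_population("y,x,luminosity\n3.5,2.5,0.9\n")
    assert redshifts is None


def test_extra_columns_are_ignored():
    centres, luminosities, _ = load_population(
        "y,x,luminosity,name\n3.5,2.5,0.9,companion_a\n"
    )
    assert centres == [(3.5, 2.5)]
    assert luminosities == [0.9]


def test_row_order_is_preserved():
    centres, _, _ = load_population(
        "y,x,luminosity\n3.5,2.5,0.9\n-4.4,-5.0,0.9\n0.0,1.0,0.2\n"
    )
    assert centres == [(3.5, 2.5), (-4.4, -5.0), (0.0, 1.0)]


def test_empty_file_gives_empty_population():
    assert load_population("y,x,luminosity\n") == ([], [], None)
```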

Out of scope

  • Refactoring the autolens.point.dataset CSV layer to use this new helper. That should be a separate task — point.dataset has its own column conventions and we should not couple them.
  • Migrating other workspace features to CSV. This task is scoped to the two scaling_relation examples plus their simulators.

Reference reads

  • autoconf/csvable.py — generic CSV I/O
  • autolens.point.dataset — example of an existing CSV schema layer
  • autolens_workspace/scripts/imaging/features/scaling_relation/modeling.py — current consumer (post issue autolens_workspace#141)
  • autolens_workspace/scripts/group/features/scaling_relation/{simulator,modeling}.py — current consumer (post issue autolens_workspace#141)
