feat: CSV-first model API for cluster scripts #187

## Overview

Make CSV the first-class API for cluster lens modeling end to end. Today only the scaling-tier members live in a CSV (`scaling_galaxies.csv`); main galaxies, the host halo, and source profiles are still composed inline in Python. We'll extend the CSV layer to cover every cluster component, add a new `scripts/cluster/csv_api.py` guide that round-trips a full cluster model through CSVs, and refactor the three existing cluster scripts (`simulator.py`, `modeling.py`, `start_here.py`) so the canonical workflow is "edit a CSV, re-run". The CSV layer becomes the *definition* of a cluster lens model.

## Plan

- **Library helpers (PyAutoGalaxy):** extend `autogalaxy/galaxy/galaxy_table.py` (or sibling module) with readers/writers for named-galaxy CSVs keyed by a `galaxy` column and a `profile_class` column. One CSV per profile family — `mass`, `light`, `point`. The reader returns a typed result the workspace can feed into `af.Collection(...)`; the writer takes a list of `(galaxy_name, profile)` and emits the family CSV. Profile-class lookup goes through `al.mp.*` / `al.lp.*` / `al.ps.*` via `getattr`. Sparse columns supported (different profile classes use different parameter columns; empty cells are tolerated).
- **`scripts/cluster/csv_api.py`:** new canonical workspace guide. It builds a small cluster model end to end in plain Python, writes it out to the family CSVs, loads it back from those CSVs, and prints the round-trip diff so every column has a visible Python counterpart. It also documents the existing `point_datasets.csv` in the same place. The convention is "load dataset before modeling", so the dataset CSV is presented before the model CSVs.
- **`simulator.py`** — load the truth model from the CSVs `csv_api.py` writes (rather than hardcoded inline). The simulator then writes `data.fits` + `point_datasets.csv` + copies of the model CSVs into the dataset folder so downstream scripts pick them up. This becomes the auto-simulate chain.
- **`modeling.py`** — refactor model composition to happen entirely from the model CSVs (main mass, main light, host halo, source light, source point, scaling tier). The inline `af.Model(...)` composition is shown as the alternative for users who prefer it, kept short.
- **`start_here.py`** — mirror `modeling.py` from CSVs, keeping the intro-script tone (Beta-feature warning, JAX speed pitch, Google Colab setup, simpler prose).
- **Workspace audit:** scan `autolens_workspace` for other CSV usage to confirm we're not breaking anything else. Per the prompt, the only existing case is `scaling_galaxies.csv`; this needs verifying.
- **Schema decisions to lock during implementation:**
  - Column convention for tuple parameters (`ell_comps`, etc.) — `name_y` / `name_x` vs `name_0` / `name_1`.
  - Whether the host halo (`NFWMCRLudlowSph` with `redshift_object` / `redshift_source`) goes in `main_lens_mass.csv` or its own `host_halo.csv` — likely the latter since its parameter set is fundamentally different from cluster-member dPIE mass.
  - Whether scaling-tier members keep the existing 3-column `y, x, luminosity` schema or migrate to the new `galaxy, profile_class, y, x, luminosity, ...` schema. Probably **keep existing** — the scaling tier is implicitly one profile class per member and naming each scaling galaxy is more overhead than signal.
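
As a rough illustration of the reader described in the Plan, the sketch below parses a sparse family CSV and dispatches `profile_class` through `getattr`. It uses only the standard library: the two dataclasses and the `mp` namespace are stand-ins for `al.mp.*` so the snippet runs without PyAutoLens, and `read_family_csv`, the `centre_0` / `centre_1` column names, and the sample rows are all hypothetical, not the final schema.

```python
import csv
import dataclasses
import io
from dataclasses import dataclass
from types import SimpleNamespace


# Stand-ins for al.mp.* classes (assumption: the real classes take other arguments).
@dataclass
class IsothermalSph:
    centre_0: float
    centre_1: float
    einstein_radius: float


@dataclass
class NFWSph:
    centre_0: float
    centre_1: float
    kappa_s: float
    scale_radius: float


# Hypothetical namespace playing the role of `al.mp`.
mp = SimpleNamespace(IsothermalSph=IsothermalSph, NFWSph=NFWSph)

# Sparse CSV: each row fills only the columns its profile class needs.
CSV_TEXT = """galaxy,profile_class,centre_0,centre_1,einstein_radius,kappa_s,scale_radius
bcg,IsothermalSph,0.0,0.0,1.6,,
halo,NFWSph,0.1,-0.2,,0.05,30.0
"""


def read_family_csv(text, namespace):
    """Return {galaxy_name: profile}, consuming only the columns each class needs."""
    profiles = {}
    for row in csv.DictReader(io.StringIO(text)):
        cls = getattr(namespace, row["profile_class"])
        kwargs = {}
        for field in dataclasses.fields(cls):
            cell = row.get(field.name) or ""
            if not cell:
                # A column the class requires is absent or empty: reject the row.
                raise ValueError(f"{row['galaxy']}: missing value for '{field.name}'")
            kwargs[field.name] = float(cell)
        profiles[row["galaxy"]] = cls(**kwargs)
    return profiles


profiles = read_family_csv(CSV_TEXT, mp)
print(profiles["bcg"])
print(profiles["halo"])
```

Empty cells are ignored unless the row's own class needs them, which is one way to get both the sparse-column tolerance and the missing-column rejection listed above.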

<details>
<summary>Detailed implementation plan</summary>

### Affected Repositories
- PyAutoGalaxy (library — helpers + tests)
- PyAutoLens (library: re-export of the new helpers under `al.*`, see Key Files)
- autolens_workspace (workspace — new guide + refactor of 3 cluster scripts)

### Work Classification
Both — library ships first (PyAutoGalaxy PR), workspace follows after the library lands.

### Branch Survey
| Repository | Current Branch | Dirty? |
|---|---|---|
| PyAutoGalaxy | main | clean |
| autolens_workspace | main | clean (dataset/* artifacts ignored) |

**Suggested branch:** `feature/cluster-csv-api`
**Worktree root:** `~/Code/PyAutoLabs-wt/cluster-csv-api/`

### Implementation Steps

1. **PyAutoGalaxy: library helpers.** In `autogalaxy/galaxy/galaxy_table.py` (or split into a new `galaxy_csv.py` sibling), add:
   - `galaxy_models_to_csv(galaxy_profile_pairs, file_path, profile_family)` — writes one CSV per family. Each row carries `galaxy, profile_class, <param columns…>, redshift?`.
   - `galaxy_models_from_csv(file_path)` — returns a typed `GalaxyModelTable` carrying the rows grouped by `galaxy` name with `profile_class` dispatched to the right `al.mp` / `al.lp` / `al.ps` class.
   - Helper for stitching multiple family CSVs into an `af.Collection` (or list of `af.Model(Galaxy)`) keyed by the shared `galaxy` column.
   - Sparse-column tolerance: rows in the same CSV can use different profile classes with non-overlapping parameter columns; the reader only consumes the columns the row's class needs.
2. **PyAutoGalaxy: tests.** Round-trip unit tests in `test_autogalaxy/galaxy/test_galaxy_table.py` (or `test_galaxy_csv.py`) covering: single-profile-class round-trip, multi-class sparse-column round-trip, missing-column rejection, name-join across families.
3. **autolens_workspace: `scripts/cluster/csv_api.py`** — new guide demonstrating the round-trip with prose for every section.
4. **autolens_workspace: simulator/modeling/start_here refactor** — load model CSVs via the new library helpers; simulator emits both the model CSVs and the derived `point_datasets.csv` into the dataset folder.
5. **Auto-sim guard.** `modeling.py` and `start_here.py` check for both `data.fits` and the model CSVs in the dataset folder, and trigger the simulator if any of them is missing.
6. **Workspace audit** — `grep -rn '\.csv' autolens_workspace/scripts/` to confirm we haven't missed another CSV consumer.
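
The auto-sim guard in step 5 could look roughly like the sketch below. It is a minimal stand-alone version that only reports which files are absent; the `REQUIRED_FILES` tuple and the `missing_dataset_files` helper are hypothetical, and the exact list of model CSVs depends on the schema decisions still open in this issue.

```python
import tempfile
from pathlib import Path

# Assumed file names; the real list follows whichever CSV schema is adopted.
REQUIRED_FILES = ("data.fits", "point_datasets.csv", "main_lens_mass.csv")


def missing_dataset_files(dataset_path, required=REQUIRED_FILES):
    """Return the required file names that are not present in dataset_path."""
    dataset_path = Path(dataset_path)
    return [name for name in required if not (dataset_path / name).is_file()]


# Demonstration on a temporary folder that contains only data.fits.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data.fits").touch()
    missing = missing_dataset_files(tmp)

print(missing)
# In modeling.py / start_here.py, a non-empty list would be the cue to run
# simulator.py before loading the dataset.
```

Checking the CSVs as well as `data.fits` means a dataset folder left over from before this change, which has the image but no model CSVs, still triggers a fresh simulation.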

### Key Files
- `PyAutoGalaxy/autogalaxy/galaxy/galaxy_table.py` (or new sibling) — library helpers.
- `PyAutoGalaxy/test_autogalaxy/galaxy/test_galaxy_table.py` — round-trip tests.
- `PyAutoLens/autolens/__init__.py` — re-export new helpers under `al.*`.
- `autolens_workspace/scripts/cluster/csv_api.py` — new guide.
- `autolens_workspace/scripts/cluster/simulator.py` — load truth from CSVs.
- `autolens_workspace/scripts/cluster/modeling.py` — compose model from CSVs.
- `autolens_workspace/scripts/cluster/start_here.py` — compose model from CSVs.

### Open Design Questions (to resolve during implementation)
- Tuple parameter column convention (`name_y` / `name_x` vs `name_0` / `name_1`).
- Whether `NFWMCRLudlowSph` host halo gets its own CSV.
- Whether scaling-tier CSV format migrates or stays.
- How to encode priors vs truth values in CSV cells when the simulator and modeling scripts share the same family CSV (the simulator wants concrete floats; modeling wants `UniformPrior` / fixed values — likely the CSV carries only concrete values, and modeling promotes selected columns to priors at compose time).
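
To make the tuple-column question concrete, the sketch below flattens tuple parameters into CSV columns under both candidate conventions and checks that the index form round-trips. `flatten_params` and `unflatten_params` are hypothetical helpers written for this illustration, and the parameter values are made up.

```python
def flatten_params(params, suffixes=("0", "1")):
    """Expand tuple-valued parameters into one column per component."""
    flat = {}
    for name, value in params.items():
        if isinstance(value, tuple):
            for suffix, component in zip(suffixes, value):
                flat[f"{name}_{suffix}"] = component
        else:
            flat[name] = value
    return flat


def unflatten_params(flat, tuple_names, suffixes=("0", "1")):
    """Inverse of flatten_params for the named tuple parameters."""
    params = dict(flat)
    for name in tuple_names:
        params[name] = tuple(params.pop(f"{name}_{s}") for s in suffixes)
    return params


params = {"centre": (0.1, -0.2), "ell_comps": (0.05, 0.0), "einstein_radius": 1.6}

indexed = flatten_params(params)                      # name_0 / name_1
named = flatten_params(params, suffixes=("y", "x"))   # name_y / name_x

print(sorted(indexed))
print(sorted(named))

restored = unflatten_params(indexed, tuple_names=["centre", "ell_comps"])
```

One consideration the comparison surfaces: `centre_y` / `centre_x` reads naturally, but the components of `ell_comps` are not spatial coordinates, so `ell_comps_y` / `ell_comps_x` would be a misleading label, whereas the index form applies uniformly to every tuple parameter.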

</details>

## Original Prompt

<details>
<summary>Click to expand starting prompt</summary>

For autolens_workspace/scripts/cluster/simulator.py, can we make it so that all parameters are in .csv form
and loaded from there, to establish that the base way to interact with the autolens API for clusters
is via csv.

I think the way to make this work is to write a guide, autolens_workspace/scripts/cluster/csv_api.py,
which illustrates how to set up lens models using the normal autolens API and then output them to csv,
and showing how all those features link together.

This csv file can then act as an "Auto Simulate" type situation for simulator.py, which loads the csv outputs of this file.
The simulator will put the csv files in the lens it simulates at the end, meaning other scripts only need the
auto simulate performed here.

Things I am still unclear on that this guide could help with are:

Should main galaxies, extra galaxies and scaling galaxies use their own csv files or can they all be combined into one?
I think it would be good if they could all be combined into one, but this would mean the .csv needs to know a lot more
than just parameters, but feasible mass profile class, lens name (e.g. when it's used to name light and mass profiles in the Galaxy), redshift and
others. I think I like the idea of a single .csv API being used for all cluster interfaces.

The flip side is this could get complicated because if the same galaxy has light and mass profiles then the notion of column
heads breaks down, so maybe the rule is "one csv file per light or mass profile", and the reuse of light profile names, mass profile
names and galaxy names is exploited when building the model? Most cluster models will apply the same thing over loads
of galaxies so I think that works, so we can just build it in an extensible way.

This would mean it also needs the galaxy names, even though in simulator.py galaxies are not named when used in a Tracer
these names would be used for performing model composition, again the csv_api.py script could explain and cover this.

I would then go so far as to make it so that this guide also explains point_datasets.csv, which functionally looks a lot
more complete to me and just needs an explanation. I would explain this before doing galaxy API, as convention is normally load
dataset before modeling. Note that point_datasets.csv itself is made by simulator.py, thus I think csv_api.py
can just make an example one which is not paired to the model in the guide, making it clear it's for illustrative purposes
but that simulator.py makes the actual one.

There are also no .csv's used at all for defining the point source model, which obviously need to be paired
with the point dataset.

Source-side composition (the `SersicCore` light + `Point` model loops in both simulator.py and the JAX registration mirror) should also be migrated to two CSV files (source_point_models.csv and source_light_models.csv), making the whole cluster experience a first-class csv experience.

I guess at this point users should easily not just load .csv's into models but also be able to print the csv
contents in python / Notebook cells and have clear print statements of the loaded objects showing how their
csv load parameters link to autolens objects, I guess the csv_api does that.

Finally, do a quick scan through autolens_workspace of other csv uses but I think at the moment it's just scaling galaxies
which are already implemented well.

Do deep research when coming to this csv API and once you're happy with it make csv interface the version used on all 3 cluster
scripts that exist. This is a huge issue -- the csv interface defines cluster modeling throughout, so don't be afraid
to ask hard questions about balancing the need for modeling large amounts of galaxies to making a user friendly API.
Feel free to ask if some of this work would benefit going into the source code more so than it already has.

</details>
