
Switch to OEDI Format#75

Open
rajeee wants to merge 132 commits into main from new_sampling_fixed

Conversation


@rajeee rajeee commented May 4, 2026

Summary

Permanently drops support for the legacy 3-table BSB schema (separate baseline / upgrades / timeseries tables) in favor of the OEDI 2-table format (annual_and_metadata + timeseries). This is the dominant theme of the PR and a hard breaking change — there is no fallback path for the old shape. All query-generation, snapshot fixtures, integrity checks, and tests now assume the OEDI shape.

Version bump: 0.3.0 → 0.4.0.

Schema pivot (the breaking change)

  • New 2-table model: md_table (annual + metadata, replaces baseline-and-upgrades) and ts_table (timeseries). The bs_table / up_table handles are now thin SA aliases over md_table, kept where two distinguishable handles are needed for self-joins.
  • Renames at the QueryCore level: bs_table/up_table → md_table, bs_key/up_key → md_key, bs_bldgid_column → md_bldgid_column, _bs_completed_status_col/_up_completed_status_col → _md_completed_status_col, _up_upgrade_col → _md_upgrade_col, _restrict_targets_bs → _restrict_targets_md, plus matching helpers.
  • Constructor signature changes: the legacy 3-tuple (db, baseline, upgrades) form is gone — table_name is now str | tuple[str, Optional[str]].
  • TS upgrade-pair queries rewritten to a single-scan MAX(CASE WHEN upgrade=…) + GROUP BY shape (replaces the previous baseline/upgrade self-join), with a CTE → subquery fallback for SQLAlchemy 2.0 literal-bind expansion.
  • upgrade=0 filter now threaded through baseline-side joins; baseline rows filtered out of upgrade-side report queries.
  • Multi-column / composite unique-key support for OEDI tables (e.g. ComStock tract-denormalized metadata): unique_keys schema, multi-column restrict keys, and full unique-key tuples returned from building-id helpers.

Performance work enabled by the pivot

  • TS aggregation unified: pre-bucket the inner GROUP BY, defer pure-bs enduses to the outer SELECT (constant per building, no need to materialize per-15-min and re-aggregate).
  • Pre-aggregate bs to building grain before TS join to eliminate tract fan-out on ComStock.
  • Fold join_list joins into the bs_per_bldg subquery; fall back to direct ts ⋈ bs JOIN when join_list is present.
  • Skip inner timestamp materialization for year-collapse TS queries.
  • Push user bs_restrict into the inner ts ⋈ bs JOIN condition so Athena predicate-pushes into the metadata scan (minutes vs. timeouts on ComStock).
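The single-scan upgrade-pair shape above can be sketched with stdlib sqlite3 — a minimal, hypothetical schema (table and column names invented for illustration), showing how MAX(CASE WHEN upgrade=…) + GROUP BY replaces a baseline/upgrade self-join:

```python
import sqlite3

# Illustrative only: one scan pivots baseline (upgrade=0) and upgrade rows
# onto the same output row per building, instead of joining the table to itself.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ts (bldg_id INT, upgrade INT, kwh REAL);
    INSERT INTO ts VALUES (1, 0, 10.0), (1, 2, 7.5),
                          (2, 0, 20.0), (2, 2, 16.0);
""")
rows = conn.execute("""
    SELECT bldg_id,
           MAX(CASE WHEN upgrade = 0 THEN kwh END) AS baseline_kwh,
           MAX(CASE WHEN upgrade = 2 THEN kwh END) AS upgrade_kwh
    FROM ts
    WHERE upgrade IN (0, 2)
    GROUP BY bldg_id
    ORDER BY bldg_id
""").fetchall()
print(rows)  # [(1, 10.0, 7.5), (2, 20.0, 16.0)]
```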

Test infrastructure rebuild

  • Replaced legacy test_BuildStockQuery.py / test_BuildStockSavings.py / test_query.py with snapshot-driven test_query_snapshots.py and a 2102-line test_invariants.py that cross-checks independent code paths (annual vs ts_year_collapse vs sum(ts_monthly), savings additivity, applied_in semantics, etc.).
  • Content-addressed disk cache replaces the old pickle cache; SQL is normalized at compile time; cost-history lookup + cost-regression gate guard against runaway Athena spend.
  • Snapshot fixtures regenerated for the OEDI shape across aggregates_and_savings, restrict_avoid, savings, timeseries, utility, and helper-method snapshots.
  • Example notebooks auto-generated from snapshot entries and executed via nbclient with safety gating.
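The content-addressed cache idea above can be sketched in a few lines (hypothetical helper names, not the PR's actual SqlCache API): normalize the SQL, hash it, and use the digest as the cache filename so formatting-only SQL changes still hit the cache.

```python
import hashlib
import tempfile
from pathlib import Path

def normalize_sql(sql: str) -> str:
    """Collapse whitespace and case so cosmetic SQL drift hashes identically."""
    return " ".join(sql.split()).lower()

def cache_path(cache_dir: Path, sql: str) -> Path:
    # Content-addressed: the filename is a digest of the normalized SQL,
    # so lookups are independent of run order or compile formatting.
    digest = hashlib.sha256(normalize_sql(sql).encode()).hexdigest()
    return cache_dir / f"{digest}.parquet"

cache_dir = Path(tempfile.mkdtemp())
p1 = cache_path(cache_dir, "SELECT   a, b\nFROM t")
p2 = cache_path(cache_dir, "select a, b from t")
print(p1.name == p2.name)  # whitespace/case-normalized SQL maps to the same entry
```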

Packaging + tooling (rides along)

  • Migrated from setup.py to PEP 621 pyproject.toml with hatchling as the build backend.
  • Dev deps moved to PEP 735 [dependency-groups] (install via uv sync --group dev); [project.optional-dependencies.full] retained for consumers.
  • Python floor raised to 3.12 (requires-python = ">=3.12").
  • uv.lock committed for reproducible dev/CI environments; CI switched to uv and to a 3.12/3.13/3.14 matrix.
  • Lint: kept flake8 (ruff was tried and reverted for now); --update-snapshot semantics tightened so snapshots can't paper over real numeric divergence.
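A minimal pyproject.toml sketch consistent with the bullets above (the dependency lists and extras here are placeholders, not the PR's actual file):

```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "buildstock-query"
version = "0.4.0"
requires-python = ">=3.12"

[project.optional-dependencies]
full = []  # placeholder: consumer-facing extras retained per the notes above

# PEP 735 dependency groups, installed via `uv sync --group dev`
[dependency-groups]
dev = ["pytest", "flake8"]  # placeholder dev dependencies
```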

Bug fixes uncovered along the way

  • aggregate_ts_by_eiaid was binding the eiaid column to md_table instead of bs_table after the collapse.
  • Several comma-join bugs in check_ts_bs_integrity, get_distinct_count, and select-from-md_table call sites (DUPLICATE_COLUMN_NAME on Athena).
  • applied_only filter on TS query paths was dropping rows it shouldn't have (validated by an xfail invariant flipping green).
  • get_upgrades_csv cross-join, get_upgrade_names malformed SQL, get_successful_simulation_count crash, viz_data monthly-results query path, and several composite-key handling regressions.
  • _restrict_targets_md now recognizes bare unprefixed column names.
  • TS pivot crash on calculated columns; get_quartiles correctly disallowed on TS queries.

Test plan

  • uv sync --group dev resolves cleanly on Python 3.12 / 3.13 / 3.14
  • pytest -vv passes (snapshot tests + invariants)
  • flake8 buildstock_query is clean
  • Smoke-test BuildStockQuery against a real OEDI ResStock + ComStock workspace (annual, savings, TS, multi-state, composite-key)
  • Run example notebooks end-to-end against the cached snapshot fixtures

Migration notes for consumers

  • Anyone passing table_name=(db, baseline, upgrades) must switch to a single OEDI annual_and_metadata table name (or (db, md, ts)).
  • Anyone reading bsq.bs_table / bsq.up_table / bsq.bs_key will continue to work via the alias shims, but the canonical attribute is now bsq.md_table / bsq.md_key.
  • The 3-table BSB workspace shape is not supported on 0.4.0 — pin to 0.3.x if you still need it.

rajeee and others added 30 commits April 22, 2026 21:01
New [unique_keys] TOML block with metadata and timeseries key lists
(timeseries must be a subset of metadata). Three new schemas ship with
it: comstock_oedi_state_and_county, resstock_oedi_new, resstock_oedi_vu.

bs_key, up_key, ts_key tuples (plus *_key_cols properties) expose the
configured unique-key identity of each table. Aggregate sample counts,
load-duration grouping, and timesteps verification now use these tuples
via _count_distinct() and multi-column group-by. Single-key schemas
still emit byte-identical SQL.

Also carries prior session fixes: CACHED sentinel skipped in batch
helpers, cost log precision bumped to 2 decimals, and _get_rows_per_building
groups on the configured timeseries keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
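A hedged sketch of what such a [unique_keys] block might look like (the exact key lists live in the shipped schema TOMLs; the columns below are taken from the ComStock example discussed later in this PR):

```toml
# Hypothetical schema fragment for illustration only.
[unique_keys]
metadata = ["bldg_id", "nhgis_tract_gisjoin", "state"]
# Must be a subset of the metadata key list:
timeseries = ["bldg_id", "state"]
```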
_get_applied_in_subquery now projects all metadata key columns, and
_add_applied_in_restrict emits a tuple-IN filter against the correct
side (ts_key when timeseries-backed, bs_key otherwise). Single-key
schemas still emit the original single-column IN form.

_get_restrict_clauses and _normalize_restrict_subquery extended to
accept a tuple of columns paired with a multi-width subquery, using
sa.tuple_(c1,...).in_(subquery) under the hood.

Also fix an outdented return in _get_buildings_by_change that broke
the module at import time under strict linting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
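The tuple-IN form that sa.tuple_(c1, ...).in_(subquery) renders can be exercised directly with SQLite row values — a toy sketch with invented table/column names:

```python
import sqlite3

# Illustrative multi-column restrict: keep metadata rows whose full
# (bldg_id, state) key appears in a "buildings that applied" subquery.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE md (bldg_id INT, state TEXT, kwh REAL);
    CREATE TABLE applied (bldg_id INT, state TEXT);
    INSERT INTO md VALUES (1, 'CO', 10.0), (2, 'CO', 20.0), (2, 'WY', 5.0);
    INSERT INTO applied VALUES (1, 'CO'), (2, 'WY');
""")
rows = conn.execute("""
    SELECT bldg_id, state FROM md
    WHERE (bldg_id, state) IN (SELECT bldg_id, state FROM applied)
    ORDER BY bldg_id, state
""").fetchall()
print(rows)  # [(1, 'CO'), (2, 'WY')]
```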
get_building_ids and get_buildings_by_locations now return DataFrames
projecting all bs_key columns instead of just bldg_id. _get_upgrade_buildings
and get_buildings_by_change return list[int] for single-key schemas and
list[tuple] for multi-key. _get_full_options_report's array_agg uses
row(...) packing when the upgrade key has more than one column.

Single-bldg_id default paths keep byte-identical SQL and return shapes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
get_results_csv, get_results_csv_full, get_upgrades_csv, and
get_upgrades_csv_full now return flat DataFrames. Parquet readbacks
that still carry a bldg_id index are reset so the returned shape is
consistent regardless of source.

Breaking change: callers that relied on df.loc[bldg_id] must now do
df[df.bldg_id == bldg_id] or set the index themselves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
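The caller-side migration can be sketched with pandas on made-up data — a boolean-mask filter replaces the old index lookup, or callers can set the index back themselves:

```python
import pandas as pd

# Hypothetical flat result frame, one row per (bldg_id, state) partition.
df = pd.DataFrame({"bldg_id": [1, 2, 2],
                   "state": ["CO", "CO", "WY"],
                   "kwh": [10.0, 20.0, 5.0]})

# Old style (0.3.x, bldg_id index): df.loc[2]
# New style (flat frame):
subset = df[df.bldg_id == 2]
print(len(subset))  # 2

# Or restore the old access pattern explicitly:
indexed = df.set_index("bldg_id")
print(indexed.loc[2].shape[0])  # 2
```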
Old check compared row counts between timeseries and baseline/upgrade
tables, which produced false positives whenever a bldg_id could
legitimately appear in baseline more than once (e.g. per state).

New flow:
  1. Distinct-bldg parity per upgrade between timeseries and baseline/upgrade
     tables, using bs_key and ts_key for distinct counting. A dedicated
     _get_metadata_distinct_bldg_count replaces the row-counting derivation
     from get_success_report.
  2. Duplicate detection per table: group by bs_key on baseline, up_key +
     upgrade on the upgrade table, and ts_key + timestamp (+ upgrade when
     present) on the timeseries table. First 5 offending rows are printed.
  3. Existing per-building row-count verification retained, now phrased in
     terms of partitions rather than buildings.

_get_ts_report now counts distinct ts_key tuples instead of distinct bldg_id.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Near-duplicate of _get_rows_per_building with no callers in the
package. Drop it rather than maintain two versions through the
multi-key refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- bs_key / up_key / ts_key reflect configured unique_keys and revert
  to (bldg_id,) when cleared.
- applied_in emits tuple-IN for multi-key schemas and scalar-IN for
  single-key.
- aggregate_timeseries with timestamp_grouping_func emits
  count(DISTINCT (c1, c2)) for multi-key and count(distinct(c1)) for
  single-key.
- get_building_ids projects all bs_key columns.

Also switch _count_distinct to sa.distinct(sa.tuple_(*cols)) so the
multi-column form renders as the standard SQL `count(DISTINCT (c1, c2))`
rather than `count(distinct(c1, c2))`, which Trino parses as a
single-arg function call on a row tuple.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
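Semantically, the count(DISTINCT (c1, c2)) form counts distinct key tuples rather than distinct values of any single column — a toy pure-Python equivalent on made-up rows:

```python
# count(DISTINCT (bldg_id, state)) counts distinct key tuples:
# the duplicate tuple below is not recounted, but each new combination is.
rows = [
    {"bldg_id": 1, "state": "CO"},
    {"bldg_id": 1, "state": "CO"},   # duplicate tuple
    {"bldg_id": 1, "state": "WY"},
    {"bldg_id": 2, "state": "CO"},
]
distinct_tuples = {(r["bldg_id"], r["state"]) for r in rows}
print(len(distinct_tuples))  # 3
```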
The branch earlier swapped flake8 for ruff with an aggressive ruleset
(E, W, F, I, UP, B) that flagged ~280 pre-existing UP/B violations
unrelated to unique_keys work. Revert to flake8 with the same config
used on main (.flake8, max-line-length = 120) so this PR stays focused
on multi-key unique-keys changes.

Also wrap two 137-char lines in _get_metadata_distinct_bldg_count so
flake8 passes cleanly.
UNLOAD writes query results as multiple parquet files, and row order
across chunks is not guaranteed when read back. test_savings_shape_with_timestamp_grouping
was intermittently failing because the time column started at different
points depending on which parquet chunk pandas read first.

Sort df1 and df2 by (geometry_building_type_recs, time) before comparing
so the assertion is order-independent.
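The fix amounts to sorting both frames on the compared key columns before asserting equality, so parquet chunk read order cannot flip the test — a minimal pandas sketch on toy data:

```python
import pandas as pd

# Toy stand-in for the two query results; df2 is df1 in reversed row order,
# mimicking parquet chunks being read back in a different order.
df1 = pd.DataFrame({"bt": ["SFD", "MH"], "time": ["t2", "t1"], "kwh": [1.0, 2.0]})
df2 = df1.iloc[::-1].reset_index(drop=True)

# Sort on the key columns, then compare: order-independent.
s1 = df1.sort_values(["bt", "time"]).reset_index(drop=True)
s2 = df2.sort_values(["bt", "time"]).reset_index(drop=True)
pd.testing.assert_frame_equal(s1, s2)
print("order-independent comparison passed")
```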
Set the pandas index back on the four CSV-returning functions to
preserve backward compatibility with callers that rely on
df.loc[bldg_id]. For single-key schemas the index is a single-level
Index on bldg_id (byte-identical to main). For multi-key schemas it
becomes a pandas MultiIndex on the configured bs_key / up_key tuple.

get_results_csv, get_results_csv_full, get_upgrades_csv, and
get_upgrades_csv_full all use `set_index(list(self.bs_key))` (or up_key).
Full variants reset any existing index from the parquet before re-
indexing so the result is consistent regardless of how the parquet
was written.
Fixes ValueError when querying timeseries data against schemas whose
timeseries unique key spans multiple columns (e.g. bldg_id + state).
_add_applied_in_restrict now threads a width-matched subquery to the
correct side, _split_restrict routes tuple column refs to the ts side,
and restrict/avoid accept (col_tuple, subquery | list-of-tuples).
bs_tbl in this branch is a ts-side subquery whose columns match the
timeseries unique key, not the metadata key. Looking up bs_key on it
raises KeyError for schemas whose metadata key has cols absent from ts
(e.g. in.nhgis_tract_gisjoin). Surfaced by a monthly TS query against
the comstock_oedi_state_and_county schema.
Clears the deprecated query surface ahead of the 2.0 release so the new
bsq.query(...) facade is the single public path for annual, timeseries,
and savings aggregations. Internal callers (utility_query, viz_data) are
migrated, savings_query is deleted outright, and the savings_shape-compat
branches in aggregate_query._query collapse to single canonical paths.

Also drops TSQuery.split_enduses (application code should batch) and
TSQuery.collapse_ts (use timestamp_grouping_func="year"); SavingsQuery
schema is removed.
The else-branch in init_monthly_results referenced run_obj._conn, which
does not exist on BuildStockQuery, and used run_obj.params instead of
run_obj.run_params for the S3 unload bucket. Both were latent bugs that
only fired on cache miss; the prior aggregate_timeseries SQL happened to
match cached entries, so the branch never ran in CI.

Replace the hand-rolled unload/cache dance with a direct run_obj.query()
call, which routes through execute() and handles caching, S3 result
lookup, and UNLOAD submission consistently with the rest of the library.

Note: test_Viz fixtures may need their reference pkl cache regenerated
against the new SQL text; that is a separate fixture-maintenance task.
When a timeseries query asks only for upgrade output (no savings, no
baseline) with applied_only=True, the ts_b/ts_u subquery pairing adds
unnecessary cost and, more importantly, returns zero rows for runs whose
timeseries table lacks upgrade=0 rows (upgrade-only runs). The column
expressions for applied_only reference only up_tbl, so we can aggregate
ts_u directly with the same shape the upgrade_id="0" branch already uses.

Also refresh the viz test reference caches with entries keyed off the new
SQL text.
Relocate the pre-snapshot test modules (test_BuildStockQuery.py,
test_BuildStockUtility.py, test_query.py, test_Viz.py, and their helpers)
into tests/legacy/ so pytest stops collecting them. The new snapshot-driven
framework replaces them and the old files are kept for reference.
Introduce a test framework that pins BuildStockQuery.query() generated SQL
and reference result data per curated argument set, plus cross-query
invariant checks that verify internal consistency without hitting Athena.

Framework (tests/test_utility.py, tests/conftest.py,
tests/test_query_snapshots.py):
  - Each snapshot entry in tests/query_snapshots/{schema}/*.json carries
    one or more equivalent arg-variants, a .sql file (exact generated
    SQL), and a .parquet file (reference DataFrame).
  - Normalized-whitespace SQL comparison with sqlglot fallback; variant
    divergence surfaces as per-variant failure blocks.
  - Pytest options --check-data, --update-sql, --update-sql-and-data
    control data comparison and (re)generation of reference files.
  - Session-scoped fixtures for comstock_oedi_state_and_county and
    resstock_oedi_vu schemas against buildstock_sdr on rescore.

Invariants (tests/test_invariants.py):
  - annual == ts-year-collapse == sum(ts-monthly) for enduse totals.
  - sample_count / units_count consistency across query flavors (resstock
    only; comstock counts intentionally skipped pending a library fix).
  - baseline - upgrade ~= savings within a single include_savings query.

Seeded with ~30 entries across annual/timeseries/savings/applied_only/
restrict_avoid/mapped_column for both schemas, CO-restricted and limited
to electricity/natural-gas totals to keep Athena usage small.

See tests/query_snapshots/README.md for the workflow.
_add_restrict already accepts annual_only to route restrict columns
through the baseline vs timeseries table, but the caller in
BuildStockAggregate._query was never passing it, while the adjacent
_add_avoid call was. Bring restrict into line with avoid so annual-only
queries consistently resolve restrict columns against the baseline
table.
On schemas where one TS row maps to multiple metadata rows (e.g.
comstock_oedi_state_and_county, which has one metadata row per
(bldg_id, nhgis_tract_gisjoin, state) but one TS row per (bldg_id,
state)), the TS-grouping branch of BuildStockAggregate._query was
returning sample_count values that undercounted by the per-TS-key
metadata multiplicity — empirically 1.4x-100x across building types.

Count distinct over the full metadata key on the baseline table
(already present in tbljoin) instead of over the TS key on the TS
subquery. For schemas where bs_key == ts_key (resstock_oedi_vu etc.)
the resulting count value is identical; SQL differs because columns
are now referenced via bs_table.

Comstock TS-grouping snapshot data needs regeneration after this
change:

    pytest -s -v tests/test_query_snapshots.py --update-sql-and-data

Re-enable the sample_count/units_count check in the comstock invariant
test once snapshots are refreshed.
The library bug that made these undercount on comstock TS-grouping queries
was fixed in 25fa3fa. Restore the active count-column comparisons to guard
against regression. Now 4/4 invariants pass including the comstock counts.
Only _get_gcol (the group_by path) used to fall back to the prefixed
column name when a bare name didn't resolve. Restrict, avoid, and the
TS side of _split_restrict went straight to _get_column and raised
ValueError on bare names like "state" / "vintage" whenever the
underlying table only exposed the prefixed form ("in.state" etc).

Move the fallback into QueryCore._get_column itself so every caller
that resolves a user-provided string column gets the same behavior.
Simplify _get_gcol to drop its now-redundant except block.

Also trim trailing-newline inconsistency in
comstock_oedi_state_and_county.toml.
Now that _get_column auto-prepends the characteristics prefix, the
snapshot JSONs no longer need per-schema hand-prefixing for restrict
and avoid. Simplify to a uniform convention:

  - group_by: write "in.<col>" explicitly (readable, consistent across
    schemas).
  - restrict / avoid: write bare "state" / "vintage" — the library's
    auto-prefix fallback resolves to "in.state" / "in.vintage" when
    that's what exists on the table.

Comstock group_by entries gain their "in." prefix; resstock
restrict/avoid entries drop theirs. Stored SQL is unchanged because
the resolved SQLAlchemy column objects are the same.
Now that QueryCore._get_column auto-prepends the characteristics prefix
for both group_by and restrict/avoid, snapshot JSON entries can rely on
bare column names everywhere. Stored .sql files are unchanged — the
fallback resolves bare "state" / "geometry_building_type_recs" /
"comstock_building_type" / "vintage" to the same SQLAlchemy column
objects that were previously written out explicitly.
Collapse four near-duplicate test functions into two parametrized ones:

  test_annual_equals_ts_year_equals_ts_monthly_sum[resstock_...]
  test_annual_equals_ts_year_equals_ts_monthly_sum[comstock_...]
  test_savings_decomposition[resstock_...]
  test_savings_decomposition[comstock_...]

Each case provides the fixture name, schema directory, per-leg enduse
lists (comstock baseline uses "..kwh" suffix, TS does not), and the
group-by column. The bsq fixture is resolved at test time via
request.getfixturevalue so the case table stays declarative.

Net: -48 lines, same 4 cases, no behavior change (15/15 pass).
Snapshot layout
- Collapse per-schema JSON files into one shared file per flavor at
  tests/query_snapshots/<flavor>.json. Per-schema differences in column
  names (electricity, gas, building_type, vintage, mapping_dict, etc.)
  go through $PLACEHOLDER substitution at load time, with leg-aware
  resolution for comstock's `..kwh` annual suffix.
- Move artifacts to flat <schema>_sql/ and <schema>_data/ directories;
  drop the per-schema _CO filename suffix.

Snapshot harness contract
- Rename --update-sql/--update-sql-and-data to --update-snapshot/--overwrite-snapshot.
  --update-snapshot writes only refactor-safe diffs: cosmetic SQL drift,
  or real drift where the data still matches/missing/equivalent. Real
  data divergence requires --overwrite-snapshot.
- Add `data_status='equivalent'` for shape-shifted-but-value-matching
  data (e.g. column added/removed without changing shared values).
- Tolerate approx_percentile noise on array-valued columns (quartiles)
  via per-row scale-aware tolerance.
- Add `_method` dispatch so snapshot entries can target methods other
  than `query()` (e.g. `get_building_ids`).
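The approx_percentile tolerance can be sketched as a per-element, scale-aware closeness check (the 2% threshold below is invented for illustration, not the harness's actual value):

```python
import math

def arrays_close(a, b, rel=0.02, abs_tol=1e-9):
    """Compare two quartile arrays element-wise with a tolerance that
    scales with each value's magnitude (illustrative thresholds)."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel, abs_tol=abs_tol) for x, y in zip(a, b)
    )

# Small relative jitter on large values passes; a real divergence fails.
print(arrays_close([10.0, 1000.0], [10.1, 1001.0]))  # True (within 2%)
print(arrays_close([10.0, 1000.0], [10.0, 1200.0]))  # False
```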

SQL generation cleanup
- Strip CAST(col AS VARCHAR) from generated SQL. Add `typed_literal`
  helper that coerces comparison values to match column SQL type at
  the call site (int for upgrade columns, bool for applicability).
  Trino can now use parquet stripe statistics for predicate pushdown.
- Push user bs_restrict (e.g. state='CO') into the inner ts ⋈ bs JOIN
  ON condition so comstock's tract-denormalized metadata is filtered
  before the timeseries join.
- _split_restrict applies columns living on both bs and ts (state,
  upgrade) to both sides, not just ts.
- Raise UnsupportedQueryShape from __get_timeseries_bs_up_table's
  upgrade-pair branch on comstock; harness treats this as skip.

API additions
- get_building_ids(applied_in=...) returns building keys filtered to
  those that applied to all listed upgrades. Uses tuple-IN for
  comstock's multi-column unique key.

Invariant tests
- 8 new equality invariants exercising existing snapshot data plus
  one auxiliary building_ids snapshot trio: group_by sum = overall,
  CO subset of CO+WY, avoid + avoided = full, MappedColumn aggregates,
  15-min raw sums to monthly, savings_only column equality, two-fuel
  electricity = single-fuel, applied_in intersection.
- xfail mark for the known applied_only divergence on the timeseries
  upgrade-pair flow (TS table includes inapplicable buildings; annual
  flow filters them via up.applicability=true, TS flow doesn't).

Test count: 39 passed, 1 skipped, 2 xfailed.
rajeee added 29 commits April 27, 2026 20:28
Previously the notebook used Path.cwd() with a 3-way fallback chain to
find the snapshot test cache directory. That worked from the repo root,
tests/, or tests/example_notebooks/, but could fail if Jupyter was
launched from a deeper or unrelated directory.

Use IPython's `_dh[0]` (the notebook's own directory at kernel startup)
to resolve the cache as a relative sibling: `../query_snapshots/<schema>_cache`.
Falls back to "." if `_dh` is missing (e.g. when the cell is run outside
Jupyter), preserving backward compatibility with non-Jupyter execution.

Verified end-to-end via `jupyter nbconvert --execute` on
annual_resstock_oedi.ipynb — 13 cells executed, all queries returned
cached results with $0.00 cost.
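The resolution logic above can be sketched as a small helper (names are illustrative; in a real notebook `_dh` is the list IPython injects into the namespace):

```python
from pathlib import Path

def resolve_cache_dir(schema: str, dh=None) -> Path:
    """Sketch: prefer the notebook's own startup directory (IPython's
    _dh[0]) and fall back to the current directory outside Jupyter."""
    base = Path(dh[0]) if dh else Path(".")
    return base / ".." / "query_snapshots" / f"{schema}_cache"

# Inside Jupyter, _dh[0] is the notebook directory at kernel startup:
print(resolve_cache_dir("resstock_oedi", dh=["/repo/tests/example_notebooks"]))
# Outside Jupyter, _dh is absent and the fallback keeps old behavior:
print(resolve_cache_dir("resstock_oedi"))
```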
Generated notebooks now run through nbclient after writing so cells
contain real DataFrame outputs (rich text/html tables + text/plain)
instead of a pre-embedded preview. The notebook IS the runnable artifact
AND the source of truth for current behavior.

Two safety mechanisms keep regen cheap and bounded:

  1. `_NO_EXECUTE_METHODS` allowlist — `report.get_success_report`,
     `report.check_ts_bs_integrity`, and `agg.get_building_average_kws_at`
     are commented out in source. Each fans out internally to many
     Athena queries the snapshot cache doesn't cover (count-distinct
     per upgrade, per-pair change tracking, the 3.2 TB landmine
     documented in CLAUDE.md). Determined by reading method source —
     only true singleton-execute methods run live.

  2. `_has_unroundtrippable_placeholder` — auto-comments any cell whose
     args contain SA object reprs (`<sqlalchemy....Label at 0x...>`,
     `MappedColumn(bsq=<...object at 0x...>)`, or our own `<Class ...>`
     fallback). Those can't run as written; the user must construct
     the SA object inline.

`allow_errors=True` on the nbclient executor: a single bad cell embeds
its error in the notebook instead of aborting execution. This catches
pre-existing interactive-mode bugs in `bsq.get_results_csv` etc. that
the snapshot tests miss (they run with get_query_only=True, never
exercising the post-query DataFrame access).

End state: 121 cells executed cleanly, 8 commented (unsafe), 10
commented (placeholder), 6 cells with embedded errors documenting
real bugs to fix separately.
Removes unused imports (`os` in main.py and helpers.py, `sqltypes` in
main.py) and a stale `ucol` local in aggregate_query.py, plus a missing
blank line before a nested def and a 124-char comment line. CI Lint
step was failing on these six pyflakes/pycodestyle violations.
The bs_per_bldg subquery (introduced to collapse ComStock's tract fan-out
before joining ts_aggr) was wrapping bs-side group_by columns in
arbitrary() and grouping only on (bldg_id, state). For ComStock — whose
metadata is partitioned at (bldg_id, tract, state) granularity with
weight divided across tract rows — a building straddling counties had
its full weight attributed to whichever county arbitrary() picked,
silently dropping the tract-fractional disaggregation.

Same issue silently affected ResStock's utility_ts_by_eiaid: eiaid_weights
fans out per-county into multiple utilities, so arbitrary(eiaid) collapsed
the weighted apportionment.

Fix: project bs-side group_by columns as bare references and add their
underlying expressions to bs_per_bldg's GROUP BY. Pure-bs enduses
(sqft, vintage) and extra_bs_cols stay under arbitrary() — they're
true per-bldg constants.

Adds three ComStock-only timeseries snapshots (year-collapse: by_county,
by_tract, overall_co) plus an invariant cross-checking sum(by-county)
== sum(by-tract) == overall. Replaces utility_ts_by_eiaid's eiaid=4110
(no CO data → empty) with eiaid=15466 (Xcel CO, ~9.3k samples) so the
snapshot exercises the disaggregation path with real data.
SqlCache now appends each get/put hash to .cache_usage_log inside the
cache folder so we can tell which entries a test session actually
exercised. tests/cleanup_stale_caches.py removes cache files whose
hashes are referenced in neither the snapshot JSONs nor the usage log;
tests/refresh_caches.py wraps the clear → run-tests → cleanup flow.

Also includes the stale entries that the first cleanup run identified
and dropped, freeing the resstock_oedi_cache + comstock_oedi_cache
directories of orphans left over from previous SQL refactors.
Timestamps and a few result-data fields refreshed from the most recent
test session that exercised every notebook against the pruned cache.
Extends the test suite to cover the county-aggregated ComStock metadata
table (`_md_agg_by_state_and_county_parquet`) where buildings within a
county are pre-collapsed into one weight-correct row. The agg schema is
wired into every schema-aware list/dispatcher (fixtures, snapshot and
invariant parametrize lists, placeholder resolver, cleanup script,
notebook builder, backfill costs) and the snapshot harness's
`_patch_hash` now inserts a missing schema key in addition to overwriting
an existing one. `--update-snapshot` populated the new
`comstock_oedi_agg_cache/` with 76 sidecar triples covering 66 JSON
entries.
Walks every snapshot JSON entry across six weight-aggregating flavors
(annual, timeseries, savings, applied_only, restrict_avoid,
invariants_three_way), runs the query on both ComStock schemas, and
asserts equal DataFrames after dropping `sample_count` (the only column
that differs because the agg table has fewer rows). Filters out
nondeterministic entries and aggregations that don't survive county
pre-collapse (mean/max/min/quartiles/arbitrary).

A separate test handles `get_building_ids`: the two schemas return
different partition columns (tract vs county), so the comparison
projects to (bldg_id, state) and dedupes — the resulting building set
is identical across schemas. Both tests run from the snapshot caches in
under 6 seconds with no Athena spend.
Adds the three-way invariant on the applied_in=[1,2] code path that
combines flows the existing tests covered separately but never together:
annual ≡ ts-year-collapse ≡ sum(ts-monthly), now exercised on both the
categorical comstock_building_type axis and the county partition-key
axis whose arbitrary()-collapse silently broke before commit 182ff21.

Three new snapshot entries (annual / ts_year / ts_monthly grouped by
county, applied_only=true, applied_in=[1,2], CO) populate the cache for
both ComStock schemas, gated to comstock_oedi/comstock_oedi_agg via the
JSON `schemas` allowlist since resstock has no county column.
`Query.applied_in` only expressed the all-of-listed-upgrades intersection
semantic. Real callers want any/all/not combinations that don't compose
cleanly as more flags. Drop the field and add two helpers:

- `bsq.get_applied_buildings(any_of, all_of, get_query_only)` — returns
  buildings (DataFrame or SQL) matching the predicate. Snapshot-testable.
- `bsq.get_applied_buildings_filter(any_of, all_of)` — returns a
  `(cols, subquery)` tuple to drop into `restrict=[...]` or `avoid=[...]`.
  Empty lists return None so callers can pass it through unconditionally.

Both route through a shared private `_build_applied_subquery` that emits:
  - all_of only: `WHERE upgrade IN ... GROUP BY keys HAVING count = N`
    (byte-stable with the prior applied_in shape)
  - any_of only: `WHERE upgrade IN ... GROUP BY keys` (no HAVING)
  - both: AND'd CASE-filtered counts
0 (baseline) is rejected in either list — applicability is meaningful
only for upgrades.
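The all_of vs any_of shapes can be demonstrated with stdlib sqlite3 on a hypothetical one-row-per-(bldg_id, upgrade) applicability table (using COUNT(DISTINCT upgrade) as a safe stand-in for the HAVING-count form described above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE up (bldg_id INT, upgrade INT);
    INSERT INTO up VALUES (1, 1), (1, 2), (2, 1), (3, 2);
""")

# all_of=[1, 2]: HAVING count equal to the number of listed upgrades.
all_of = [r[0] for r in conn.execute("""
    SELECT bldg_id FROM up WHERE upgrade IN (1, 2)
    GROUP BY bldg_id HAVING COUNT(DISTINCT upgrade) = 2
""")]

# any_of=[1, 2]: same GROUP BY, no HAVING clause.
any_of = [r[0] for r in conn.execute("""
    SELECT bldg_id FROM up WHERE upgrade IN (1, 2)
    GROUP BY bldg_id ORDER BY bldg_id
""")]
print(all_of, any_of)  # [1] [1, 2, 3]
```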

`Query.applied_only` stays untouched. It preserves two annual-path
optimizations the WHERE-side helper can't: the bs⋈up INNER vs LEFT OUTER
JOIN switch, and the bare `up.col` projection vs the
`CASE WHEN applied THEN up.col ELSE bs.col END` form.

Internal callers that previously synthesized `applied_in=[upgrade_id]` on
the TS path (the load-bearing inapplicable-buildings filter at
aggregate_query.py:622-643) now build the same subquery directly via
`_build_applied_subquery(all_of=[upgrade_id], key_kind="timeseries")`.
Behavior is unchanged.

`_split_restrict` learned to handle md-side composite-key tuple restricts
— previously only the TS side did, since the only tuple-keyed restricts
came from the (TS-side) applied_in subquery.

Snapshot JSON entries that used `applied_in` are migrated to a new
`{"_applied_filter": {"all_of": [...]}}` sentinel that the dispatcher in
`tests/test_utility.py:_dispatch_method` resolves into a live
`get_applied_buildings_filter` call before invocation. Most migrated
entries produce byte-stable SQL when the filter is appended after the
existing restrict items (matching the old append order); the four TS
entries with explicit applied_in=[1,2] now produce a redundant double
applied filter (auto-synthesized for upgrade_id + user-supplied for [1,2])
which Trino merges at planning. Those four hashes are blanked with a note
— data check confirms equivalence; user runs --update-snapshot to backfill.

Also picks up the user's separate notebook-builder plan in this commit
since the changes intermixed in shared files: nbclient strict mode
(`allow_errors=True` removed, failure surfaces to pytest), shared
`expand_rate_map_flat` helper, and the `_was_wrapped` cache-poisoning
guard for list/dict/Series-returning methods.
Three real cell failures in tests/example_notebooks/utility_resstock_oedi.ipynb
were silently embedded in cell outputs because nbclient ran with
allow_errors=True (the strict-mode flip and the test_utility/test_query_snapshots
plumbing for that — wrap-marker, cache-poisoning guards, expand_rate_map_flat
helper — landed in commit a44bed7). This change finishes the loop:

- buildstock_query/utility_query.py: get_locations_by_eiaids now reads the
  result via res.iloc[:, 0] instead of res[map_eiaid_column]. Athena UNLOAD
  names projections positionally when the SELECT-DISTINCT column has no
  alias, so the parquet column is "value" rather than "gisjoin", and the
  named lookup raised KeyError on every live invocation.

- tests/diagnose_wrapped_cache.py: read-only diagnostic that walks every
  <hash>.parquet/.sql pair and reports any whose column count contradicts
  the SELECT-list count in the .sql sidecar. Surfaces the wrap-then-writeback
  cache poisoning whenever it crops up; runs in seconds.

- Deleted comstock_oedi_agg_cache/629ce88f...{parquet,sql,json}: orphaned
  by the applied_filter SQL refactor; the new SQL hash already has its own
  triple on disk.

- Re-executed utility_resstock_oedi.ipynb and report_comstock_oedi_agg.ipynb
  under the now-strict nbclient. All cells succeed.
…nts.json

The invariants suite was issuing many queries that no snapshot JSON
owned. Without ownership, those cache parquets had no SQL drift check
and couldn't be repopulated via --update-snapshot — and refresh_caches
was hard to reason about because hashes touched by invariants weren't
declared anywhere.

Solution: each invariant call site adds a sibling record_query(bsq,
args) line. The recorder writes a JSONL log per session;
normalize_invariant_snapshot.py merges into from_invariants.json with
blank sql_hash for new entries (user fills via --update-snapshot)
and prunes entries no longer recorded.
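The recorder and merge steps can be sketched roughly like this; the function names and entry encoding are illustrative, not the actual record_query / normalize_invariant_snapshot.py code:

```python
import json

def encode_record(schema: str, args: dict) -> str:
    # One JSONL line per recorded call; the args dict is serialized
    # as-is (no placeholder reverse-encoding), so entries are keyed
    # by (schema, args).
    return json.dumps({"schema": schema, "args": args}, sort_keys=True)

def merge_records(recorded_lines, snapshot: dict) -> dict:
    # Keep only entries recorded this session: new entries get a
    # blank sql_hash (to be filled via --update-snapshot), known
    # entries keep theirs, and stale snapshot entries are pruned.
    return {line: snapshot.get(line, {"sql_hash": ""}) for line in recorded_lines}
```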

The args dict you pass IS what gets serialized — no placeholder
reverse-encoding. One entry per (schema, args) — each schema gets
its own row because resolved column names differ.

Markers for non-JSON-encodable shapes:
- {"_applied_filter": {"all_of"|"any_of": [...]}} for the live SA
  Subquery returned by get_applied_buildings_filter().
- {"_calc_column": {"name", "expr", "table"}} for SA Label produced
  by get_calculated_column().
- {"_mapped_column": {"name", "key_column", "mapping_dict"}} for
  MappedColumn(...).

The new _resolve_calc_and_mapped_columns helper rebuilds the live SA
objects when the snapshot harness loads from_invariants.json.

Other changes:
- tests/example_notebook_builder.py: render the three markers as
  inline construction calls so generated notebooks are runnable
  Python; import MappedColumn at the top of every notebook.
- tests/test_utility.py: drop --update-snapshot/--overwrite-snapshot
  as triggers for notebook regeneration. Trigger is now strictly
  "missing notebook" or "data writethrough fired" — avoids spurious
  regen + working-tree churn on no-op --update-snapshot runs.
- tests/refresh_caches.py: insert a normalize step (with --prune
  when invariants ran fully) between the test runs and cleanup.
- tests/query_snapshots/building_ids.json: three new multi-state
  entries owning the cache hashes that test_applied_buildings_intersection
  previously hit without coverage.
- tests/find_uncovered_cache.py: read-only diagnostic that lists
  cache hashes touched by tests (.cache_usage_log) but not owned
  by any JSON sql_hash. Useful for future audits.
- tests/query_snapshots/from_invariants.json: 181 auto-recorded
  entries from the wrapped invariant call sites.
Result of running tests/refresh_caches.py end-to-end:
- cleanup_stale_caches.py --delete removed cache parquet/sql/json triples
  no longer referenced by any JSON snapshot or recorded by the test suite.
- Example notebooks under tests/example_notebooks/ regenerated to reflect
  the current cache state.
- example_usage/ notebooks re-executed against the refreshed cache.
…g_ids

The TS query path replaced bs_table in the outer FROM with bs_per_bldg
(per the recent pre-aggregation refactor), but `_add_avoid` still resolved
bs columns to self.bs_table — producing a Cartesian comma-join against
the outer FROM and silently dropping the NOT-IN predicate. Same shape
hits ts-side avoid clauses. Fix by splitting `params.avoid` like
`_split_restrict` does for restrict: bs-side avoids fold into
bs_per_bldg's WHERE, ts-side fold into ts_flat's WHERE, and only
join_list-side residue stays at the outer level.
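The split can be sketched as a three-way partition by column ownership, a simplified stand-in for the real `_split_restrict`-style logic, which resolves columns against live SA tables:

```python
def split_avoid(avoid, bs_columns, ts_columns):
    # Partition avoid entries by owning table: bs-side predicates fold
    # into bs_per_bldg's WHERE, ts-side into ts_flat's WHERE, and only
    # the residue stays at the outer query level. Entries are modeled
    # as (column_name, values) pairs for illustration.
    bs_side, ts_side, outer = [], [], []
    for col, values in avoid:
        if col in bs_columns:
            bs_side.append((col, values))
        elif col in ts_columns:
            ts_side.append((col, values))
        else:
            outer.append((col, values))
    return bs_side, ts_side, outer
```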

Also add `avoid` to `get_building_ids` (mirrors the existing `restrict`
arg, additive default). Lets callers select the complement of an
applied-buildings filter without going through `bsq.query`.
…metic

Three new tests in tests/test_invariants.py validate the
get_applied_buildings_filter(any_of=, all_of=) API end-to-end:

- test_applied_buildings_union: any_of=[a,b] building set equals union
  of all_of=[a] ∪ all_of=[b]; meaningfulness gate asserts union ≠
  intersection (otherwise the identity is trivially true and the test
  is vacuous).
- test_applied_buildings_avoid_complement: avoid=[applied_filter]
  returns universe \ applied_set; parametrized over filter_kind ∈
  {all_of, any_of} so both filter shapes go through the avoid path.
- test_applied_buildings_set_identities_via_query: aggregated route
  via bsq.query() over a curated 8-bldg universe spanning four regions
  (only_a, only_b, both, neither). Runs 9 filters × 5 group_by = 45
  queries per (schema × annual_only) cell, asserting (a) sum(units_count)
  is invariant to group_by choice under any one filter, and (b) 14 set-
  arithmetic identities including inclusion-exclusion, region
  decompositions, and "restrict + matching avoid == universe" pairings.
  This last test caught the TS-path avoid bug fixed in 22ef15b.
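The identities under test are plain set arithmetic; over a toy four-region universe (not real bldg_ids) they look like:

```python
# Toy universe mirroring the four-region curation: buildings where
# upgrade a applied, b applied, both, or neither.
only_a, only_b, both, neither = {1, 2}, {3, 4}, {5, 6}, {7, 8}
universe = only_a | only_b | both | neither
applied_a = only_a | both
applied_b = only_b | both

# any_of=[a, b] equals the union of the singleton all_of filters
assert applied_a | applied_b == only_a | only_b | both
# inclusion-exclusion
assert len(applied_a | applied_b) == len(applied_a) + len(applied_b) - len(applied_a & applied_b)
# restrict plus the matching avoid partitions the universe
assert applied_a | (universe - applied_a) == universe
# meaningfulness gate: union must differ from intersection,
# otherwise the first identity is trivially true
assert applied_a | applied_b != applied_a & applied_b
```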

Helpers:
- _pick_meaningful_upgrade_pair: auto-discovers (a, b) such that all
  four region splits are non-empty, cached on bsq instance. Avoids
  brittle hardcoded upgrade ids that may apply identically per schema.
- _curate_applied_universe: picks 8 bldg_ids spanning all four regions
  for bounded TS query cost. Fails loudly if any region is empty.
Eight new tests in tests/test_invariants.py systematically attack code
paths that an adversary would target. All bound to state=CO and either
a curated 8-bldg universe or first-N applied bldg_ids to keep Athena
cost in cents.

- test_applied_filter_subquery_equals_id_list:
  IN-subquery encoding equals materialized IN-list encoding for the
  same effective set, plus idempotency (`restrict=[f, f] == [f]`) and
  cardinality identity (`n_applied + n_avoided == n_universe`).
  Composite-key schemas materialize ALL md-key tuples for the chosen
  bldg_ids so no building is partially sliced (caught a real test-design
  bug where first-N tuples missed county slices on comstock_agg).

- test_applied_filter_multi_state_composition:
  Per-state composite-key tuples are disjoint across states (no
  state leakage), and per-state units_count totals sum to multi-state
  total under the same applied filter.

- test_applied_filter_empty_set_degeneracy:
  When applied set is provably empty (all_of=[1..12]), restrict yields
  0 rows and avoid yields all rows. Pins SQL `IN ()` semantics.

- test_applied_filter_order_and_de_morgan:
  restrict=[A, B] == restrict=[B, A] (order independence). Avoid de
  Morgan: `avoid=[any_of=[a,b]]` == `avoid=[all_of=[a], all_of=[b]]`.

- test_applied_only_equals_explicit_all_of:
  `applied_only=True` for upgrade u equals
  `applied_only=False, restrict=[applied_filter(all_of=[u])]`. Pins
  the equivalence between internal injection (aggregate_query.py:645)
  and the public API. Plus: |savings|≈0 on inapplicable buildings
  (`avoid=[applied_filter]` ⇒ baseline==upgrade).

- test_applied_filter_ternary_union_and_singleton:
  `any_of=[a,b,c]` == ⋃ singletons (catches HAVING-count bugs at N>2).
  `all_of=[a]` == `any_of=[a]` (singleton edge case).

- test_applied_filter_avoid_id_list_equals_avoid_subquery:
  NOT-IN-subquery == NOT-IN-list within curated universe. Catches
  NOT-IN semantic drift.

- test_applied_filter_calc_column_composition:
  calc(elec - gas) under applied filter equals manual elec - gas
  decomposition. Pins calc-column rebinding through the ClauseAdapter
  path under restrict=[applied_filter].

All 24 cases (8 tests × 3 schemas) PASS after fixing two test-design
issues caught during validation: (1) IN-list length exceeded Athena's
262144-char limit for unbounded materialized lists, fixed by curated
universe + per-bldg full-tuple materialization; (2) sorted-first-N
sliced composite-key tuples mid-building, fixed by selecting unique
bldg_ids first then all their tuples.
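The fix for the mid-building slicing is to choose buildings first and only then gather their key tuples; a schematic version (not the actual test helper):

```python
def curate_full_tuples(md_rows, n_buildings):
    # md_rows: (bldg_id, county) composite-key tuples, possibly many
    # per building on tract-denormalized metadata. sorted(md_rows)[:n]
    # can slice a building mid-way; instead pick the first n distinct
    # bldg_ids, then keep ALL of their tuples.
    chosen = set(sorted({bldg_id for bldg_id, _ in md_rows})[:n_buildings])
    return [row for row in md_rows if row[0] in chosen]
```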
Three new tests in tests/test_invariants.py target the user's worry
about miscounting buildings across grouping × baseline-join × TS
code paths.

- test_count_partition_under_group_by: per-group sum(units_count)
  and sum(sample_count) over four group_by axes (bldg_id, county,
  bldg_type, county+bldg_type) must equal the ungrouped totals.
  sample_count compared at integer precision (no rtol slack to hide drift).
  Catches tract-fan-out, GROUP BY NULL drops, arbitrary() collapse
  bugs that don't show up in single-axis tests.

- test_count_partition_under_applied_filter: same partition invariant
  composed with an applied filter, bounded to the curated 8-bldg
  universe. Catches grouping bugs that only surface under the
  applied-subquery composition.

- test_count_integrity_annual_vs_ts_year_collapse: integer-exact
  sample_count and float-close units_count agreement between annual
  baseline (`upgrade_id=0`) and TS year-collapse baseline
  (`annual_only=False, timestamp_grouping_func='year'`), across
  multiple group_by axes and with/without applied filter. Pins the
  bs↔ts join wiring through bs_per_bldg pre-aggregation: any drop
  or duplication on the upgrade=0 side flips integer sample_count
  immediately.
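The partition invariant itself is simple to state: summing per-group values over any group_by axis must reproduce the ungrouped total, integer-exact for counts. A stdlib sketch of the check:

```python
from collections import defaultdict

def grouped_total(rows, group_by, count_col):
    # rows: result rows as dicts. Sum the count column within each
    # group, then sum across groups; any tract fan-out, dropped NULL
    # group, or arbitrary() collapse shows up as a mismatch against
    # the ungrouped total.
    groups = defaultdict(int)
    for row in rows:
        groups[tuple(row[col] for col in group_by)] += row[count_col]
    return sum(groups.values())
```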

All 9 cases (3 tests × 3 schemas) PASS. Snapshot cache refreshed
where these queries land in new shapes; existing entries unchanged.
Three new tests in tests/test_invariants.py compose the applied-buildings
filter with code paths that the existing single-axis tests don't cover.

- test_applied_filter_savings_decomposition_per_row: under
  `include_baseline=include_upgrade=include_savings=True` and an applied
  filter, the row-level identity `b - u ≈ s` must hold for every row
  (not just per-group totals). Catches savings-column wiring bugs that
  resolve against a different bs/up subquery pair than the baseline/
  upgrade columns.

- test_applied_filter_mean_sum_consistency: pins consistency between
  `agg_func='mean'` and the default-sum query under an applied filter.
  Checks that sample_count agrees integer-exact between the two and
  that `sum / (mean * n)` (the implied per-bldg average weight) is
  positive, finite, and within a plausible weight range. Catches the
  mean branch silently swallowing or doubling weight relative to sum.

- test_applied_filter_15min_sums_to_monthly: 15-min raw aggregation
  rolled up to monthly in pandas must equal a direct
  `timestamp_grouping_func='month'` query under the same applied
  filter. Pins the inner ts_flat filter composition through the
  monthly aggregation pivot.

Bounded to the curated 8-bldg universe (state=CO) so the TS scan is
~kilobytes per query. All 9 cases (3 tests × 3 schemas) PASS. Caught
two test-design bugs during validation: (1) `agg_func='mean'`
suffixes the column with `__mean` but my test looked for the
unsuffixed name; (2) the result frame names the timestamp column
after the schema's timestamp_column_name, not the `"time"` group_by
alias.
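The roll-up identity can be illustrated on a toy 15-minute series of ones, where every month-start bucket must equal its interval count; the pandas resample here is a stand-in for the direct monthly query:

```python
import pandas as pd

# Two months of 15-minute readings, value 1.0 each, so the expected
# monthly totals are the interval counts: 31*96 for January 2018 and
# 28*96 for February 2018.
idx = pd.date_range("2018-01-01", periods=(31 + 28) * 96, freq="15min")
raw = pd.Series(1.0, index=idx)
monthly_from_raw = raw.resample("MS").sum()
```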
Targets the user's specific worry: counts may drift when grouping by
county across multiple states because that grain stresses both the
state-partition wiring and the bs_per_bldg → ts_aggr join the most.

Four new tests in tests/test_invariants.py:

- test_multistate_county_partition: under multi-state restrict
  ([CO, WY] for resstock, [CO, NM] for comstock), the
  (state, county)-grouped totals must equal state-grouped totals
  must equal the ungrouped total. Plus: state-grouped multi-state
  rows match per-state-restricted single-queries (units_count
  float-close, sample_count integer-exact). Catches state-boundary
  leakage and cross-state county-name collisions silently merging.

- test_multistate_county_x_bldg_type_decomposition: under multi-state
  restrict, the (state, county, bldg_type)-grouped sums roll up to
  (state, county)-grouped sums when summed over bldg_type. Per
  (state, county) integer sample_count and float-close units_count.
  Catches multi-key group_by losing rows at the tightest aggregation
  grain.

- test_annual_vs_ts_year_at_county_grain_multistate: the smoking-gun
  test for the user's worry. Annual baseline and TS year-collapse
  baseline must agree per (state, county) at integer sample_count
  and float-close units_count. Bounded to a curated cross-state
  16-bldg universe so the TS scan is small. Pins the bs_per_bldg
  pre-aggregation × state-partition × county group_by interaction
  in one query pair.

- test_multistate_savings_shape_at_county_grain: maximum combinatorial
  stress: multi-state restrict + group_by=[state, county] +
  include_baseline + include_upgrade + include_savings + applied_only
  on upgrade leg. Per row `b - u ≈ s` with units_count-scaled
  tolerance, plus positivity sanity (sample_count > 0, units_count > 0).

Helpers:
- _county_col_for_schema: per-schema county column choice
  (in.county_name / county / in.nhgis_tract_gisjoin).
- _county_result_col: result-frame name after _simple_label strips
  the in./out. prefix (caught a real test bug where I indexed
  result frames with the dotted column name).

All 12 cases (4 tests × 3 schemas) PASS. The TS-table queries are
bounded to ~16 curated bldg_ids in the multi-state restrict so each
hits cache in seconds.
The inner ts_flat subquery aliases each per-row TS scalar so the next
layer can split it per-upgrade with FILTER. The old `_v__` prefix was
opaque; `ts__` matches the existing `bs__`/`up__` prefix family and
makes the table-of-origin explicit.

Source change is 5 sites in aggregate_query.py plus 2 comments in
test_schema_unique_keys.py — purely an internal alias rename, no
semantic change. Cached query results are byte-identical, so instead
of re-running 271 affected Athena queries (~3 h, ~$1.10), the rename
was applied directly to the on-disk cache:

  - rename_alias.py rewrites cached SQL, recomputes sha256 hashes,
    renames the .sql/.parquet/.json triples, and updates sql_hash
    entries in the snapshot manifests.
  - 271 cache triples renamed across 3 schema caches.
  - 137 sql_hash entries updated across 6 manifests.
  - Fixed-point check: re-running the script reports 0 pending work.
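The rename-and-rehash step, reduced to its essentials (the real rename_alias.py also renames the .sql/.parquet/.json triples on disk and patches snapshot manifests); the fixed-point property falls out because a second pass finds no old prefix:

```python
import hashlib

def renamed_entry(sql_text: str, old: str = "_v__", new: str = "ts__"):
    # Rewrite the alias prefix in a cached SQL sidecar, then recompute
    # the sha256 that names its cache triple. Applying this twice is a
    # no-op: the second pass finds no old prefix.
    new_sql = sql_text.replace(old, new)
    return new_sql, hashlib.sha256(new_sql.encode()).hexdigest()
```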

query_stats.py is also added — a general-purpose summary script that
walks *_cache/*.json and reports per-schema query count, bytes
scanned, $ cost @ $5/TB, and wall-clock time.
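The cost arithmetic in such a summary is one line; this sketch assumes a decimal TB divisor, and ignores per-query rounding (Athena applies a 10 MB minimum), so the script's exact figures may differ:

```python
def scan_cost_usd(bytes_scanned: int, usd_per_tb: float = 5.0) -> float:
    # Athena bills by bytes scanned at a $5/TB list price. The decimal
    # TB divisor is an assumption of this sketch.
    return bytes_scanned / 1_000_000_000_000 * usd_per_tb
```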
ComStock queries averaged ~46 s vs ResStock's ~3.7 s in the snapshot
suite, but the asymmetry concentrates on metadata-only queries
(~42 s ComStock vs ~2.9 s ResStock). Two root causes were identified
and quantified through a read-only probe script:

1. Small-files / over-partitioning. The ComStock
   `_md_*_by_state_and_county_parquet` tables hold 194,308 small parquet
   files across 3,133 (state, county) partitions; ResStock's `_metadata`
   has 17 files unpartitioned. Even with a state filter that prunes to
   one state, ComStock pays ~12 s planning + ~50 s of per-file footer
   reads.

2. Missing predicate propagation in `buildstock-query` SQL generation.
   When the outer query has `WHERE state='CO'` but the inner
   `IN (SELECT ... HAVING ...)` subquery does not, Athena scans all
   3,133 partitions on the inner side. Pushing the same state filter
   into the inner subquery cuts a 58 s query to 5 s — an 11.6×
   speedup from a single duplicated predicate.

Two pre-aggregated alternative tables already published by the
data pipeline (`_md_agg_by_state_parquet`, `_md_agg_national_parquet`)
go unused by the framework today; routing eligible queries to them
would add another ~3× on top of fix #2.

Adds:
- tests/query_snapshots/investigate_partitions.py — read-only probe
  script with cost guardrails; results cached so re-runs are free.
- tests/query_snapshots/_partition_probe_cache/ — content-addressed
  probe cache (matches existing schema-cache pattern).
- tests/query_snapshots/partition_investigation.json — consolidated
  dump of all probe results.
- tests/query_snapshots/INVESTIGATION_partition_overhead.md — full
  writeup of methodology, findings, and recommended priority order.
When the framework emits `IN (SELECT ... HAVING ...)` against the metadata
table (e.g. for applied-buildings filters from
`get_applied_buildings_filter()` or applied_only=True with savings),
Athena does not propagate the outer query's `WHERE state='CO'` into the
inner subquery. On ComStock metadata — partitioned `(state, county)` ×
~62 upgrades = 194,308 small parquet files across 3,133 partitions —
that left the inner side enumerating every partition, dominating
wall-clock with ~12 s of planning + ~50 s of footer reads.

Fix: in `_get_restrict_clauses`, pre-scan restrict for safe single-
column predicates against bs_table columns (using the existing `_get_column`
resolution so user-supplied `state` maps to ResStock's `in.state` and
ComStock's bare `state`). For any subquery-valued restrict entry, inject
those predicates into the subquery's WHERE — but only when the column
is both projected by the subquery and part of the outer IN-clause column
tuple. This is semantics-preserving by construction: the outer IN-tuple
match logically implies the same predicate on the inner side.
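The eligibility rule reduces to a set intersection; a schematic version (names illustrative, the real code resolves columns through `_get_column` against SA objects):

```python
def injectable_predicates(outer_predicates, subquery_projection, in_tuple_columns):
    # outer_predicates: {column: value} single-column equality filters
    # already resolved against the metadata table. A predicate is safe
    # to copy into the subquery's WHERE only when the outer IN-tuple
    # match already implies it on the inner side, i.e. the column is
    # both projected by the subquery and part of the IN-clause tuple.
    eligible = set(subquery_projection) & set(in_tuple_columns)
    return {col: val for col, val in outer_predicates.items() if col in eligible}
```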

Verified end-to-end on shape C against all four ComStock metadata
table variants (`investigate_partitions.py` re-run with new shape C2):

  Table layout                        Pre-fix (C)  Post-fix (C2)  Speedup
  _md_by_state_and_county_parquet      59.84 s     4.14 s         14.4×
  _md_agg_by_state_and_county_parquet  58.07 s     4.38 s         13.3×
  _md_agg_by_state_parquet              3.16 s     1.76 s          1.8×
  _md_agg_national_parquet              2.47 s     2.00 s          1.2×

44 ComStock snapshot variants drifted (one extra `AND bs.<col>=<val>`
in the inner WHERE). Zero ResStock drift. Snapshots regenerated via
--update-snapshot — every entry showed the expected speedup, with
typical -90% bytes scanned and -90% wall-clock per query. Total regen
spend: < $0.01.

Updates `INVESTIGATION_partition_overhead.md` to mark Fix #1 as shipped
with the measured speedups.
User asked whether deduplicating both TS and MD at query time
(arbitrary() on TS, sum(weight) on MD, single-key bldg_id join) would
beat the routing-to-state-agg-table approach planned for Fix #2 of
the partition overhead investigation. Built a 4-variant benchmark
against ComStock TS-monthly baseline, restrict to CO+NM, no group_by:

  V1  primary + bs_per_bldg (today's shape)             14.19 s
  V2  alt + no bs_per_bldg (routing only)                7.28 s   <- baseline
  V3  primary + dedup (dedup only)                      22.24 s   (worse than V1)
  V4  alt + dedup (routing + dedup)                      5.78 s   (-21% vs V2)

All four return total_kwh = 21,434,645,727, identical down to the
last digit (0 ppm divergence).
arbitrary() on cross-state duplicates is exact because value columns
are bit-identical across the duplicates per the dedup invariant.

Headline takeaways:
- Routing alone (V2) gets ~2× over V1, with ALL the savings in
  planning time (7.2 s -> 318 ms). The small-files-pathology that
  motivates this entire investigation disappears completely on the
  62-file alt table.
- Dedup on the slow primary table (V3) is worse than doing nothing.
  TS-side arbitrary() shuffle costs more than it saves when the rest
  of the query already pays the planning tax.
- Dedup on the alt table (V4) wins another 21% over V2, purely on
  engine time (5.54 s vs 7.04 s). Real but modest.

Decision: ship routing only (Pieces A+B+C in the plan). Dedup adds
non-trivial query-shape complexity (CTE-style WITH clauses, arbitrary
wrappers on every value column, single-key joins, separate eligibility
check) for a 21% engine-time win on top of routing's 2× planning
win. Defer dedup until concrete production workloads sit at the V2
baseline regularly.

Adds benchmark_dedup.py + dedup_benchmark.json + cached probe
receipts as a reproducible record. Updates INVESTIGATION_partition_overhead.md
with the benchmark findings so future readers can revisit the
decision.
Routing (Pieces A+B from the partition-overhead investigation):
- Schema model gains TableSuffix.annual_and_metadata_state_agg and
  UniqueKeys.metadata_state_agg (both optional). Comstock unified TOML
  declares the alt _md_agg_national_parquet table.
- query_core._initialize_tables loads the alt table; _routing_context
  context manager swaps bs_table/md_table/md_key/sample_wt for one
  _query call. _get_unique_keys auto-routes to metadata_state_agg keys
  via table identity.
- main._pick_metadata_table selects "primary" or "state_agg" from
  group_by/restrict eligibility. Annual queries route; TS-flow stays
  on primary (alt table's in.state vs primary's state column-name
  asymmetry blocks cross-name joins).
- _get_restrict_clauses/_inject_propagated accept bs_table=, restrict
  candidate_tables to (bs_table,) when routed so 'state' doesn't bind
  to the TS partition column.
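The swap-and-restore shape of such a context manager, with a toy object in place of the BSQ instance and illustrative attribute names:

```python
from contextlib import contextmanager

@contextmanager
def routing_context(obj, **overrides):
    # Swap attributes (bs_table/md_table/md_key/sample_wt in the real
    # code) for the duration of one query, restoring the originals
    # even if the query raises.
    saved = {name: getattr(obj, name) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(obj, name, value)
        yield obj
    finally:
        for name, value in saved.items():
            setattr(obj, name, value)

# Toy stand-in for the BSQ instance.
demo = type("Demo", (), {"md_table": "md_primary", "md_key": ("bldg_id", "county")})()
with routing_context(demo, md_table="md_state_agg", md_key=("bldg_id",)) as routed:
    routed_table, routed_key = routed.md_table, routed.md_key
```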

model_count column (Piece C):
- New aggregate column count(distinct bldg_id) added alongside the
  metadata-row count at every emission site (8 in aggregate_query.py,
  1 in main.py:get_distinct_count). Optimization: emits count(*) when
  post-join is bldg-unique-per-outer-group (~22% cheaper than count
  distinct on Athena, which doesn't auto-rewrite).

Rename sample_count -> metadata_rows_count (Piece C cleanup):
- Old name was misleading: it counted metadata rows (tract slices),
  not distinct simulation models. Renamed at 9 emission sites plus
  ~120 references in tests/test_invariants.py and
  tests/test_schema_unique_keys.py.

Snapshot regen included: ~1800 cache files churn (sql hash drift +
new model_count column). All cache contents validated against
existing data via the snapshot harness; pure SQL-shape drift, no
real data divergence.

Checkpoint commit before unified MD-LEFT-TS rewrite.
Replaces opt-in --include-local flag with env-based gating: tests run
locally by default and skip when CI=true (set automatically by GitHub
Actions and most CI providers). Removes the friction of remembering a
flag for the common laptop case.
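The gating predicate is tiny; a sketch (the real tests would wire this into a pytest skipif marker, shown in the comment):

```python
import os

def runs_locally(env=None) -> bool:
    # GitHub Actions and most CI providers export CI=true; tests run
    # by default and skip only when that variable is set. Hypothetical
    # pytest wiring:
    #   pytest.mark.skipif(not runs_locally(), reason="needs AWS creds")
    env = os.environ if env is None else env
    return env.get("CI", "").lower() != "true"
```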
The bs_table/md_table/md_key locals in the TS aggregation entry point
were leftover from before routing moved into `_routing_context` on the
BSQ — nothing reads them, the helpers inherit routing automatically.
AnyTableType was imported but never referenced.