Open
132 commits
bb8a12d
Migrate packaging from setup.py to pyproject.toml; bump to 0.4.0
rajeee Apr 23, 2026
8c0d6d3
Migrate to uv-native packaging with hatchling, ruff, and pre-commit
rajeee Apr 23, 2026
b03b7ec
Switch CI to uv; drop py3.10/3.11 from matrix
rajeee Apr 23, 2026
4a7e448
Add multi-column unique_keys schema and wire into aggregate counts
rajeee Apr 23, 2026
85794cf
Filter applied_in on full metadata unique-key tuple
rajeee Apr 23, 2026
edf0a57
Return full unique-key tuples from building-id helpers
rajeee Apr 23, 2026
813d779
Stop setting bldg_id as the pandas index on results/upgrades CSVs
rajeee Apr 23, 2026
ee3e2c6
Rewrite check_ts_bs_integrity for multi-column unique keys
rajeee Apr 23, 2026
8231be3
Remove unused _get_simulation_timesteps_count
rajeee Apr 23, 2026
f85e9b5
Add tests for multi-column unique-key paths
rajeee Apr 23, 2026
2b91ddb
Restore flake8 lint checks, reverting ruff strictness
rajeee Apr 23, 2026
d03b9cb
Sort rows in Python before comparing UNLOAD results in test
rajeee Apr 23, 2026
6ebc38a
Restore bldg_id pandas index on get_results_csv and get_upgrades_csv
rajeee Apr 23, 2026
d793cdf
Support multi-column restrict keys for composite-key schemas
rajeee Apr 24, 2026
0483bd1
Use ts_key for sample_count on timestamp_grouping_func ts path
rajeee Apr 24, 2026
45cea05
Remove deprecated savings_shape, aggregate_annual, aggregate_timeseries
rajeee Apr 24, 2026
33619ec
Allow timestamp_grouping_func='year' on TSQuery
rajeee Apr 24, 2026
c74a749
Fix viz_data monthly-results query to use run_obj.query()
rajeee Apr 24, 2026
01db542
Skip ts_b subquery when only upgrade values are needed
rajeee Apr 24, 2026
5818732
Move superseded query tests under tests/legacy/
rajeee Apr 24, 2026
1f57794
Add snapshot-driven query tests with invariants
rajeee Apr 24, 2026
531a7a9
Pass annual_only to _add_restrict to match _add_avoid
rajeee Apr 24, 2026
25fa3fa
Count distinct metadata keys for sample_count under ts-grouping
rajeee Apr 24, 2026
6cdaa46
Re-enable comstock sample_count/units_count invariant checks
rajeee Apr 24, 2026
973c002
Auto-prepend characteristics prefix in _get_column fallback
rajeee Apr 25, 2026
f8091f7
Normalize snapshot column names to rely on _get_column fallback
rajeee Apr 25, 2026
ec7a87e
Strip characteristics prefix from all snapshot column references
rajeee Apr 25, 2026
8f1a3e0
Merge resstock/comstock invariants into parametrized tests
rajeee Apr 25, 2026
4efd167
Merge schema snapshots, type-safe SQL literals, expanded invariants
rajeee Apr 25, 2026
41f8b68
Remove deprecated pandas downcasting option
rajeee Apr 25, 2026
da94656
Replace placeholder-dict indirection with single-value resolver
rajeee Apr 25, 2026
c0df11c
Expand snapshot coverage for query() shape gaps
rajeee Apr 25, 2026
038a296
Replace pickle cache with content-addressed disk store
rajeee Apr 25, 2026
a252948
Expand snapshot suite with helpers, utility, and aggregation coverage
rajeee Apr 26, 2026
d96c658
Track snapshot-cache parquets so the data-check survives cold clones
rajeee Apr 26, 2026
0bcdea5
Compute year-collapse counts inline instead of pre-fetching rows_per_…
rajeee Apr 26, 2026
1942eb9
Add four invariants: quartile ordering, nonzero bounds, sort+limit, a…
rajeee Apr 26, 2026
db0a526
Fix four library bugs uncovered by snapshot expansion + add coverage
rajeee Apr 26, 2026
53b00ef
Add TS time-bucket and sample_count cross-check invariants, +
rajeee Apr 26, 2026
c273bac
Add multi-attribute TS group_by, two-fuel TS savings, and savings-bou…
rajeee Apr 26, 2026
a0ff575
Add 15-min raw → hourly sum invariant
rajeee Apr 26, 2026
6fb14e2
Add get_results_csv snapshot for tiny building_id list
rajeee Apr 26, 2026
39813a7
Fix get_upgrades_csv cross-join + add helper-method snapshots
rajeee Apr 26, 2026
5065e63
Add user-weights snapshot + sample_count and enduse-sign invariants
rajeee Apr 26, 2026
4e9d374
Fix applied_only filter on TS query paths
rajeee Apr 26, 2026
e771afb
Add rows_per_sample to ts_year branch and expand three-way invariant
rajeee Apr 26, 2026
e85e741
Add cross-query savings invariants + allow applied_in on baseline
rajeee Apr 26, 2026
649ae50
Add multi-state savings additivity invariant
rajeee Apr 26, 2026
f3984d2
Switch comstock multi-state pair to CO+NM with confirmed bldg_id coll…
rajeee Apr 26, 2026
66cac06
Add targeted comstock composite-key handling test for shared bldg_id
rajeee Apr 26, 2026
c7d0653
Add mutation test proving composite keys are load-bearing in joins
rajeee Apr 26, 2026
27316f4
Extend three-way and applied_in invariants to multi-state restricts
rajeee Apr 27, 2026
040358d
Add snapshot coverage for utility eiaid lookups + baseline applied_in
rajeee Apr 27, 2026
4e51131
Fix get_upgrade_names malformed SQL and get_successful_simulation_cou…
rajeee Apr 27, 2026
4e003ab
Wire upgrade_name column through schema config
rajeee Apr 27, 2026
e9f0104
Refresh comstock upgrade_names_oedi snapshot for in.upgrade_name column
rajeee Apr 27, 2026
c699b70
Add get_query_only to utility.get_eiaids and snapshot it
rajeee Apr 27, 2026
30cd4dc
Add local-only test infrastructure for full-data report methods
rajeee Apr 27, 2026
15b1b4e
Salvage schema/unique_keys unit tests, delete redundant legacy tests
rajeee Apr 27, 2026
17c9495
Add tests/README.md with workflow rules and durable project knowledge
rajeee Apr 27, 2026
f6cfebd
Pivot schema model to 2-table shape (annual_and_metadata + timeseries)
rajeee Apr 27, 2026
31028ef
Drop 3-table support; thread upgrade=0 filter through baseline-side j…
rajeee Apr 27, 2026
d867271
Accept SQLAlchemy Alias as a table type; dedupe column-search candidates
rajeee Apr 27, 2026
fe8755b
Rename SA aliases to bs/up to reflect that the underlying table is un…
rajeee Apr 27, 2026
e178e38
Drop dead up_table-None guards; fix bs_table.name aliases bug
rajeee Apr 27, 2026
fd4a912
Filter baseline rows out of upgrade-side report queries
rajeee Apr 27, 2026
e5ae0b3
Add upgrade=0 filter to get_building_ids; refresh stale alias asserts
rajeee Apr 27, 2026
f928b1d
Fix more upgrade=0-filter bugs surfaced by snapshot data check
rajeee Apr 27, 2026
bf3bbcc
Refresh snapshot caches for the 2-table pivot SQL
rajeee Apr 27, 2026
4aea0ab
Step 1: Collapse bs_table/up_table → md_table; rename bs_key → md_key
rajeee Apr 27, 2026
0485dd3
TS upgrade-pair pivot: replace ts self-join with single-scan + GROUP BY
rajeee Apr 27, 2026
a584a57
TS pivot: COALESCE fallback (not bldg_id IS NULL); CTE form; dedupe b…
rajeee Apr 27, 2026
998c8d5
Revert TS pivot CTE → subquery (SA 2.0 doesn't expand literal binds i…
rajeee Apr 27, 2026
8a8d43d
Add Athena cost-history lookup; normalize SQL at compile time
rajeee Apr 27, 2026
7c89c39
Backfill cost metrics into snapshot JSON entries
rajeee Apr 27, 2026
326792d
Revert "Backfill cost metrics into snapshot JSON entries"
rajeee Apr 27, 2026
741f3b8
Move execution metadata into the cache layer; cost-regression gate
rajeee Apr 27, 2026
774dc49
Fix comma-join + DUPLICATE_COLUMN_NAME in select-from-md_table call s…
rajeee Apr 27, 2026
6835af8
Fix get_distinct_count comma-join when caller passes md_table by name
rajeee Apr 27, 2026
d1e660e
Add explicit select_from() at 6 audit sites that relied on implicit FROM
rajeee Apr 27, 2026
e2d0916
Fix aggregate_ts_by_eiaid: bind eiaid join column to bs_table not md_…
rajeee Apr 27, 2026
8ca2af1
Refresh snapshot cache after pivot + comma-join fixes; populate metad…
rajeee Apr 27, 2026
2463695
Restore 8 skipped tests post-pivot
rajeee Apr 27, 2026
8380cd1
Track snapshot SQL/JSON cache artifacts; backfill missing cost sidecars
rajeee Apr 27, 2026
6c74873
Clarify schema-filtered snapshot skip messages
rajeee Apr 27, 2026
9cd148d
Fix comma-join in check_ts_bs_integrity boot path
rajeee Apr 27, 2026
6062554
Add snapshot coverage for check_ts_bs_integrity boot path
rajeee Apr 27, 2026
0870fba
Fix TS pivot crash on calculated columns; add coverage
rajeee Apr 27, 2026
c1e0f27
Rewrite TS upgrade-pair pivot for Athena perf
rajeee Apr 27, 2026
9b1f381
Refresh snapshot cache for new TS pivot shape
rajeee Apr 27, 2026
44c88ec
Document semantic intent of quartiles under inner pre-bucketing
rajeee Apr 27, 2026
d5ecbca
Disallow get_quartiles on timeseries queries
rajeee Apr 27, 2026
66c3d95
Unify TS aggregation: pre-bucket inner GROUP BY, defer pure-bs enduses
rajeee Apr 27, 2026
a27f3f0
Pre-aggregate bs to building grain to eliminate tract fan-out
rajeee Apr 27, 2026
f65cd24
Fix _restrict_targets_md to recognize bare unprefixed column names
rajeee Apr 27, 2026
1168747
Plumb join_list bs columns through bs_per_bldg
rajeee Apr 27, 2026
dfb2983
Fall back to direct ts ⋈ bs JOIN when join_list is present
rajeee Apr 27, 2026
1d0b219
Fold join_list joins into bs_per_bldg subquery
rajeee Apr 27, 2026
3b30bb1
Require absolute MB/ms delta on cost-regression gate
rajeee Apr 27, 2026
5095a06
Skip inner timestamp materialization for year-collapse TS queries
rajeee Apr 27, 2026
c5a8a92
Refresh calc-col TS pivot snapshot after variance-driven gate trip
rajeee Apr 28, 2026
622b81e
Auto-generate example notebooks from snapshot test entries
rajeee Apr 28, 2026
5afe82f
Wire example notebooks to reuse the snapshot test cache
rajeee Apr 28, 2026
86a1cab
Resolve example notebook cache via _dh[0] relative path
rajeee Apr 28, 2026
9012955
Execute example notebooks via nbclient with safety gating
rajeee Apr 28, 2026
b15b14e
Fix flake8 lint failures from new_sampling branch
rajeee Apr 28, 2026
a94b840
Merge branch 'main' into new_sampling
rajeee Apr 28, 2026
182ff21
Group bs_per_bldg by user group_by columns instead of arbitrary()
rajeee Apr 28, 2026
95af30d
Track cache usage and add stale-cache cleanup tooling
rajeee Apr 28, 2026
990259a
Refresh example notebooks after cache-cleanup re-execution
rajeee Apr 28, 2026
0c85504
Add comstock_oedi_agg as a third supported schema
rajeee Apr 29, 2026
5aeacd6
Add cross-schema invariant: comstock_oedi ≡ comstock_oedi_agg aggregates
rajeee Apr 29, 2026
1a00198
Pin applied_in × TS × group_by interaction across both ComStock schemas
rajeee Apr 29, 2026
a44bed7
Replace Query.applied_in with composable get_applied_buildings helpers
rajeee Apr 29, 2026
6115254
Surface notebook execution failures and fix get_locations_by_eiaids
rajeee Apr 29, 2026
35c6738
Auto-record invariant query shapes via record_query() to from_invaria…
rajeee Apr 29, 2026
4e3d3be
Refresh caches: prune stale entries and re-execute example notebooks
rajeee Apr 29, 2026
122196c
Fixes
rajeee Apr 29, 2026
22ef15b
Fix TS-path avoid for bs/ts-side columns; expose avoid on get_buildin…
rajeee Apr 29, 2026
0e658c1
Add invariant tests for applied-buildings union, avoid, and set arith…
rajeee Apr 29, 2026
fec30f6
Add red-team invariant tests for applied-buildings filter API
rajeee Apr 30, 2026
f04e64e
Add Round 2 counting-integrity invariants for group_by × TS × baseline
rajeee Apr 30, 2026
9372f29
Add Round 3 invariants — applied filter × {savings, mean, monthly TS}
rajeee Apr 30, 2026
ed88b47
Add Round 4 invariants — multi-state × county-grain × TS-baseline-join
rajeee Apr 30, 2026
ce83f05
Rename `_v__<col>` → `ts__<col>` for ts_flat scalar projections
rajeee Apr 30, 2026
78fc46b
Diagnose ComStock metadata partition-overhead slowness
rajeee Apr 30, 2026
5ef9814
Propagate restrict predicates into IN-subqueries; 13× speedup on shape C
rajeee Apr 30, 2026
235ecc1
Benchmark two-sided dedup: routing alone wins; dedup not worth shipping
rajeee Apr 30, 2026
04f4102
Auto-route metadata table, add model_count, rename sample_count
rajeee May 1, 2026
96335e9
Refresh snapshots and fix invariant replay
rajeee May 4, 2026
9c6d4b7
Auto-skip local_only tests in CI via CI env var
rajeee May 4, 2026
c848017
Remove dead locals and unused import in aggregate_query
rajeee May 4, 2026
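The commit "Auto-skip local_only tests in CI via CI env var", together with the `--include-local` flag named in the `.gitignore` comments, suggests a pytest gating hook. The following is a minimal sketch of how such gating could look, not the PR's actual `conftest.py`. The helper name `should_skip_local_only` and the exact policy (always skip when `CI` is set, otherwise skip unless `--include-local` is passed) are assumptions made for illustration.

```python
import os
from typing import Mapping


def should_skip_local_only(include_local: bool,
                           environ: Mapping[str, str] = os.environ) -> bool:
    """Return True when a test marked `local_only` should be skipped.

    Assumed policy (hypothetical): always skip when the CI environment
    variable is set to a non-empty value; otherwise skip unless the user
    opted in with --include-local.
    """
    if environ.get("CI"):
        return True
    return not include_local


def pytest_addoption(parser):
    # Register the opt-in flag mentioned in the .gitignore comment.
    parser.addoption("--include-local", action="store_true", default=False,
                     help="run tests marked local_only")


def pytest_collection_modifyitems(config, items):
    # Imported lazily so the pure helper above has no pytest dependency.
    import pytest

    if not should_skip_local_only(config.getoption("--include-local")):
        return
    skip = pytest.mark.skip(reason="local_only test; needs --include-local outside CI")
    for item in items:
        if "local_only" in item.keywords:
            item.add_marker(skip)
```

GitHub Actions sets `CI=true` on its runners, so under this assumed policy the workflow would skip such tests without any extra configuration.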
17 changes: 10 additions & 7 deletions .github/workflows/testing.yml
@@ -11,13 +11,16 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ['3.10', '3.11', '3.12', '3.13', '3.14']
+        python-version: ['3.12', '3.13', '3.14']
     name: Tests - Python ${{ matrix.python-version }}
     steps:
-      - uses: actions/checkout@v3
-      - uses: actions/setup-python@v4
+      - uses: actions/checkout@v4
+      - name: Install uv
+        uses: astral-sh/setup-uv@v3
         with:
-          python-version: ${{ matrix.python-version}}
+          enable-cache: true
+      - name: Set up Python ${{ matrix.python-version }}
+        run: uv python install ${{ matrix.python-version }}
       - name: Configure AWS Credentials
         uses: aws-actions/configure-aws-credentials@v4.1.0
         with:
@@ -29,8 +32,8 @@ jobs:
       - name: Who am I?
         run: aws sts get-caller-identity
       - name: Install buildstock_query
-        run: pip install -e .[dev]
+        run: uv sync --group dev --python ${{ matrix.python-version }}
       - name: Pytest
-        run: python -m pytest -vv
+        run: uv run --python ${{ matrix.python-version }} pytest -vv
       - name: Lint
-        run: flake8 buildstock_query
+        run: uv run --python ${{ matrix.python-version }} flake8 buildstock_query
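To reproduce the workflow's steps outside GitHub Actions, the commands from this hunk can be collected in a small script. This is a sketch only: the function names `ci_commands` and `run_ci_locally` are hypothetical, it assumes `uv` is on the `PATH`, and it leaves out the AWS credential step, which the test suite presumably still needs.

```python
import subprocess
from typing import List, Sequence

# Python versions in the CI matrix after this change.
PYTHON_VERSIONS = ("3.12", "3.13", "3.14")


def ci_commands(python_version: str) -> List[List[str]]:
    """Return the uv commands the workflow runs for one matrix entry."""
    return [
        ["uv", "python", "install", python_version],
        ["uv", "sync", "--group", "dev", "--python", python_version],
        ["uv", "run", "--python", python_version, "pytest", "-vv"],
        ["uv", "run", "--python", python_version, "flake8", "buildstock_query"],
    ]


def run_ci_locally(versions: Sequence[str] = PYTHON_VERSIONS) -> None:
    """Run every step for every version, stopping at the first failure."""
    for version in versions:
        for command in ci_commands(version):
            print("$", " ".join(command))
            subprocess.run(command, check=True)
```

Calling `run_ci_locally(["3.12"])` from the repository root would run the install, test, and lint steps for a single interpreter.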
16 changes: 16 additions & 0 deletions .gitignore
@@ -14,7 +14,23 @@ poetry.lock
 build/
 *.egg-info
 *.parquet
+# Track snapshot-cache parquets so the data-check survives a fresh clone.
+!tests/query_snapshots/**/*.parquet
+!tests/query_snapshots/**/*.sql
+!tests/query_snapshots/**/*.json
 *.txt
 *.csv
 *.yml
 .bsq_cache
+# Transient query-execution audit log written by query_core; not snapshot data.
+.execution_history
+# Per-session SqlCache hash log — populated by SqlCache.get/put, consumed by
+# tests/cleanup_stale_caches.py to identify orphaned cache entries.
+.cache_usage_log
+# Per-session record_query JSONL log — populated by tests/snapshot_recorder.py,
+# consumed by tests/normalize_invariant_snapshot.py to update from_invariants.json.
+.from_invariants_log.jsonl
+# Local-only test cache: full metadata parquets downloaded from S3 (~hundreds
+# of MB) for pure-pandas methods like get_applied_options. Tests that use this
+# cache require --include-local; default behavior is to skip them in CI.
+tests/local_only/cache/
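The negated patterns above rely on gitignore's rule that the last matching pattern decides a path's fate. Because `*.parquet` excludes files rather than their parent directories, a later `!` pattern can re-include them. The sketch below models only that rule for the file patterns in this hunk. It is a simplified illustration, not git's matcher: it handles `*`, `?`, and `**/` only, ignores directory patterns such as `build/`, and the names `to_regex` and `is_ignored` are hypothetical.

```python
import re
from typing import Iterable

# File patterns copied from the .gitignore hunk, in file order.
PATTERNS = (
    "*.parquet",
    "!tests/query_snapshots/**/*.parquet",
    "!tests/query_snapshots/**/*.sql",
    "!tests/query_snapshots/**/*.json",
    "*.txt",
    "*.csv",
    "*.yml",
)


def to_regex(pattern: str) -> re.Pattern:
    """Translate a small subset of gitignore glob syntax into a regex."""
    anchored = "/" in pattern  # patterns containing a slash match from the repo root
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more directories
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")  # anything except a path separator
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    prefix = "" if anchored else "(?:.*/)?"  # slash-free patterns match at any depth
    return re.compile("^" + prefix + "".join(parts) + "$")


def is_ignored(path: str, patterns: Iterable[str] = PATTERNS) -> bool:
    """Apply patterns in order; the last one that matches wins."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        body = pattern[1:] if negated else pattern
        if to_regex(body).match(path):
            ignored = not negated
    return ignored
```

Under this model, `is_ignored("results/run1.parquet")` is true, while a parquet file anywhere under `tests/query_snapshots/` is re-included by the later negation.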
17 changes: 17 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,17 @@
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v5.0.0
+    hooks:
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
+      - id: check-yaml
+      - id: check-toml
+      - id: check-added-large-files
+        args: ["--maxkb=500"]
+
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.6.9
+    hooks:
+      - id: ruff
+        args: ["--fix"]
+      - id: ruff-format
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
+3.12
28 changes: 12 additions & 16 deletions buildstock_query/__init__.py
@@ -3,31 +3,27 @@
 - - - - - - - - -
 A library to run AWS Athena queries to get various data from a BuildStock run. The main class is called BuildStockQuery.
 An object of BuildStockQuery needs to be created to perform various queries. In addition to supporting various
-query member functions, the BuildStockQuery object contains 4 member objects that can be used to perform certain
-class of queries and analysis. These 4 member objects can be accessed as follows::
+query member functions, the BuildStockQuery object contains 3 member objects that can be used to perform certain
+class of queries and analysis. These member objects can be accessed as follows::

-bsq = BuildStockQuery(...) `BuildStockQuery` object
-bsq.agg `buildstock_query.aggregate_query.BuildStockAggregate`
-bsq.report `buildstock_query.report_query.BuildStockReport`
-bsq.savings `buildstock_query.savings_query.BuildStockSavings`
-bsq.utility `buildstock_query.utility_query.BuildStockUtility`
+bsq = BuildStockQuery(...) `BuildStockQuery` object
+bsq.agg `buildstock_query.aggregate_query.BuildStockAggregate`
+bsq.report `buildstock_query.report_query.BuildStockReport`
+bsq.utility `buildstock_query.utility_query.BuildStockUtility`

 ```
-# Some basic query can be done directly using the BuildStockQuery object. For example:
-from buildstock_query import BuildStockQuery
+# The core query API lives on the BuildStockQuery object itself:
+from buildstock_query import BuildStockQuery
 bsq = BuildStockQuery(...)
 bsq.get_results_csv()
 bsq.get_upgrades_csv()
+bsq.query(enduses=[...], annual_only=True, ...) # annual baseline / upgrade results
+bsq.query(enduses=[...], annual_only=False, ...) # timeseries aggregations
+bsq.query(enduses=[...], upgrade_id="1", include_savings=True) # savings shape

-# Other more specific queries can be done using specific query class objects. For example:
-bsq.agg.aggregate_annual(...)
-bsq.agg.aggregate_timeseries(...)
-...
+# Reports and utility-specific helpers:
 bsq.report.get_success_report(...)
 bsq.report.get_successful_simulation_count(...)
-...
-bsq.savings.savings_shape(...)
-...
 bsq.utility.aggregate_annual_by_eiaid(...)
 ```
