Income model, update documentation, examples, and add ARM build to pipeline by yamilbknsu · Pull Request #86 · NatLabRockies/DEMOS

yamilbknsu · 2026-04-24T17:18:31Z

This pull request introduces several important updates to the DEMOS demographic microsimulator, focusing on improved documentation, expanded Docker support, configuration and model enhancements, and codebase improvements. The changes make the project more accessible to new users, add support for ARM64 Docker builds, update model configurations and coefficients for greater clarity and accuracy, and introduce new model modules and utility functions.

Documentation and Usability Improvements:

The README.md has been significantly expanded to provide clearer setup instructions, including prerequisites, example data usage, and troubleshooting for Docker memory limits. Documentation links have been updated and a new contact section has been added. [1] [2] [3] [4]

Docker and Workflow Enhancements:

The Docker workflow now supports multi-platform builds (both linux/amd64 and linux/arm64), allowing the project to be run on a wider range of systems, including Apple Silicon Macs.

Configuration and Model Updates:

The TOML configuration files (demos_config.toml and demos_config_ref.toml) have been reordered and expanded to include new modules such as income, income_adjustment, and normalize_table_dtypes. Redundant or unused table definitions have been removed for clarity. [1] [2] [3] [4]
New and revised model coefficient files have been added or updated for modules such as birth, cohabitation, labor force participation, divorce, education, kids moving, marriage, mortality, and income. These updates standardize variable names and model expressions for consistency and clarity. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11]

Core Codebase Improvements:

The model initialization and module imports have been updated to include new modules such as normalize_table_dtypes and income, and to ensure all constants are available. [1] [2]
A new utility function hh_head_age has been added to demos/models/aging.py to efficiently retrieve the age of the household head, supporting new model requirements.
The birth model has been updated to use standardized household race column names, improving code clarity and reducing potential for errors.


Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR expands DEMOS with a new income model + regression template, reorganizes/standardizes computed variables, and significantly improves documentation and Docker support (including ARM64 builds).

Changes:

Add an OLS regression template and an income model step with supporting household/person variables.
Refactor and relocate many Orca computed columns into model modules; refresh example calibrated coefficient YAMLs accordingly.
Expand Sphinx docs (calibration, variables reference, data sources) and update Docker publishing to build for linux/amd64 and linux/arm64.
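For readers unfamiliar with the term, an OLS regression step amounts to fitting coefficients by least squares and then applying them as a linear predictor. The sketch below is not the PR's RegressionStep; `fit_ols`, `predict_ols`, and the toy data are hypothetical names and values used only for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Return least-squares coefficients, intercept first."""
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


def predict_ols(coefs, X):
    """Apply fitted coefficients as a linear predictor."""
    design = np.column_stack([np.ones(len(X)), X])
    return design @ coefs


# Toy data (hypothetical): y = 1 + 2 * x exactly.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([1.0, 3.0, 5.0, 7.0])
coefs_toy = fit_ols(X_toy, y_toy)
```

In a simulation setting the coefficients would typically be read from a calibrated file rather than refit on every run, so only the prediction half would execute per step.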

Reviewed changes

Copilot reviewed 41 out of 43 changed files in this pull request and generated 7 comments.


File	Description
scripts/clean_columns.py	Adds a utility script to subset H5 columns and synthesize job fields.
pyproject.toml	Bumps version, updates metadata/URLs, removes project script entry.
docs/source/pages/variables.md	Adds a comprehensive variables reference page.
docs/source/pages/intro.md	Expands Docker Compose onboarding instructions.
docs/source/pages/datasources.rst	Adds detailed docs on CSV vs HDF5 table sources and config examples.
docs/source/pages/configuration.rst	Links new calibration + variables pages into configuration docs.
docs/source/pages/calibration.md	Adds detailed calibration documentation and examples.
docs/source/pages/advanced_configuration.md	Adds explanation of lazy Orca columns, caching, and GEOID assignment.
docs/source/conf.py	Updates org metadata and enables MyST `dollarmath`.
docker-compose.yml	Updates default GHCR namespace for the image.
demos/variables.py	Large refactor/cleanup of computed variables; introduces standardized naming.
demos/templates/estimated_models/regression_model.py	Adds RegressionStep (OLS) estimated model template.
demos/templates/estimated_models/__init__.py	Exposes RegressionStep from estimated_models package.
demos/templates/calibration/procedures.py	Broadens calibration typing to TemplateStep and improves logging messages.
demos/models/marriage.py	Changes matching sort key from `earning` to computed per-person household income.
demos/models/kids_moving.py	Moves kids-move computed columns into kids_moving module.
demos/models/income.py	Introduces income step and associated household columns for income regression.
demos/models/household_reorg.py	Renames head-race columns and adds household aggregate computed columns.
demos/models/fatality.py	Moves mortality age-bin computed columns into fatality module.
demos/models/employment.py	Stops updating earnings in employment step; moves employment age bins into module.
demos/models/data_fix.py	Adds normalize_table_dtypes step to reduce PyTables object-dtype warnings.
demos/models/constants.py	Adds relational adjustment mapping table and state quartile labels.
demos/models/birth.py	Updates head-race column names and adds birth-related household computed columns.
demos/models/aging.py	Adds `hh_head_age` computed column (households).
demos/models/__init__.py	Imports income + constants modules for initialization side effects/registration.
demos/config.py	Adds `normalize_table_dtypes` to default module ordering.
data/small_example/calibrated_models_coefficients/mortality_model.yaml	Updates model expression to standardized variable names.
data/small_example/calibrated_models_coefficients/marriage.yaml	Updates spec names to standardized variable names.
data/small_example/calibrated_models_coefficients/kids_move_model.yaml	Updates model expression to standardized variable names + new cross-products.
data/small_example/calibrated_models_coefficients/income_model_w_nworkers.yaml	Adds new income model coefficients using RegressionStep.
data/small_example/calibrated_models_coefficients/income_model.yaml	Adds baseline income model coefficients using RegressionStep.
data/small_example/calibrated_models_coefficients/edu_model.yaml	Updates model expression to standardized variable names.
data/small_example/calibrated_models_coefficients/divorce_model.yaml	Updates model expression to standardized variable names.
data/small_example/calibrated_models_coefficients/demos_out_labor_force.yaml	Updates model expression to standardized variable names.
data/small_example/calibrated_models_coefficients/demos_in_labor_force.yaml	Updates model expression to standardized variable names.
data/small_example/calibrated_models_coefficients/cohabitation.yaml	Updates spec names to standardized variable names.
data/small_example/calibrated_models_coefficients/birth_model.yaml	Updates model expression to standardized variable names.
configuration/demos_config_ref.toml	Reorders modules, adds income + dtype normalization, removes CSV mapping table entry.
configuration/demos_config.toml	Reorders modules, adds income + dtype normalization, removes CSV mapping table entry.
README.md	Updates docs link, expands onboarding instructions, adds contact info.
.github/workflows/docker.yml	Enables multi-platform Docker builds (amd64 + arm64).

Comments suppressed due to low confidence (10)

demos/variables.py:1

The edu_hs_ged Orca column is defined twice in the same module. The latter definition will silently override the former, which is error-prone and can hide future changes. Remove the duplicate block (or consolidate into a single definition) to ensure there is exactly one authoritative implementation.
demos/variables.py:1
The edu_hs_ged Orca column is defined twice in the same module. The latter definition will silently override the former, which is error-prone and can hide future changes. Remove the duplicate block (or consolidate into a single definition) to ensure there is exactly one authoritative implementation.
demos/variables.py:1
age_60plus returns a DataFrame because p is a DataFrame; Orca columns are expected to return a 1D Series aligned to the table index. Use the age Series (e.g., persons.to_frame(...)["age"]) so this returns a Series.
demos/templates/estimated_models/regression_model.py:1
The constructor uses a mutable default argument (tags=[]). This can cause tags to leak across instances if the list is mutated. Use tags=None and normalize to an empty list inside __init__.
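The pattern suggested in this comment could look roughly like the following. `ExampleStep` is a hypothetical stand-in and does not reproduce the actual RegressionStep signature.

```python
class ExampleStep:
    """Hypothetical stand-in for a step class; not the PR's RegressionStep."""

    def __init__(self, name, tags=None):
        self.name = name
        # Build a fresh list per instance instead of sharing one default list.
        self.tags = list(tags) if tags is not None else []


first = ExampleStep("income")
second = ExampleStep("income_adjustment")
first.tags.append("regression")
```

With `tags=[]` as the default, the append on `first` would also show up on `second`, because both instances would hold the same list object.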
docker-compose.yml:1
The service hard-pins the platform to linux/amd64. With ARM64 images now being built, this forces amd64 emulation on ARM hosts (e.g., Apple Silicon), which can significantly degrade performance and sometimes breaks native execution. Consider removing platform: (let Docker pick the native arch) or making it configurable via an env var.
scripts/clean_columns.py:1
This script introduces non-deterministic outputs by using np.random.choice without a seed, which makes it hard to reproduce results across runs. Consider adding a --seed argument (or documenting that this is purely for synthetic/testing use) and seeding a local RNG. Also fix the typo in the comment (hardn0coded → hardcoded) to avoid confusion.
scripts/clean_columns.py:1
This script introduces non-deterministic outputs by using np.random.choice without a seed, which makes it hard to reproduce results across runs. Consider adding a --seed argument (or documenting that this is purely for synthetic/testing use) and seeding a local RNG. Also fix the typo in the comment (hardn0coded → hardcoded) to avoid confusion.
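One way to act on this suggestion is a local NumPy generator seeded from a command-line flag. This is a minimal sketch: `synth_codes`, `parse_args`, and the value lists are hypothetical and are not taken from scripts/clean_columns.py.

```python
import argparse

import numpy as np


def synth_codes(n_rows, choices, seed=None):
    """Draw n_rows values from choices using a local, optionally seeded RNG."""
    rng = np.random.default_rng(seed)  # local generator; global state untouched
    return rng.choice(choices, size=n_rows)


def parse_args(argv=None):
    """Parse a hypothetical --seed option."""
    parser = argparse.ArgumentParser(description="Hypothetical CLI sketch")
    parser.add_argument(
        "--seed", type=int, default=None, help="seed for reproducible synthetic values"
    )
    return parser.parse_args(argv)


args = parse_args(["--seed", "42"])
run_a = synth_codes(5, [1, 2, 3], seed=args.seed)
run_b = synth_codes(5, [1, 2, 3], seed=args.seed)
```

Two calls with the same seed produce the same draws; leaving `--seed` unset keeps the current non-deterministic behavior.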
pyproject.toml:1
Correct organization name spelling: 'Berkley' should be 'Berkeley' in the author metadata.
pyproject.toml:1
This change removes the [project.scripts] entry that previously exposed a demos CLI command. If users rely on pip install . providing a demos executable, this is a breaking change. Either restore the script entry (if still supported) or update documentation to describe the new invocation method.
docs/source/pages/calibration.md:1
The term 'marrital' is misspelled in the docs (should be 'marital'). If the code truly hard-codes the misspelled table name, consider supporting the correctly-spelled alias as well (or correcting the code + docs together) to avoid locking in a typo as part of the external interface.


Copilot · 2026-04-24T17:28:42Z

+# Max education year of head or spouse (relate < 2)
+@orca.column("households")
+def hh_edu_top(persons):
+    df = persons.to_frame(columns=["household_id", "edu", "relate"])
+    df = df[df["relate"] < 2][["household_id", "edu"]]
+    return df.groupby("household_id").agg({"edu": "max"})


Several @orca.column("households") functions appear to return DataFrames (e.g., groupby(...).agg({...})) or return a column name ("age") that does not match the semantic meaning (hh_n_children). Orca columns should return a Series for the target table’s index. For hh_n_children, compute a per-household count and return it as a Series named hh_n_children. For hh_edu_top/hh_age_avg (and similar), return the aggregated Series (e.g., .max() / .mean()) rather than a single-column DataFrame.
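A pandas-only sketch of the Series-returning form is shown below. The Orca decorator is omitted so the snippet runs on its own, the functions take a plain DataFrame instead of an Orca table, and the child-age cutoff in `hh_n_children` is an assumption, since the PR's definition of a child is not shown here.

```python
import pandas as pd


def hh_edu_top(persons_df):
    """Max education of head or spouse (relate < 2), as a Series."""
    df = persons_df[persons_df["relate"] < 2]
    # Selecting the column before aggregating yields a Series, not a DataFrame.
    return df.groupby("household_id")["edu"].max().rename("hh_edu_top")


def hh_n_children(persons_df, max_child_age=17):
    """Per-household count of persons at or below max_child_age (assumed cutoff)."""
    is_child = persons_df["age"] <= max_child_age
    return is_child.groupby(persons_df["household_id"]).sum().rename("hh_n_children")


# Toy persons table (hypothetical values).
persons_toy = pd.DataFrame(
    {
        "household_id": [1, 1, 1, 2],
        "relate": [0, 1, 2, 0],
        "edu": [12, 16, 9, 14],
        "age": [40, 38, 10, 30],
    }
)
edu_top = hh_edu_top(persons_toy)
n_children = hh_n_children(persons_toy)
```

Households with no head or spouse record would be absent from `hh_edu_top`, so a real column function would probably reindex the result to the households table index before returning it.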

Co-authored-by: Copilot <[email protected]>

yamilbknsu and others added 28 commits January 29, 2026 09:47

Merge pull request #83 from NatLabRockies/dev

647337b

Update documentation pre-release

updated the technical memo

de4437a

Update README.md

f798db3

added contact information.

Merge pull request #84 from NatLabRockies/dev

b5af542

Update small example data

update version tag

116c7ee

update "nrel" reference

7c3d2a5

update nrel reference

0cae30a

update home url

c8c905e

updates to documentation

8403ce0

typo

9b187a8

added arm build to docker pipeline

1c67796

Merge pull request #85 from NatLabRockies/ci

be6debe

Added `arm` build to docker pipeline

implementation of regression model

63a30b5

Calibration documentation

007df55

more docs

45f864e

readme change

6fdd180

fix readme link

510994a

working implementation of income model

e4d9a5d

column fix

31ea8be

refactor variables

a6e98cd

fix inconsistent dtypes

e4e6d44

data type fixes

30a05d0

implement per-person income in marriage selection

8cb03af

remove NREL reference

ef89222

refactored relmap file

aac079d

clean small_example data and add randomly generated values to industry and occupation

0ebb815

black

36bc58f

black

e1bd2cb

yamilbknsu requested a review from Copilot April 24, 2026 17:20

Copilot AI reviewed Apr 24, 2026

Copilot started reviewing on behalf of yamilbknsu April 24, 2026 17:41

yamilbknsu and others added 3 commits May 4, 2026 15:30

Update README.md

2828427

Co-authored-by: Copilot <[email protected]>

Update demos/models/constants.py

2edd051

Co-authored-by: Copilot <[email protected]>

black

fea3d46

yamilbknsu wants to merge 31 commits into dev from yep/income_model


3 participants
