
Conversation

@NathanielF
Contributor

@NathanielF NathanielF commented Nov 19, 2025

Just a draft for the minute. Working through some ideas.


📚 Documentation preview 📚: https://causalpy--568.org.readthedocs.build/en/568/

Comment on lines 54 to 57
:param vs_prior_type : str or None, default=None
Type of variable selection prior: 'spike_and_slab', 'horseshoe', or None.
If None, uses standard normal priors.
:param vs_hyperparams : dict, optional
Contributor

This is sphinx format and not numpy
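For reference, a sketch of the same block in numpydoc style (content copied from the diff above; the `vs_hyperparams` description is left incomplete there too):

```rst
Parameters
----------
vs_prior_type : str or None, default=None
    Type of variable selection prior: 'spike_and_slab', 'horseshoe', or None.
    If None, uses standard normal priors.
vs_hyperparams : dict, optional
```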

Collaborator

We should add that into AGENTS.md if it's not already there.

Contributor Author

> This is sphinx format and not numpy

You got me. The docstrings were AI generated. Will fix.

Provides continuous shrinkage with heavy tails, allowing strong signals
to escape shrinkage while weak signals are dampened:
β_j = τ · λ̃_j · β_j^raw
Collaborator

We should be able to add maths in here for nice rendering in the API docs
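For example, the shrinkage relation quoted above could be written with a Sphinx math directive (a sketch only, assuming the docstring is otherwise numpydoc-formatted):

```rst
.. math::

    \beta_j = \tau \, \tilde{\lambda}_j \, \beta_j^{\mathrm{raw}}
```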

Signed-off-by: Nathaniel <[email protected]>
@review-notebook-app
Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



Signed-off-by: Nathaniel <[email protected]>
@codecov

codecov bot commented Nov 20, 2025

Codecov Report

❌ Patch coverage is 94.17808% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.31%. Comparing base (2d6bba7) to head (18da6c4).

Files with missing lines                Patch %   Lines
causalpy/variable_selection_priors.py   89.34%    6 Missing and 7 partials ⚠️
causalpy/pymc_models.py                 90.47%    2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #568      +/-   ##
==========================================
+ Coverage   93.27%   93.31%   +0.04%     
==========================================
  Files          37       39       +2     
  Lines        5632     5911     +279     
  Branches      367      386      +19     
==========================================
+ Hits         5253     5516     +263     
- Misses        248      256       +8     
- Partials      131      139       +8     

☔ View full report in Codecov by Sentry.

Signed-off-by: Nathaniel <[email protected]>
@NathanielF NathanielF marked this pull request as ready for review November 22, 2025 08:22
@NathanielF
Contributor Author

Marking this one as ready for review. There is still some work to be done on the notebook illustrating the functionality, but I think there is enough here that it's worth flagging the architecture choices for discussion. I've made the variable selection priors available as a module. Currently it's just integrated with the IV class, but in principle it can be dropped into any regression-based module with coefficients. The pattern simply requires an if-else block in, e.g., the propensity score model, the linear regression model, etc.

What do you guys think?
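For concreteness, here is a rough sketch of the if-else pattern described above; the helper name, arguments, and priors are illustrative assumptions rather than the actual CausalPy API:

```python
import pymc as pm


def make_coefficient_prior(name, k, vs_prior_type=None, vs_hyperparams=None):
    """Return a length-k coefficient vector, optionally with a variable selection prior.

    Hypothetical helper; must be called inside an active pm.Model() context.
    """
    hp = vs_hyperparams or {}
    if vs_prior_type is None:
        # Standard normal priors (the existing default behaviour).
        return pm.Normal(name, mu=0.0, sigma=1.0, shape=k)
    elif vs_prior_type == "horseshoe":
        # Plain (non-regularised) horseshoe: beta = tau * lambda * beta_raw.
        tau = pm.HalfCauchy(f"{name}_tau", beta=hp.get("tau_scale", 1.0))
        lam = pm.HalfCauchy(f"{name}_lam", beta=1.0, shape=k)
        raw = pm.Normal(f"{name}_raw", mu=0.0, sigma=1.0, shape=k)
        return pm.Deterministic(name, tau * lam * raw)
    else:
        # 'spike_and_slab' (or others) would be handled analogously.
        raise NotImplementedError(f"vs_prior_type={vs_prior_type!r} not sketched here")
```

Each regression-based model (propensity score, linear regression, etc.) would then call a helper like this when constructing its coefficient vector.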

Signed-off-by: Nathaniel <[email protected]>
@juanitorduz
Collaborator

Great @NathanielF ! I think the notebooks need a bit more storyline and explanation 🙏

@NathanielF
Contributor Author

Cool, thanks @juanitorduz . I can take another pass at it this weekend.

@drbenvincent drbenvincent added the enhancement New feature or request label Dec 5, 2025
@cursor

cursor bot commented Dec 9, 2025

PR Summary

Introduces reusable variable selection priors and integrates them into instrumental variables (IV) regression, with an option for binary treatments.

  • New causalpy/variable_selection_priors.py: spike-and-slab and horseshoe priors via VariableSelectionPrior, helpers for inclusion probabilities and shrinkage; exported in __init__.py
  • IV experiment/model updates: InstrumentalVariable and InstrumentalVariableRegression accept vs_prior_type, vs_hyperparams, and binary_treatment; outcome/treatment beta can use VS priors; adds binary-treatment likelihood (Bernoulli) with correlated latent errors (rho) and adjusted default priors
  • Propensity score outcome model: adds spline_knots parameter and uses it to size B-splines
  • Tests: new integration tests for IV with binary treatment and VS priors; unit tests for prior factories
  • Docs: adds iv_vs_priors.ipynb to IV toctree; updates interrogate badge

Written by Cursor Bugbot for commit 18da6c4. This will update automatically on new commits.

Signed-off-by: Nathaniel <[email protected]>
the assumption of a simple IV experiment.
The coefficients should be interpreted appropriately."""
We will use the multivariate normal likelihood
for continuous treatment."""

Bug: Validation warning ignores binary_treatment flag setting

The input_validation method checks if the treatment variable has more than 2 unique values and warns that "We will use the multivariate normal likelihood for continuous treatment." However, this warning doesn't account for the new binary_treatment parameter. If a user sets binary_treatment=True while having continuous treatment data, the warning incorrectly suggests MVN will be used, when actually the Bernoulli likelihood will be applied (which would fail on non-binary data). The validation needs to cross-check the actual data against the self.binary_treatment flag.
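A possible shape for that cross-check, written as a standalone helper for illustration (names are assumptions based on the description above, not CausalPy internals):

```python
import warnings

import pandas as pd


def check_treatment_likelihood(data: pd.DataFrame, treatment: str, binary_treatment: bool) -> None:
    """Cross-check the treatment column against the binary_treatment flag."""
    n_unique = data[treatment].nunique()
    if binary_treatment and n_unique > 2:
        raise ValueError(
            "binary_treatment=True, but the treatment column has more than two "
            "unique values; the Bernoulli likelihood requires a binary treatment."
        )
    if not binary_treatment and n_unique > 2:
        warnings.warn(
            "More than two unique treatment values detected; using the multivariate "
            "normal likelihood for continuous treatment."
        )
```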


Signed-off-by: Nathaniel <[email protected]>
@NathanielF NathanielF closed this Dec 9, 2025
@NathanielF NathanielF reopened this Dec 9, 2025
@NathanielF
Contributor Author

> Great @NathanielF ! I think the notebooks need a bit more storyline and explanation 🙏

Took another pass, @juanitorduz. Should be more friendly now.

@drbenvincent
Collaborator

I'll attempt to review this week before I down tools for winter break. FYI, I was not planning on another CausalPy release in 2025. I think there is a lot in the pipeline, so we can start 2026 with some nice major feature releases.

@@ -0,0 +1,2603 @@
{
Collaborator

@juanitorduz juanitorduz Dec 15, 2025

Line #1.    def inv_logit(z):

We could use https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.expit.html right?
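For reference, the suggested swap (the array `z` here is just illustrative):

```python
import numpy as np
from scipy.special import expit

z = np.linspace(-3, 3, 7)
p = expit(z)  # same as 1 / (1 + np.exp(-z)), i.e. the hand-rolled inv_logit
```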



Contributor Author

used expit

@@ -0,0 +1,2603 @@
{
Collaborator

@juanitorduz juanitorduz Dec 15, 2025

Line #82.    data

maybe use data.info() or data.describe()? There are too many columns (or keep both .head() and .info()?)



Contributor Author

swapped to data.info()

@@ -0,0 +1,2603 @@
{
Collaborator

@juanitorduz juanitorduz Dec 15, 2025

Some math formulas can help explain this: after reading this paragraph, I do not understand what temperature actually does (maybe use a tip box for these math formulas?)



Contributor Author

Added a much more extensive write-up of the maths.

@@ -0,0 +1,2603 @@
{
Collaborator

@juanitorduz juanitorduz Dec 15, 2025

can we add titles to each of these subplots?



Contributor Author

done

@@ -0,0 +1,2603 @@
{
Collaborator

@juanitorduz juanitorduz Dec 15, 2025

Line #10.        # "mp_ctx": "spawn",

remove commented code # "mp_ctx": "spawn",?



Contributor Author

removed.

@@ -0,0 +1,2603 @@
{
Collaborator

@juanitorduz juanitorduz Dec 15, 2025

maybe plot these into different axes (one on top of the other?) and use ref_val=3?
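A minimal sketch of that suggestion (using a stand-in ArviZ dataset; the real call would pass the fitted model's InferenceData and variable names):

```python
import arviz as az

idata = az.load_arviz_data("centered_eight")  # stand-in posterior for illustration
# Stack the panels vertically and mark the reference value of 3.
az.plot_posterior(idata, var_names=["mu", "tau"], ref_val=3, grid=(2, 1))
```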



Contributor Author

done

@@ -0,0 +1,2603 @@
{
Collaborator

@juanitorduz juanitorduz Dec 15, 2025

LaTeX typo: tau_0?



Contributor Author

fixed typo

@@ -0,0 +1,2603 @@
{
Collaborator

@juanitorduz juanitorduz Dec 15, 2025

again: add titles to the subplots



Contributor Author

added titles

@@ -0,0 +1,2603 @@
{
Collaborator

@juanitorduz juanitorduz Dec 15, 2025

suggestion: split the plots as suggested in the case above



Contributor Author

Split the plot. The only thing I liked about the overlap plot was that it made it a bit clearer how much tighter the inference was with VS priors, but you're right, they're a bit ugly.

@@ -0,0 +1,2603 @@
{
Collaborator

@juanitorduz juanitorduz Dec 15, 2025

same here (overlapping plots look confusing)



Contributor Author

done.

Collaborator

@juanitorduz juanitorduz left a comment

This is awesome @NathanielF ! Just some minor comments :)

Signed-off-by: Nathaniel <[email protected]>
@NathanielF
Contributor Author

On the Cholesky decomposition @juanitorduz, I re-parameterised again in the binary case. We refactored the joint likelihood into a conditional likelihood.

Key changes:

  • Elimination of the Cholesky decomposition: we no longer draw from a multivariate distribution. Instead, we draw the latent treatment error ($V$) and use it to adjust the mean of the outcome equation, giving a conditional formulation.

  • Introduction of the correction term: we added expected_U = rho * (sigma_U / sigma_V) * V to the outcome mean. This term "soaks up" the variation in the outcome that is correlated with treatment assignment.

  • Logistic error parameterisation: to improve sampling stability and match the invlogit link function, we parameterised $V$ using inverse transform sampling from a Uniform distribution into a standard Logistic distribution.

Cursor etc. was pushing me towards an inverse probit formulation, but I just found it much more brittle than the logit.
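For readers following along, a minimal PyMC sketch of the inverse-transform step and the correction term (variable names and shapes are illustrative, not the actual CausalPy implementation):

```python
import numpy as np
import pymc as pm

n = 100  # illustrative sample size

with pm.Model():
    rho = pm.Uniform("rho", lower=-1.0, upper=1.0)
    sigma_U = pm.HalfNormal("sigma_U", sigma=1.0)
    sigma_V = np.pi / np.sqrt(3.0)  # standard deviation of a standard Logistic error

    # Inverse transform sampling: u ~ Uniform(0, 1)  =>  V = log(u / (1 - u))
    # follows a standard Logistic distribution, matching the invlogit link.
    u = pm.Uniform("u", lower=0.0, upper=1.0, shape=n)
    V = pm.Deterministic("V", pm.math.log(u / (1.0 - u)))

    # Correction term added to the outcome mean: it soaks up the variation in the
    # outcome that is correlated with treatment assignment.
    expected_U = pm.Deterministic("expected_U", rho * (sigma_U / sigma_V) * V)
    # expected_U then enters the outcome equation's mean, and the treatment gets
    # a Bernoulli likelihood (details omitted here).
```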

Signed-off-by: Nathaniel <[email protected]>
