Allow static covariates in BGNBDModel #1390

Merged
merged 35 commits into pymc-labs:main from BGNBD-static-covar
Feb 14, 2025

Conversation

PabloRoque
Contributor

@PabloRoque PabloRoque commented Jan 16, 2025

Description

Allows static covariates in BetaGeoModel

NOTE: It seems there are convergence issues with the dropout-covariates-related params a|b. Related to similar observations by @juanitorduz here. As a consequence the last two assertions in test_distribution_method are a dubious hack.

Related Issue

Checklist

Modules affected

  • MMM
  • CLV
  • Customer Choice

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc-marketing--1390.org.readthedocs.build/en/1390/

codecov bot commented Jan 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.65%. Comparing base (6ca3a30) to head (dc280d7).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1390      +/-   ##
==========================================
+ Coverage   92.59%   92.65%   +0.06%     
==========================================
  Files          52       52              
  Lines        6051     6103      +52     
==========================================
+ Hits         5603     5655      +52     
  Misses        448      448              


@wd60622 wd60622 changed the title from "[DRAFT]: Allow static covariates in BGNBDModel" to "Allow static covariates in BGNBDModel" Jan 17, 2025
@ColtAllen
Collaborator

Merging #1375 has created a merge conflict in clv.distributions.py, but shouldn't be hard to fix. Seems like this branch was created off of that one.

@ColtAllen ColtAllen added the enhancement New feature or request label Jan 18, 2025
@ColtAllen ColtAllen added this to the 0.12.0 milestone Jan 18, 2025
@ColtAllen
Collaborator

The _extract_predictive_variables internal method for covariates can probably be moved into the Base CLV class to reduce repetition, because static covariates can be included in all CLV models.

@PabloRoque PabloRoque marked this pull request as ready for review January 22, 2025 18:18
@github-actions github-actions bot added the docs Improvements or additions to documentation label Feb 2, 2025
@PabloRoque
Contributor Author

PabloRoque commented Feb 2, 2025

@ColtAllen

OK, the plot kind of thickened.

The poor convergence in distribution_new_customer needed more investigation and some plotting in a notebook.

I added docs/notebooks/dev/clv/dev/bg_nbg_covariates_test_issues.ipynb. Findings:

  • We were assuming the tests should be equivalent to the ones in ParetoNBD, but this should not be the case.
  • Whenever a = b, the Beta distribution is symmetric with E[X] = 0.5.
  • Under this condition, the only effect of introducing covariates is to narrow the distribution around E[X] = 0.5.
  • I believe this has implications for the results from CLVTools. For them gamma2 = gamma3, and thus if a0 = b0 you basically have a fancy 50/50 coin toss deciding your dropout probability.

Generally, if a > b, then the expected p rises above 0.5, and vice versa. The greater both values are, the narrower the distribution.
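The Beta properties described above can be checked with the closed-form mean and variance. This is only a minimal sketch using the standard formulas; the helper names beta_mean and beta_variance are made up for illustration and are not part of pymc-marketing.

```python
def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution: a / (a + b)."""
    return a / (a + b)


def beta_variance(a: float, b: float) -> float:
    """Variance of a Beta(a, b) distribution: ab / ((a + b)^2 (a + b + 1))."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


# Symmetric case: the mean stays at 0.5 while the variance shrinks as a = b grows.
for shape in (1.0, 5.0, 50.0):
    print(shape, beta_mean(shape, shape), beta_variance(shape, shape))

# Asymmetric case: a > b pushes the mean above 0.5, a < b pushes it below.
print(beta_mean(4.0, 2.0), beta_mean(2.0, 4.0))
```

When a = b the mean cannot move, so larger shape values only tighten the distribution around 0.5.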

This was the whole thing! I was not taking the properties of the Beta distribution into account, and was following the ParetoNBD implementation blindly. Working on the notebook, I thought I was going crazy: E["dropout"] ~ 0.5 always, regardless of the covariates I was using. Note that in the test setup we have equal coefficients: dropout_coefficient_a=np.array([3.0]), dropout_coefficient_b=np.array([3.0]). This was bound to be the case from the beginning.
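To make the point about equal coefficients concrete, here is a rough sketch. It assumes the covariates scale the baseline shapes through a log link, a = a0 * exp(coef_a * x) and b = b0 * exp(coef_b * x); that link, the baseline values of 1.0, and the helper names are assumptions for illustration, not taken from the PR code. Only the coefficient value 3.0 comes from the test setup.

```python
import math


def dropout_shapes(a0, b0, coef_a, coef_b, x):
    """Scale baseline Beta shapes by exp(coefficient * covariate) (assumed log link)."""
    a = a0 * math.exp(coef_a * x)
    b = b0 * math.exp(coef_b * x)
    return a, b


def expected_dropout(a, b):
    """Mean of Beta(a, b)."""
    return a / (a + b)


# Equal baselines (hypothetical 1.0) and equal coefficients (3.0, as in the test setup):
# the covariate changes a + b (the concentration) but never the mean.
for x in (-1.0, 0.0, 1.0):
    a, b = dropout_shapes(1.0, 1.0, 3.0, 3.0, x)
    print(x, expected_dropout(a, b), a + b)

# Unequal coefficients (hypothetical 3.0 vs 1.0): now the covariate shifts the mean.
for x in (-1.0, 0.0, 1.0):
    a, b = dropout_shapes(1.0, 1.0, 3.0, 1.0, x)
    print(x, expected_dropout(a, b))
```

Under these assumptions, a test that expects the expected dropout to move with the covariate needs coefficients (or baselines) that differ between a and b.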

Note, however, some findings. We are using a|b = pm.Flat("a|b") in distribution_new_customer. If we did not pass data to the function, we would have issues, because Flat and Beta don't get along too well.

@PabloRoque PabloRoque requested a review from ColtAllen February 2, 2025 11:32
@juanitorduz
Collaborator

hey! are we missing something here? any blockers :) ?

@ColtAllen
Collaborator

hey! are we missing something here? any blockers :) ?

Just haven't found time to look at it yet 🤔 Reviewing now.

@PabloRoque
Contributor Author

hey! are we missing something here? any blockers :) ?

It is good to ship on my side, and merging would let us work on a clean branch for adding covariates to the ModifiedBetaGeoModel.

@ColtAllen requested changes a couple of weeks ago related to some tests. I believe I've addressed all the worries he expressed (see dev notebook), but would be good to have the green light on his side.

@ColtAllen
Collaborator

Note, however, some findings. We are using a|b = pm.Flat("a|b") in distribution_new_customer. If we did not pass data to the function, we would have issues, because Flat and Beta don't get along too well.

distribution_new_customer is just boilerplate for sampling from the latent Beta dropout_rate and Gamma purchase_rate distributions. The choice of distributions for a and b is arbitrary because the fitted posteriors from self.fit_result are being used for those parameters.
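The idea can be sketched without PyMC: each posterior draw of the parameters yields one draw of the latent dropout and purchase rates, so the priors declared for a and b never get sampled. The posterior values below are invented, the dictionary layout and function name are hypothetical, and treating alpha as a Gamma rate parameter is an assumption.

```python
import random

rng = random.Random(42)

# Hypothetical posterior draws of (a, b, r, alpha); in practice these would
# come from the fitted model rather than being typed in by hand.
posterior_draws = [
    {"a": 0.8, "b": 2.5, "r": 0.25, "alpha": 4.4},
    {"a": 0.9, "b": 2.3, "r": 0.27, "alpha": 4.1},
    {"a": 0.7, "b": 2.6, "r": 0.24, "alpha": 4.6},
]


def sample_new_customer(draw, rng):
    """Draw one (dropout, purchase_rate) pair for a single posterior draw."""
    dropout = rng.betavariate(draw["a"], draw["b"])
    # gammavariate takes (shape, scale); alpha is assumed to be a rate, so scale = 1 / alpha.
    purchase_rate = rng.gammavariate(draw["r"], 1.0 / draw["alpha"])
    return dropout, purchase_rate


samples = [sample_new_customer(draw, rng) for draw in posterior_draws]
for dropout, purchase_rate in samples:
    print(dropout, purchase_rate)
```

Because a and b are always supplied from the posterior in this flow, an improper prior such as Flat would only become a problem if the function tried to sample those parameters itself.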


def test_expectation_method(self):
    """Test that predictive methods work with covariates"""
    # Higher covariates with positive coefficients -> higher chance of death and vice-versa
Collaborator

Do your experiments in the dev notebook confirm this? This seems copy/pasted from the equivalent test for ParetoNBD model.

Contributor Author

Modified accordingly to the findings in the notebook

Collaborator

@ColtAllen ColtAllen left a comment

Code looks good! Just some clarifying questions regarding notebook experiments and request to add an additional test condition, and I think this will be good to merge!

@PabloRoque PabloRoque requested a review from ColtAllen February 14, 2025 15:40
Collaborator

@ColtAllen ColtAllen left a comment

Great job! Let's continue to test prior configs w/ covariates and update the docs with our findings.

@juanitorduz juanitorduz merged commit 5f59919 into pymc-labs:main Feb 14, 2025
20 checks passed
@juanitorduz
Collaborator

juanitorduz commented Feb 14, 2025

Thank you @PabloRoque and @ColtAllen 🙌😎🚀

@PabloRoque PabloRoque deleted the BGNBD-static-covar branch February 15, 2025 08:31
Labels: CLV, docs, enhancement, tests
4 participants