generate synthetic dataset given BG/NBD Model parameters? #717

seanreed1111 · 2024-06-03T13:40:46Z

Hi, I am still trying to wrap my head around this great library.
My question is, can I use this library to generate a synthetic customer transaction table? Essentially, I want to go in the opposite of the normal direction: given distributions or MAP estimates for the parameters of a population model like BG/NBD, can I generate a synthetic transaction table? Any help appreciated!
Thanks.

wd60622 · 2024-06-03T14:22:01Z

Maintainer

Hi @seanreed1111,

Good question!
Since the models are based on PyMC, we should be able to sample the target random variable, but only if the likelihood of the model can create random samples. One way that this breaks is when the likelihood is added with pm.Potential in the model build, because a potential only contributes a log-probability term and has no random variable to draw from.

Simple example

For a simple PyMC example,

import numpy as np
import pymc as pm

y = np.array([0, 1, 2, 3, 4])

# Can take random samples of y: the likelihood is an observed distribution
with pm.Model() as sampleable_model:
    mu = pm.Normal("mu")
    sigma = pm.HalfNormal("sigma")
    pm.Normal("y", mu=mu, sigma=sigma, observed=y)

# Cannot take random samples of y: pm.Potential only adds a log-probability term
with pm.Model() as potential_model:
    mu = pm.Normal("mu")
    sigma = pm.HalfNormal("sigma")
    pm.Potential("y", pm.logp(pm.Normal.dist(mu=mu, sigma=sigma), y))

The first form would still work if y depends on other variables, as it tends to in regression.

This is where pm.sample_prior_predictive and pm.sample_posterior_predictive come in, as they both return samples of the likelihood random variable. Use pm.sample_posterior_predictive if you want the target given a set of parameters.

CLV Support

For the models in pymc-marketing/clv, you can see this in each model's build_model method. If the likelihood uses a custom distribution, it is defined there.

The BG/NBD (BetaGeo) model uses pm.Potential, so it cannot be sampled. The other models should be samplable because they are built with PyMC distributions or custom distributions. However, the _data_setter method isn't used, so sampling for a dataset other than the one the model was built on would likely have to be hacked together. But in my mind, this seems possible.

@ColtAllen will have some more context as well

6 replies

seanreed1111 Jun 3, 2024
Author

thanks for your quick response. It was one of those things that seemed like it should be possible, but was not sure how to proceed. I am trying to make up transaction data for an exam I am constructing while simultaneously learning more about the library. Will take a look at the links you provided. Much appreciated!

ColtAllen Jun 3, 2024
Maintainer

Hey @seanreed1111,

Are you interested in generating RFM data for modeling, or raw transaction data? (See the Quickstart for examples). Simulating RFM data is not supported yet for BG/NBD, but can be done for the Pareto/NBD model. Simulating raw transactions is not supported at all natively, but I did hack something together in the legacy lifetimes library last year for this very situation:

https://github.com/ColtAllen/marketing-case-study/blob/main/case-study.ipynb

First three cells in that notebook should cover your request. If this is something you think you'll be doing on a regular basis, please create an issue and we'll prioritize it accordingly.

Answer selected by seanreed1111

seanreed1111 Jun 3, 2024
Author

Ah, it appears lifetimes does have this feature: lifetimes.generate_data. So at least that solves my immediate problem. But the learning will continue. Thanks again!

seanreed1111 Jun 3, 2024
Author

thanks @ColtAllen I think our answers crossed. No, I don't expect to do this on a regular basis, just tinkering and trying to wrap my head around how everything works.

ColtAllen Jun 3, 2024
Maintainer

Glad to help! FYI, nothing in lifetimes.generate_data has a random seed, and used as-is, all customers will also make their first purchase in the same time period, which is wildly unrealistic except perhaps for a new product launch. The code in my notebook varies the first purchase date for each customer.
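For readers who want to see the idea without opening the notebook, here is a rough NumPy-only sketch of the standard BG/NBD generative story with a fixed seed and staggered first purchases. It is not the code from the notebook or from lifetimes; the function name simulate_bgnbd_transactions, the uniform spread of first-purchase times, and the parameter values in the usage line are all assumptions made for illustration.

```python
import numpy as np

def simulate_bgnbd_transactions(n_customers, r, alpha, a, b, horizon, seed=0):
    """Simulate raw (customer_id, time) purchases under BG/NBD assumptions.

    Hypothetical helper: lambda ~ Gamma(r, rate=alpha), p ~ Beta(a, b),
    exponential inter-purchase times, dropout with probability p after
    every purchase. Times are in the same unit as 1 / lambda.
    """
    rng = np.random.default_rng(seed)  # seeded, so reruns are reproducible
    rows = []
    for customer_id in range(n_customers):
        lam = rng.gamma(shape=r, scale=1.0 / alpha)  # purchase rate
        p = rng.beta(a, b)                           # dropout probability
        # Assumption: first purchases are spread uniformly over the window
        t = rng.uniform(0.0, horizon)
        while t < horizon:
            rows.append((customer_id, float(t)))
            if rng.random() < p:  # customer becomes inactive after this purchase
                break
            t += rng.exponential(1.0 / lam)  # wait until the next purchase
    return rows

# Illustrative parameter values only
transactions = simulate_bgnbd_transactions(
    n_customers=50, r=0.25, alpha=4.0, a=0.8, b=2.5, horizon=52.0, seed=1
)
```

The result is a list of tuples that can be loaded into a table, for example with pandas.DataFrame(transactions, columns=["customer_id", "time"]), and then converted to dates by adding the times to a start date.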

seanreed1111 Jun 3, 2024
Author

nice! thanks @ColtAllen!

Also a general shoutout to all contributors, your docs are awesome!

Adding a note for posterity:
need pandas<=1.5.3 in the notebook https://github.com/ColtAllen/marketing-case-study/blob/main/case-study.ipynb
