Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transmit BayesDB stochasticity to crosscat engine #382

Conversation

alxempirical
Copy link
Contributor

At the moment, the seed passed to a BayesDB instance essentially goes unused. A metamodel is seeded during instantiation, and that's it. That makes it more difficult than it should be to run repeatable tests in a REPL. This PR corrects that by allowing the metamodel to create a new engine for each query. Since the only state in an engine is an RNG, the overhead of this change is minimal.

This change needs to take place in concert with probcomp/crosscat#88

The new interface expects bayeslite.metamodels.crosscat.CrosscatMetamodel to be instantiated with a callable which takes a seed argument. For backwards compatibility, it is still possible to pass a crosscat.EngineTemplate.EngineTemplate instance (or instances of subclasses like LocalEngine or MultiprocessingEngine.

CrosscatMetamodel now takes a crosscat engine class constructor instead
of an instance, so that the crosscat stochasticity can be controlled by
the bdb RNG services.

CrosscatMetamodel can still take an instance, for backward
compatibility.
…0160311-alxempirical-i-want-to-be-surprised-and-delighted-but-i-also-want-to-be-in-total-control
@alxempirical
Copy link
Contributor Author

Please let me know if I need to give more detail about why this is a valuable change.

@riastradh-probcomp
Copy link
Contributor

Sorry, just haven't gotten a round tuit yet. From what you said this sounds like a good idea, but I haven't taken a look at the actual changes.

@@ -240,34 +239,8 @@ class CrosscatMetamodel(metamodel.IBayesDBMetamodel):

"""

def __init__(self, crosscat, subsample=None, ccargs=None,
cckwargs=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think cckwargs was so we could pass e.g. the subsample keyword to the underlying metamodel, even if we wanted to do multiprocessing. Can you help me understand how that works in the revised flow?

@alxempirical
Copy link
Contributor Author

Sorry, just haven't gotten a round tuit yet.

No worries. Thanks for the update.

`crosscat` can be a crosscat engine instance or constructor. Prefer the
latter case, as that way bayeslite stochasticity is passed to the
constructor `seed` argument when engines are constructed for crosscat
queries (which at the time of writing happens every time bayeslite makes a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the provided constructor will be used seems like info a caller would want to have. Please keep that?

@alxempirical
Copy link
Contributor Author

Can you help me understand how that works in the revised flow?

CrosscatMetamodel now takes a callable which should return a crosscat engine. See, e.g., crosscat.MultiprocessingEngine.MultiprocessingEngineFactoryFromPool. Arguments to the engine constructor can be included in a lambda if need be.

I am agnostic about whether to use ccargs/cckwargs vs an opaque lambda here. One is simpler to write, the other is simpler to interrogate during debugging.

@gregory-marton
Copy link
Contributor

for ccargs/cckwargs vs. lambda, I'm fine with lambda, but would suggest adding an example of that use in nearby documentation.

self._subsample = subsample
def __init__(self, crosscat, subsample=None, ccargs=None,
cckwargs=None):
if isinstance(crosscat, EngineTemplate):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more dynamic decisions on the basis of isinstance please, unless actually operating on a data structure that is sensibly a union type (e.g., a json tree).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already resolved in later commit.


>>> import bayeslite.metamodels.crosscat as cc
>>> from crosscat import MultiprocessingEngine as mpe
>>> import bayeslite.metamodels.crosscat as cc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dup import

>>> from bayeslite import bayesdb_register_metamodel as rg
>>> from bayeslite import bayesdb_open as bp
>>> from bayeslite import bayesdb_register_metamodel as register_metamodel
>>> from bayeslite import bayesdb_open
>>> pool = mpe.Pool(4)
>>> thelambda = lambda s: mpe.MultiprocessingEngine(seed=s, pool=pool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much more readable; thank you!

thelambda -> engine_factory perhaps?

@gregory-marton
Copy link
Contributor

See #392 instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants