Is there a user-facing way to discover how many models a pre-analyzed file has? #246
Comments
I did at some point intend to document some parts of bayeslite's schema to allow for some introspection. We could invent a pragma for the purpose too:
('Precision' is the first word that came to mind which might be taken to mean an estimate of the expected error, or might be taken to be an estimate of the population variance, &c.)
In a separate conversation, Vikash asked me, instead of saying "BayesDB says the inferred value is 12 with 70% confidence", to say something like "BayesDB, on a population with 20 observations, 32 models run for 1200 iterations, inferred a value of 12 with 70% confidence." I take it this ticket would give me the metadata to get that phrasing right. Should charts automatically be tagged with this metadata?
Yes!
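As an aside, here is a minimal sketch of how that fuller phrasing could be assembled once the metadata is available to the caller; the report() helper and its argument names are hypothetical, and the numbers are simply the ones from the example above.

```python
# Hypothetical helper (not part of bayeslite or bdbcontrib): assembles the fuller
# disclosure phrasing from analysis metadata. The values are hard-coded from the
# example above; in practice they would come from whatever introspection this
# ticket adds.
def report(value, confidence, n_rows, n_models, n_iterations):
    template = ("BayesDB, on a population with {n_rows} observations, "
                "{n_models} models run for {n_iterations} iterations, "
                "inferred a value of {value} with {confidence:.0%} confidence.")
    return template.format(n_rows=n_rows, n_models=n_models,
                           n_iterations=n_iterations, value=value,
                           confidence=confidence)

print(report(value=12, confidence=0.70, n_rows=20, n_models=32, n_iterations=1200))
```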
One annoyance: nothing enforces that all models are run for the same number of iterations, even though that is the convention. In the interest of maximum disclosure, we would need to invent a scheme that summarizes the amount of analysis done even when it is heterogeneous, and even as more data is streamed in.
We could take the average. (We could also multiply them!)
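For what it's worth, a tiny sketch of the kind of summary that could be disclosed when the per-model counts are heterogeneous; the counts below are made up, and none of this is existing bayeslite behaviour.

```python
# Sketch only: summarizing a heterogeneous set of per-model iteration counts.
# min/max makes the heterogeneity visible, the mean is the "take the average"
# option, and the total is the overall analysis effort (roughly
# models * iterations when the counts happen to be uniform).
iterations_per_model = [1200, 1200, 1200, 800, 800, 50]  # made-up counts

summary = {
    'models': len(iterations_per_model),
    'min_iterations': min(iterations_per_model),
    'max_iterations': max(iterations_per_model),
    'mean_iterations': float(sum(iterations_per_model)) / len(iterations_per_model),
    'total_model_iterations': sum(iterations_per_model),
}
print(summary)
```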
recipes.analysis_status() shows this info in the notebook by returning a dataframe with iteration counts and the number of models that have each count. There is a lower-level function, per_model_analysis_status(), that returns a dataframe with each model number and its iteration count; analysis_status is a .value_counts() of that. I'm not sure to what extent this counts as "user facing", because it lives in bdbcontrib rather than bayeslite and is not part of the language, just a Python function. It also doesn't really address the questions of expected precision or robustness, because different numbers of models and iterations will be good enough for different datasets, queries, and requirements on the answer. But that's a little bit of an open problem, isn't it?
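To make the described relationship concrete, here is a small illustration (not the bdbcontrib implementation) of a per-model dataframe and the .value_counts() summary derived from it; the model numbers and iteration counts are invented.

```python
# Illustration only, not bdbcontrib's code: a per-model iteration table shaped
# like the description of per_model_analysis_status(), and the value_counts()
# summary that analysis_status() is described as returning.
import pandas as pd

per_model = pd.DataFrame(
    {'iterations': [1200, 1200, 1200, 800]},
    index=pd.Index([0, 1, 2, 3], name='modelno'))

summary = per_model['iterations'].value_counts()
print(per_model)
print(summary)  # e.g. 3 models at 1200 iterations, 1 model at 800
```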
Use case: knowing roughly what kind of result robustness to expect
Use case: knowing what model ranges to specify in queries
Starting with
select * from bayesdb_generator_model
works but seems a little internal. Or do we intend to document (parts of) Bayeslite's schema to enable this sort of introspection?
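For reference, a sketch of that query run outside the shell. A .bdb file is an ordinary SQLite database, so the standard sqlite3 module suffices; the file name is hypothetical, and the column names (generator_id, modelno, iterations) are assumptions about the current internal schema rather than documented API.

```python
# Sketch under assumptions: bayesdb_generator_model is assumed to have columns
# (generator_id, modelno, iterations); none of this is documented, user-facing API.
import sqlite3

db = sqlite3.connect('pre_analyzed.bdb')  # hypothetical pre-analyzed file
rows = db.execute('''
    SELECT generator_id, COUNT(*) AS n_models,
           MIN(iterations) AS min_iters, MAX(iterations) AS max_iters
    FROM bayesdb_generator_model
    GROUP BY generator_id
''').fetchall()
for generator_id, n_models, min_iters, max_iters in rows:
    print(generator_id, n_models, min_iters, max_iters)
db.close()
```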