Adding a --info flag to output meta information of the model #808

rok-cesnovar · 2021-02-02T08:28:16Z

This is something I have seen users work around with various techniques; there are also packages that do this with regexes and the like.

I understand some of this can be accessed via the generated C++ models, but that requires instantiating the model with the data and working closely with C++, which seems like overkill.

Examples of use:

Posteriordb (see Extract model inputs directly from the compiler posteriordb#198)
stanbreaker by @jtimonen (parses the transformed MIR to extract information on the data/parameters: https://github.com/jtimonen/stanbreaker/blob/main/R/code_analysis.R#L84)

This should not be too difficult to do properly in stanc3. As you can see in the posteriordb issue, @mandel has already made a branch with almost everything we need.

Metadata we could output and use:

list of included files
list of input data
list of transformed parameters and parameters
list of generated quantities
dimensionality of all data & parameters
(optional) does it use reduce_sum/map_rect

There may be other ideas. The output format should probably be pretty-printed JSON, as it's both human readable and nice to work with in other scripts/languages.
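Once such a flag exists, consuming it from another language takes only a few lines. A minimal sketch in Python, assuming a `stanc` executable on PATH that prints the JSON to standard output when given `--info` and a model file; `run_stanc_info` and `parse_info` are hypothetical helper names, not part of any existing interface.

```python
import json
import subprocess


def parse_info(text: str) -> dict:
    """Parse the JSON text printed by the compiler into a Python dict."""
    return json.loads(text)


def run_stanc_info(model_path: str, stanc: str = "stanc") -> dict:
    """Run the compiler with --info on one model file and parse the result.

    Assumes the JSON is written to stdout; raises CalledProcessError if the
    compiler exits with a non-zero status (e.g. a syntax error in the model).
    """
    result = subprocess.run(
        [stanc, "--info", model_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_info(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it easy to test the parsing on saved output without a compiler installed.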

@mandel would you be interested in making a PR of your branch?

jtimonen · 2021-02-02T09:03:32Z

An additional wish would be a list of user-defined functions and their arguments, if that is not too difficult to do. My ultimate goal would be to create a tool that can take an rstantools-based package like rstanarm, which is written with many includes, conditions, etc., and, given the input data, turn it into minimal Stan code that is much more readable. This would require removing blocks and loops that will never be reached, user-defined functions that will not be used, parameters that have dimension zero, etc. I tried to do that once, but it was too much work to parse everything myself; it could be possible if stanc3 had a feature like this.

mitzimorris · 2021-02-02T13:55:59Z

Need to capture the type for data and parameters: generated quantities variables can be int as well as real.

list of input data

Does this mean the input data variable declarations?

list of included files

By this, do you mean all includes in the program? Why?

rok-cesnovar · 2021-02-02T14:17:02Z

does this mean the input data variable declarations?

Yes, all data variables and their types, excluding transformed data variables (I wrote "input data" to differentiate them from transformed data).

why?

So that we can easily and programmatically check a model's dependencies. This works for cases like taking a model out of a huge database of models, and it also lets interfaces check whether any of the included files has changed since the last compile.
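The "has anything changed since the last compile" check could then be a simple modification-time comparison. A sketch under stated assumptions: the interface already has the list of included file paths (however the compiler ends up reporting them) and the path of the compiled executable; `includes_changed_since` is a hypothetical helper, and comparing mtimes is just one possible staleness test (content hashes would be another).

```python
import os
from typing import Iterable, List


def includes_changed_since(exe_path: str, included_files: Iterable[str]) -> List[str]:
    """Return the included files modified after the executable was built.

    An empty list means no include is newer than the compiled model, so a
    recompile is not needed on account of the includes.
    """
    built_at = os.path.getmtime(exe_path)
    return [path for path in included_files if os.path.getmtime(path) > built_at]
```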

seantalts · 2021-02-02T14:18:31Z

+1, I think it's a great idea to expose more metadata about the model. My only input would be that we already expose some of it here: https://github.com/stan-dev/stanc3/blob/master/src/stan_math_backend/Cpp_Json.ml#L73 so we could try to keep that in mind and integrate that with whatever we come up with. It gives the model class a method you can call to get a JSON dump of parameter names and dimensions, I believe. We should definitely expose that on the command line interface :)

I understand some of this can be accessed via the generated C++ models but that requires instantiating the model with the data and also requires tightly working with C++ which seems a bit of overkill.

I think you might have to instantiate the model for the dimensionality of the parameters to be printed out, at least if you want actual numbers in there (which is what RStan et al. require from the Cpp_Json stuff above). We should be able to print out whatever the definitions were (e.g. array[N] real x) without instantiating, though; maybe that's all that is needed.

rok-cesnovar · 2021-02-02T14:29:38Z

Yes, for actual sizes you need the data, of course, except where the size is defined with a literal. I meant the number of dimensions: print whether it's a 1D array, a 2D array, etc. For matrix, vector, and scalar types the dimensionality info is redundant.

Thanks for the Cpp_Json pointer.
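The number of dimensions can be read off the declaration without any data: each array bracket adds one, and the container type adds its own. A sketch of that counting rule, assuming a vector or row vector counts as one dimension and a matrix as two; the `CONTAINERS` table and `describe` are hypothetical and cover only a handful of Stan types for illustration.

```python
from typing import Tuple

# Hypothetical table: container type -> (scalar base type, dimensions it adds).
# Only a few Stan types are listed; constrained types would map similarly.
CONTAINERS = {
    "int": ("int", 0),
    "real": ("real", 0),
    "vector": ("real", 1),
    "row_vector": ("real", 1),
    "matrix": ("real", 2),
}


def describe(array_dims: int, container: str) -> Tuple[str, int]:
    """Return (base type, number of dimensions) for a declaration.

    `array_dims` is how many array dimensions the declaration has (0 if it is
    not an array) and `container` is the element type with its sizes removed,
    e.g. array[K] vector[N] y  ->  describe(1, "vector").
    """
    base, container_dims = CONTAINERS[container]
    return base, array_dims + container_dims
```

For example, `array[N] real x` gives `("real", 1)` and `array[K] vector[N] y` gives `("real", 2)`, with no sizes needed.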

mandel · 2021-02-02T17:11:00Z

I'd be happy to help with that. The format was originally inspired by PosteriorDB.

The --info option currently generates a JSON object with the fields inputs, parameters, transformed parameters, and generated quantities, each containing a dictionary where each entry corresponds to a variable in the data, parameters, transformed parameters, and generated quantities blocks, respectively. Each variable is associated with an object with two fields:

type: the base type of the variable ("int" or "real").
dimensions: the number of dimensions (0 for a scalar, 1 for a vector or row vector, etc.).

For example, for https://github.com/stan-dev/posteriordb/blob/master/posterior_database/models/stan/hmm_drive_0.stan the generated JSON is:

{ "inputs": { "K": { "type": "int", "dimensions": 0},
              "N": { "type": "int", "dimensions": 0},
              "u": { "type": "real", "dimensions": 1},
              "v": { "type": "real", "dimensions": 1},
              "alpha": { "type": "real", "dimensions": 2} },
  "parameters": { "theta1": { "type": "real", "dimensions": 1},
                  "theta2": { "type": "real", "dimensions": 1},
                  "phi": { "type": "real", "dimensions": 1},
                  "lambda": { "type": "real", "dimensions": 1} },
  "transformed parameters": { "theta": { "type": "real", "dimensions": 2} },
  "generated quantities": { "z_star": { "type": "int", "dimensions": 1},
                            "log_p_z_star": { "type": "real", "dimensions": 0} } }
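Because each block maps variable names to a type and a dimension count, a downstream tool can answer questions like "which outputs are int-valued?" (the int vs. real concern raised earlier for generated quantities) without parsing Stan code. A minimal sketch, assuming the JSON shape shown above; `int_variables` is a hypothetical helper, and it skips any non-dictionary fields in case other kinds of metadata are added later.

```python
import json
from typing import Dict, List


def int_variables(info: dict) -> Dict[str, List[str]]:
    """Map each variable block in the --info output to its int-typed names.

    Top-level entries that are not dicts of variables are skipped.
    """
    return {
        block: [name for name, meta in variables.items() if meta.get("type") == "int"]
        for block, variables in info.items()
        if isinstance(variables, dict)
    }


# A trimmed copy of the hmm_drive_0 output shown above.
EXAMPLE = """
{ "inputs": { "K": { "type": "int", "dimensions": 0},
              "u": { "type": "real", "dimensions": 1} },
  "generated quantities": { "z_star": { "type": "int", "dimensions": 1},
                            "log_p_z_star": { "type": "real", "dimensions": 0} } }
"""

print(int_variables(json.loads(EXAMPLE)))
# {'inputs': ['K'], 'generated quantities': ['z_star']}
```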

seantalts · 2021-02-08T16:13:31Z

#810 is merged and fixes many of these things. I think what's left:

list of included files
(optional) does it use reduce_sum/map_rect

Is that it?

rok-cesnovar · 2021-02-08T16:15:14Z

I think that is what came up so far indeed.

seantalts · 2021-02-08T16:18:50Z

A general way to address "does it use reduce_sum/map_rect" might be "produce a list of all named functions used" which I think would be fun and might even help us check test model coverage or help with dependencies later in some future Stan package system...
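With such a list, the reduce_sum/map_rect question reduces to a set-membership test. A sketch assuming the tool receives the used function names as a list of strings (for instance from a field in the JSON output; the field name is not specified here); `uses_parallelism` and `PARALLEL_FUNCTIONS` are hypothetical names.

```python
from typing import Iterable

# Stan's within-chain parallelization functions.
PARALLEL_FUNCTIONS = {"reduce_sum", "reduce_sum_static", "map_rect"}


def uses_parallelism(used_functions: Iterable[str]) -> bool:
    """Return True if any within-chain parallelization function is used."""
    return not PARALLEL_FUNCTIONS.isdisjoint(used_functions)
```

The same list could feed the other ideas mentioned here, such as checking which functions the test models cover.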

rok-cesnovar · 2021-02-08T16:22:30Z

Nice. A list of named functions is even better.

mandel mentioned this issue Feb 2, 2021: Adding a --info flag to output meta information of the model #810 (Merged)

mitzimorris mentioned this issue Feb 5, 2021: int variables are float stan-dev/cmdstanpy#310 (Closed)

mandel mentioned this issue Feb 11, 2021: The --info option produces a list of used functions and distributions #813 (Merged)

WardBrian mentioned this issue Sep 13, 2021: Add included files to --info #965 (Merged)


rok-cesnovar closed this as completed in #965 Sep 15, 2021
