Uniformity feature: Support mutual information for cells of observed rows #250

Open
axch opened this issue Oct 9, 2015 · 2 comments
axch commented Oct 9, 2015

This should be much faster than for unobserved rows, because the cluster assignment in each model is assumed known. The only MI code path I am aware of computes MI of columns for unobserved (new) rows.

@gregory-marton

Being a little unclear on the math, I'm not sure what the use case looks like. Probability of dependence estimates the probability that mutual information between columns is nonzero? Similarity measures the mutual information between observed rows? What does this effectively mean for individual observed cells?

@riastradh-probcomp

The architecture is that we have an infinite exchangeable set of tuples of random variables {(A_r, B_r, C_r)}_r, and we approximate the posterior distribution given certain assignments A_0 = a_0, C_1 = c_1, &c. Currently we can only approximate mutual information for two random variables A_i, B_i in the same row i for which no values have been assigned. This issue is to allow approximating it for two random variables from a row that has been observed.

The architecture more specifically for Crosscat is that there are additional categorical variables {(L_r, M_r, N_r)}_r which we cannot observe, and whose number we cannot observe either. Each Crosscat state is a sample from the distribution on the number of latent variables (views) and their assignments (categories). Each model estimator evaluates a Monte Carlo integral (1) over samples of Crosscat states of some function of a single Crosscat state.

In this case, approximating mutual information of variables of an entirely unobserved row from a single Crosscat state means evaluating a Monte Carlo integral (2) over samples of category assignments of some mutual information estimator (itself a Monte Carlo integral (3) over samples of the posterior predictive distribution on the variables given the category assignments).
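
As a rough way to see how the three integrals nest, the estimate for an unobserved row could be written as below. All the symbols here are my own shorthand, not names from the code: S is the number of Crosscat states, sigma_s is state s, K is the number of sampled category assignments per state, z_{s,k} is one such assignment, and \widehat{I}_N is whatever N-sample mutual information estimator is used for (3).

```latex
\widehat{I}(A;B) \;\approx\;
\underbrace{\frac{1}{S}\sum_{s=1}^{S}}_{(1)}\;
\underbrace{\frac{1}{K}\sum_{k=1}^{K}}_{(2)}\;
\underbrace{\widehat{I}_N\!\left(A;B \mid z_{s,k},\, \sigma_s\right)}_{(3)},
\qquad z_{s,k} \sim p(z \mid \sigma_s)
```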

What @axch proposes is to implement approximation of mutual information of variables of an observed row from a single Crosscat state. In that implementation, the Monte Carlo integral (2) over samples of category assignments is replaced by a single evaluation of (3), given the fixed category assignments of that observed row in that Crosscat state.
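
To make the difference concrete, here is a minimal Python sketch of the two code paths for a single state. It is a toy under loud assumptions, not the bayeslite or Crosscat implementation: the state is a hypothetical list of (weight, rho) categories, each category models (A, B) as a unit-variance bivariate Gaussian with correlation rho, and the function names are made up for illustration. Estimator (3) here is the usual sample average of log p(a, b) - log p(a) - log p(b).

```python
import math
import random

# Hypothetical toy state: (weight, rho) per category, where rho is the
# correlation of a unit-variance bivariate Gaussian over (A, B).
# This component model is an assumption for illustration, not Crosscat's.
CATEGORIES = [(0.5, 0.0), (0.3, 0.6), (0.2, 0.9)]

LOG_2PI = math.log(2.0 * math.pi)


def log_joint(x, y, rho):
    """Log density of a unit-variance bivariate Gaussian with correlation rho."""
    one_minus = 1.0 - rho * rho
    quad = (x * x - 2.0 * rho * x * y + y * y) / one_minus
    return -LOG_2PI - 0.5 * math.log(one_minus) - 0.5 * quad


def log_marginal(x):
    """Log density of a standard normal."""
    return -0.5 * LOG_2PI - 0.5 * x * x


def mi_given_category(rho, n_samples, rng):
    """Integral (3): MI estimate from predictive samples given one category."""
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        # Draw y so that (x, y) has correlation rho and unit variances.
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        total += log_joint(x, y, rho) - log_marginal(x) - log_marginal(y)
    return total / n_samples


def mi_unobserved_row(categories, n_assignments, n_samples, rng):
    """Integral (2): average (3) over sampled category assignments."""
    weights = [w for w, _ in categories]
    total = 0.0
    for _ in range(n_assignments):
        _, rho = rng.choices(categories, weights=weights)[0]
        total += mi_given_category(rho, n_samples, rng)
    return total / n_assignments


def mi_observed_row(categories, assigned_index, n_samples, rng):
    """Proposed path: one evaluation of (3) at the row's fixed category."""
    _, rho = categories[assigned_index]
    return mi_given_category(rho, n_samples, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    print("unobserved row:", mi_unobserved_row(CATEGORIES, 200, 500, rng))
    print("observed row in category 2:", mi_observed_row(CATEGORIES, 2, 500, rng))
```

The observed-row path does one inner estimate where the unobserved-row path does `n_assignments` of them, which is where the expected speedup comes from.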
