
Potential sign of bug in confidence calculations #415

Open
curlette opened this issue May 14, 2016 · 4 comments
@curlette (Contributor) commented May 14, 2016

During my analysis of the College Scorecard data I came across the following:

The plot below is of 500 simulated scores for a school whose tuition was inferred with 94% confidence (Everest Univ. Jacksonville).

The distribution is noticeably bimodal, which we would expect to lower the confidence.

@fsaad @vkmvkmvkmvkm @raxraxraxraxrax @gregory-marton

[Plot of the 500 simulated scores for Everest Univ. Jacksonville (collegescorecardanalysis-copy1_211_0)]

[Screenshot from 2016-05-14 01:26:52]

@alxempirical (Contributor) commented May 14, 2016

Assuming you are using a crosscat metamodel (not gpmcc), I happened upon a possible explanation for this behavior earlier this week.

TL;DR:

            # TODO: multistate impute doesn't exist yet
            # e,confidence = su.impute_and_confidence_multistate(M_c, X_L, X_D, Y, Q, n,
            #                                                    self.get_next_seed)

INFER draws 100 approximate posterior samples for an observed row by sampling from the category distributions ("cluster_model" in the crosscat source-code nomenclature) of the latent categories assigned to the rows in the last ANALYZE iteration of each model. Each such category distribution is a univariate Gaussian.
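
For concreteness, here is a minimal numpy sketch of what a single model contributes (the parameter values are made up, and none of this is the crosscat API):

    import numpy as np

    rng = np.random.default_rng(0)

    # Suppose the last ANALYZE iteration left the target row in a latent
    # category whose cluster model for the tuition column is this Gaussian
    # (hypothetical parameters):
    cluster_mu, cluster_sigma = 12_000.0, 500.0

    # The ~100 approximate posterior samples INFER draws from this model
    # all come from that one univariate Gaussian, so the per-model sample
    # is unimodal by construction.
    posterior_samples = rng.normal(cluster_mu, cluster_sigma, size=100)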

Then the confidence estimate builds an entirely new crosscat state from the posterior sample, trains it for 100 iterations, and returns the mean frequency of the maximum-likelihood category over those training iterations. Since that state is trained on a sample from a single Gaussian, it is not surprising that the maximum-likelihood category has very high frequency. Essentially, the confidence-estimate code never gets to see the other mode of the posterior sample.
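
A rough, self-contained illustration of that failure mode, with sklearn's BayesianGaussianMixture standing in for the freshly trained crosscat state and "confidence" taken to be the frequency of the most populous category (a toy stand-in under those assumptions, not the real crosscat code):

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    def toy_confidence(samples, seed=0):
        # Stand-in for "train a fresh state on the sample and report the
        # frequency of the maximum-likelihood category": fit a DP-style
        # mixture and measure the share of its most populous component.
        mix = BayesianGaussianMixture(
            n_components=5,
            weight_concentration_prior_type="dirichlet_process",
            random_state=seed,
        )
        labels = mix.fit_predict(samples.reshape(-1, 1))
        return np.bincount(labels, minlength=5).max() / len(samples)

    rng = np.random.default_rng(0)

    # The unimodal sample the confidence code actually sees (one model):
    one_model = rng.normal(12_000.0, 500.0, size=100)
    print(toy_confidence(one_model))   # typically close to 1.0

    # The pooled, bimodal picture it never sees:
    pooled = np.concatenate([rng.normal(12_000.0, 500.0, size=50),
                             rng.normal(20_000.0, 500.0, size=50)])
    print(toy_confidence(pooled))      # typically close to 0.5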

SIMULATE draws samples given observed-row conditions using the same code as INFER, but it draws them from all models in the generator unless you specify otherwise. So SIMULATEd samples can have multiple modes; the modes just come from different models. The confidence estimate in INFER (and the inference itself) is based only on the first model in the generator.
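
Schematically, with made-up parameters for a hypothetical two-model generator:

    import numpy as np

    rng = np.random.default_rng(0)

    # Each model's cluster model for the target cell, as (mu, sigma) pairs.
    # Suppose the models disagree about which mode the row sits in:
    model_clusters = [(12_000.0, 500.0), (20_000.0, 500.0)]

    # SIMULATE-style draws pool over all models, so the two modes show up
    # together in one sample...
    simulated = np.concatenate(
        [rng.normal(mu, sigma, size=100) for mu, sigma in model_clusters])

    # ...while INFER's confidence path only ever looks at models[0]:
    first_model_only = rng.normal(*model_clusters[0], size=100)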

@curlette (Contributor, Author)

Makes sense, thanks!

@alxempirical (Contributor)

If you don't mind, I think it would be good to keep this issue open. It looks like you have brought a serious bug to light.

alxempirical reopened this May 14, 2016
@alxempirical (Contributor) commented May 15, 2016

@curlette, can you send the bdb file to alx@<the rest of my github name>.com, please?

> SIMULATE draws samples given observed-row conditions using the same code as INFER, but it draws them from all models in the generator unless you specify otherwise. So SIMULATEd samples can have multiple modes; the modes just come from different models. The confidence estimate in INFER (and the inference itself) is based only on the first model in the generator.

I misread the code in the link. The first model is used only for the confidence calculation. The imputation sampling is done over all models, so you would expect the two modes to appear in the samples generated by impute, which are then passed to continuous_imputation_confidence.
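
In toy form (same made-up parameters as above; continuous_imputation_confidence is the real crosscat name, but the commented call is only schematic):

    import numpy as np

    rng = np.random.default_rng(0)

    # With the corrected reading, impute pools its draws over ALL models,
    # so both modes show up in the sample that gets passed along...
    imputed_samples = np.concatenate(
        [rng.normal(mu, 500.0, size=100) for mu in (12_000.0, 20_000.0)])

    # ...and only the confidence calculation itself is pinned to the first
    # model. Schematic, not the real signature:
    # confidence = continuous_imputation_confidence(imputed_samples, ...)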
