Compatibility with DynamicPPL 0.38 + InitContext #2676

Conversation
```julia
# Get the initial values for this component sampler.
initial_params_local = if initial_params === nothing
    nothing
else
    DynamicPPL.subset(vi, varnames)[:]
end
```
I was quite pleased with this discovery. Previously the initial params had to be subsetted to be the correct length for the conditioned model. That's not only a faff, but also I get a bit scared whenever there's direct VarInfo manipulation like this.

Now, if you use `InitFromParams` with a NamedTuple/Dict that has extra params, the extra params are just ignored. So no need to subset it at all, just pass it through directly!
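As a minimal sketch of what this enables (the model and values here are made up; the `initial_params`-takes-a-strategy interface is the one described later in this PR):

```julia
using Turing, DynamicPPL

@model function demo()
    x ~ Normal()
    y ~ Normal()
end

# The NamedTuple below has an extra entry `z` that `demo()` never uses.
# Under the new behaviour it is simply ignored, so a Gibbs component sampler
# can be handed the full set of initial params without any subsetting.
sample(
    demo(), MH(), 100;
    initial_params=DynamicPPL.InitFromParams((x=1.0, y=2.0, z=3.0)),
)
```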
```julia
# TODO(DPPL0.38/penelopeysm): This function should no longer be needed
# once InitContext is merged.
```
Unfortunately `set_namedtuple!` is used elsewhere in this file (though it won't appear in this diff), so we can't delete it (yet).
```julia
function DynamicPPL.tilde_assume!!(
    context::MHContext, right::Distribution, vn::VarName, vi::AbstractVarInfo
)
    # Allow MH to sample new variables from the prior if it's not already present in the
    # VarInfo.
    dispatch_ctx = if haskey(vi, vn)
        DynamicPPL.DefaultContext()
    else
        DynamicPPL.InitContext(context.rng, DynamicPPL.InitFromPrior())
    end
    return DynamicPPL.tilde_assume!!(dispatch_ctx, right, vn, vi)
end
```
The behaviour of `SampleFromPrior` used to be: if the key is present, don't actually sample, and if it was absent, sample. This if/else replicates the old behaviour.
```julia
    sampler::S
    varinfo::V
    evaluator::E
    resample::Bool
```
For pMCMC, this Boolean field essentially replaces the del flag. Instead of `set_all_del` and `unset_all_del`, we construct a new `TracedModel` with this set to `true` and `false` respectively.
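A hedged sketch of that pattern, assuming `TracedModel`'s positional constructor matches the fields shown above and that `sampler`, `varinfo`, and `evaluator` are already in scope:

```julia
# Previously, set_all_del / unset_all_del mutated a flag on the VarInfo.
# The same intent is now expressed by constructing new TracedModels:
resampling_model = TracedModel(sampler, varinfo, evaluator, true)   # was set_all_del
reusing_model    = TracedModel(sampler, varinfo, evaluator, false)  # was unset_all_del
```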
Turing.jl documentation for PR #2676 is available at:
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           breaking     #2676   +/-   ##
===========================================
  Coverage          ?    57.89%
===========================================
  Files             ?        22
  Lines             ?      1387
  Branches          ?         0
===========================================
  Hits              ?       803
  Misses            ?       584
  Partials          ?         0
```

☔ View full report in Codecov by Sentry.
```julia
seed = if dist isa GeneralizedExtremeValue
    # GEV is prone to giving really wacky results that are quite
    # seed-dependent.
    StableRNG(469)
else
    StableRNG(468)
end
chn = sample(seed, m(), HMC(0.05, 20), n_samples)
```
Case in point:

```julia
julia> using Turing, StableRNGs

julia> dist = GeneralizedExtremeValue(0, 1, 0.5); @model m() = x ~ dist
m (generic function with 2 methods)

julia> mean(dist)
1.5449077018110322

julia> mean(sample(StableRNG(468), m(), HMC(0.05, 20), 10000; progress=false))
Mean
  parameters      mean
      Symbol   Float64

           x    3.9024

julia> mean(sample(StableRNG(469), m(), HMC(0.05, 20), 10000; progress=false))
Mean
  parameters      mean
      Symbol   Float64

           x    1.5868
```
For the record, 11 failing CI jobs is the expected number. There is also the failing job caused by a base Julia segfault (#2655), but that's on 1.10, so it overlaps with the first category.
@mhauru, I haven't run CI against the latest revisions, like the removal of the del flag, but I think this might be meaty enough as it stands, and any adjustments arising from that PR (like renaming `islinked`) should be quite trivial.
Good stuff, some minor comments.

I'm wondering about how to merge this. Should we review the code here, but then hold off merging to `breaking` until all the 0.38 compat fixes are in and a release of 0.38 is out, so that all the temporary `[sources]` stuff etc. can go and we can see tests pass?
```julia
end
DynamicPPL.NodeTrait(::MHContext) = DynamicPPL.IsLeaf()

function DynamicPPL.tilde_assume!!(
```
Why is this needed? Doesn't MH just use `init` to get initial values like any other sampler, and then evaluate logpdfs on proposed steps?

Also, this looks like dynamic dispatch, since the context type depends on `haskey(vi, vn)`, which could be a performance issue.
The rationale is in the comment right below this:

```julia
# Allow MH to sample new variables from the prior if it's not already present in the
# VarInfo.
```
This preserves the old behaviour of MH (see lines 416 to 422 at commit cabe73f):

```julia
function DynamicPPL.assume(
    rng::Random.AbstractRNG, spl::Sampler{<:MH}, dist::Distribution, vn::VarName, vi
)
    # Just defer to `SampleFromPrior`.
    retval = DynamicPPL.assume(rng, SampleFromPrior(), dist, vn, vi)
    return retval
end
```
It directly mimics the old code for `SampleFromPrior` (https://github.com/TuringLang/DynamicPPL.jl/blob/c1d9b61933a6251fbeac249e8ebf03631f1f25bf/src/context_implementations.jl#L131-L171), which includes the `haskey(vi, vn)` check.
> dynamic dispatch

I don't really understand what is so bad about this. (For what it's worth, I searched online and I can't find any actual explanation, so feel free to enlighten me.) I am just calling a different method depending on the value of a Boolean which, sure, you can't know for sure until runtime.

But if the 'not known until runtime' part is the main problem, then it seems like literally any code that uses `if haskey(vi, vn)` is already problematic. Would it be any better or worse if I were to inline the definitions here?

```julia
if haskey(vi, vn)
    # inline the definition of tilde_assume on DefaultContext here
else
    # inline the definition of tilde_assume on InitContext here
end
```

Surely that's okay, because otherwise it seems incredibly limiting if you can't even use conditionals. So I assume I'm missing something: what is the difference between 'dynamic dispatch' and 'code that uses conditionals because values aren't known until runtime'?
I'm not sure exactly how bad dynamic dispatch is either, and I wondered about the same question you are asking. Clearly some of it has to be okay. One difference from inlining is that without inlining, the right method has to be looked up from a method table at runtime. I'm not sure exactly what that costs, but maybe there are substantial costs there? Maybe those method tables are quite heavy data structures compared to what the code itself is dealing with?

On the first point, I don't understand why MH needs to do this differently from every other sampler. I feel like whether to fall back to InitContext, when evaluating logp for sampling purposes and hitting a `tilde_assume!!` for an unknown variable, should be a decision made at a higher level than a sampler's implementation. I get that this is not new behaviour in this PR, though its clunkiness is exposed in a different way because we now need to create a whole new context for MH.
> dynamic dispatch

I searched a bit more and came across this: https://discourse.julialang.org/t/curious-about-the-internals-of-dynamic-dispatch/102888/8. In that example, there is some object `a::A` but `A` is an abstract type, so the compiler can't figure out ahead of time which method `f(a)` refers to, and this is only resolved at runtime.

But here the compiler can perfectly well figure out which methods are being used, because both `InitContext` and `DefaultContext` are concrete types. My impression is that what's going on here is not dynamic dispatch; the dispatch itself is static. When I write `tilde_assume!!(...)`, the compiler has enough information to know which method is being referred to: once `haskey(vi, vn)` is evaluated, there is no more looking things up in method tables. I think dynamic dispatch only happens if the types of the arguments to `tilde_assume!!` are themselves not concrete.

The only downside from the compiler's point of view is that it can't aggressively cut out one branch ahead of time, because it can't figure out which branch it will go down.

Also, there would be issues with type stability if the two `tilde_assume!!`s returned different types. But that's not a dispatch problem, that's just plain old type stability, and it wouldn't be affected if the function definitions were inlined.
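A self-contained toy illustrating that point (not code from the PR): both context-like types are concrete, so each call site is statically resolvable and only the branch choice happens at runtime.

```julia
struct DefaultCtx end
struct InitCtx end

handle(::DefaultCtx) = 1.0
handle(::InitCtx) = 2.0

function f(flag::Bool)
    # `ctx` is inferred as Union{DefaultCtx,InitCtx}; Julia union-splits this
    # into a branch where each side calls a statically known method.
    ctx = flag ? DefaultCtx() : InitCtx()
    return handle(ctx)
end

f(true)   # 1.0
f(false)  # 2.0
```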
> I don't understand why MH needs to do this differently from every other sampler

I don't know either. I am in principle not opposed to cutting this out and just using `DefaultContext`, but I would probably prefer not to make this decision in this PR.
> performance

If it's of any reassurance, I ran the following code on both main and this PR; main comes in at around 0.89 seconds, this PR at 0.83 seconds. (Of course, I have no clue why or what made it better! 😄)
```julia
using Turing

J = 8
y = [28, 8, -3, 7, -1, 1, 18, 12]
sigma = [15, 10, 16, 11, 9, 11, 10, 18]

@model function eight_schools_centered(J, y, sigma)
    mu ~ Normal(0, 5)
    tau ~ truncated(Cauchy(0, 5); lower=0)
    theta = Vector{Float64}(undef, J)
    for i in 1:J
        theta[i] ~ Normal(mu, tau)
        y[i] ~ Normal(theta[i], sigma[i])
    end
end

model_esc = eight_schools_centered(J, y, sigma)
@time sample(model_esc, MH(), 10000; progress=false);
```
```julia
trng = get_trace_local_rng_maybe(ctx.rng)
resample = get_trace_local_resampled_maybe(true)

dispatch_ctx = if ~haskey(vi, vn) || resample
```
Like with MH, I wonder about dynamic dispatch here. Might be unavoidable and/or inconsequential in this case though.
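For reference, a hedged guess at how the truncated hunk above might continue, mirroring the MH version earlier in the thread; the branch bodies and the final call are assumptions, not the PR's actual code:

```julia
dispatch_ctx = if ~haskey(vi, vn) || resample
    # New variable, or particle resampling requested: draw from the prior
    # using the trace-local RNG.
    DynamicPPL.InitContext(trng, DynamicPPL.InitFromPrior())
else
    DynamicPPL.DefaultContext()
end
return DynamicPPL.tilde_assume!!(dispatch_ctx, right, vn, vi)
```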
```julia
    return DynamicPPL.init_strategy(spl.sampler)
end

function AbstractMCMC.sample(
```
I don't see why these became necessary now. Is it something about the type hierarchy around `Sampler`?
Yeah, `RepeatSampler` is weird, because it doesn't subtype `InferenceAlgorithm`, so some of the code, like the default `sample` methods, has to be copied over for it. In the past, the methods used to be written for `Union{Sampler{<:InferenceAlgorithm},RepeatSampler}`. I think that was worse because it was less flexible: you couldn't change one without affecting the behaviour of the other.
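A sketch of the old pattern being described; the exact signature is an assumption for illustration, not the actual old method:

```julia
# One shared method served both sampler kinds, so the two behaviours were
# coupled: you couldn't change one without changing the other.
function AbstractMCMC.sample(
    rng::Random.AbstractRNG,
    model::DynamicPPL.Model,
    spl::Union{Sampler{<:InferenceAlgorithm},RepeatSampler},
    N::Integer;
    kwargs...,
)
    # ... shared implementation for both sampler kinds ...
end
```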
julia = "1.10" | ||
|
||
[sources] | ||
DynamicPPL = {url = "https://github.com/TuringLang/DynamicPPL.jl", rev = "breaking"} |
Leaving this comment just as a reminder that this needs to be removed before merging.
Optim = "429524aa-4258-5aef-a3af-852621145aeb" | ||
|
||
[sources] | ||
DynamicPPL = {url = "https://github.com/TuringLang/DynamicPPL.jl", rev = "breaking"} |
Likewise this.
```
You still need to use the `initial_params` keyword argument to `sample`, but the allowed values are different.
For almost all samplers in Turing.jl (except `Emcee`) this should now be a `DynamicPPL.AbstractInitStrategy`.

TODO LINK TO DPPL DOCS WHEN THIS IS LIVE
```
Likewise a reminder comment.
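For illustration, some hedged examples of the new-style values; `InitFromPrior` and `InitFromParams` appear elsewhere in this thread, while `InitFromUniform` is my assumption for the third pre-existing strategy:

```julia
# Draw initial values from the prior:
sample(model, NUTS(), 1000; initial_params=DynamicPPL.InitFromPrior())

# Draw initial values uniformly (assumed third strategy):
sample(model, NUTS(), 1000; initial_params=DynamicPPL.InitFromUniform())

# Supply explicit values for (some of) the variables:
sample(model, NUTS(), 1000; initial_params=DynamicPPL.InitFromParams((x=1.0,)))
```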
Personally, I'm not too fussed. I think we discussed this last time round and IIRC you said you'd prefer to keep `breaking` 'clean' and mergeable into main at any given time (apologies if I am misremembering). If that's the case, then we should keep this as the base branch for DPPL 0.38 fixes, until 0.38 is released.
Okay, I'm quite happy with where CI on this PR has gotten to. There are a handful of residual failures, which are mostly to do with
> I think we discussed this last time round and IIRC you said you'd prefer to keep breaking 'clean' and mergeable into main at any given time (apologies if I am misremembering). If that's the case, then we should keep this as the base branch for DPPL 0.38 fixes, until 0.38 is released.

You remember correctly, this would be my preference.
```julia
end
# TODO(penelopeysm / DPPL 0.38) This is type piracy (!!!) The function
# `_convert_initial_params` will be moved to Turing soon, and this piracy SHOULD be removed
# in https://github.com/TuringLang/Turing.jl/pull/2689, PLEASE make sure it is!
```
Leaving a comment to track that this gets done before merging.
It should be noted that, due to the changes in DynamicPPL's `src/sampler.jl`, the results of running MCMC sampling on this branch will pretty much always differ from those on the main branch. Thus there is no (easy) way to test full reproducibility of MCMC results (we have to rely instead on statistics for converged chains).

TODO:
Separate PRs:
- Use InitStrategy for optimisation as well. Note that the three pre-existing InitStrategies can be used directly with optimisation. However, to handle constraints properly, it seems necessary to introduce a new subtype of `AbstractInitStrategy`. I think this should be a separate PR because it's a fair bit of work. (A hypothetical sketch follows this list.)
- Fix the docs for that argument, wherever it is (there's probably some in AbstractMCMC, but it should probably be documented on the main site). EDIT: https://turinglang.org/docs/usage/sampling-options/#specifying-initial-parameters
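Purely as a hypothetical sketch of that first item; the name `InitFromBox`, its fields, and the `init_value` helper are all invented for illustration, and the real hook into DynamicPPL's init machinery would differ:

```julia
using DynamicPPL, Distributions, Random

# A constraint-aware strategy might carry box bounds and draw uniformly
# within the box intersected with the distribution's support.
struct InitFromBox <: DynamicPPL.AbstractInitStrategy
    lower::Float64
    upper::Float64
end

# Hypothetical helper, not DynamicPPL API: clamp the box to the support of a
# univariate distribution and draw uniformly within it.
function init_value(rng::Random.AbstractRNG, dist::Distribution, s::InitFromBox)
    lo = max(s.lower, minimum(dist))
    hi = min(s.upper, maximum(dist))
    return lo + rand(rng) * (hi - lo)
end
```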