Add guaranteed-reproducible PRNGs to rand? #1588

Open
dhardy opened this issue Feb 14, 2025 · 9 comments
Labels
E-question Participation: opinions wanted

Comments

@dhardy
Member

dhardy commented Feb 14, 2025

This question came up recently regarding a possible adoption to libstd (read from here), but I'm not sure we ever really asked the question of rand.

StdRng and SmallRng are deterministic but not reproducible (and in the latter case also not portable). Should we add a PRNG with guaranteed reproducibility as a new item under rand::rngs?

We already have five PRNGs available in rand if you count the ChaCha variants:

  • ChaCha8Rng, ChaCha12Rng, ChaCha20Rng
  • Xoshiro128PlusPlus, Xoshiro256PlusPlus

I'm not sure if we should ever add a guaranteed-reproducible ChaCha PRNG in rand since if we ever wanted to change the generator behind ThreadRng it would add dependencies. Given how long we've been using ChaCha in this role this may be less of an issue now.

The Xoshiro variants are more acceptable (if only because they require a lot less code; both are directly implemented in rand), though selecting one of these is likely sufficient, e.g. rand::rngs::Xoshiro256PlusPlus.
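For a sense of how little code is involved, here is a minimal, std-only sketch of the xoshiro256++ step function. It is an illustration, not rand's actual source: the `from_state` constructor is a hypothetical helper, and seeding a `u64` through SplitMix64 is an assumption made for this example rather than a statement about what rand does.

```rust
/// Minimal xoshiro256++ sketch (illustrative; not the rand crate's code).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Xoshiro256PlusPlus {
    s: [u64; 4],
}

impl Xoshiro256PlusPlus {
    /// Hypothetical helper: build the generator from raw state words.
    /// The all-zero state is invalid for xoshiro generators.
    pub fn from_state(s: [u64; 4]) -> Self {
        assert!(s != [0; 4], "xoshiro state must not be all zero");
        Self { s }
    }

    /// Expand one u64 seed into four state words with SplitMix64.
    /// (Assumption for this sketch; a real crate may seed differently.)
    pub fn seed_from_u64(mut seed: u64) -> Self {
        let mut s = [0u64; 4];
        for word in s.iter_mut() {
            seed = seed.wrapping_add(0x9e37_79b9_7f4a_7c15);
            let mut z = seed;
            z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
            z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
            *word = z ^ (z >> 31);
        }
        Self::from_state(s)
    }

    /// One xoshiro256++ step: compute the output, then advance the state.
    pub fn next_u64(&mut self) -> u64 {
        let result = self.s[0]
            .wrapping_add(self.s[3])
            .rotate_left(23)
            .wrapping_add(self.s[0]);

        let t = self.s[1] << 17;
        self.s[2] ^= self.s[0];
        self.s[3] ^= self.s[1];
        self.s[1] ^= self.s[2];
        self.s[0] ^= self.s[3];
        self.s[2] ^= t;
        self.s[3] = self.s[3].rotate_left(45);

        result
    }
}
```

Starting from the state `[1, 2, 3, 4]`, the first two outputs are 41943041 and 58720359. A reproducibility guarantee amounts to promising that values like these never change on any platform or in any release.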

CC @hanna-kruppe @joshtriplett in case of interest

@dhardy dhardy added the E-question Participation: opinions wanted label Feb 14, 2025
@benjamin-lieser
Member

If you want guaranteed reproducibility can't you just use the named PRNG? Maybe I am misunderstanding the question.

@dhardy
Member Author

dhardy commented Feb 14, 2025

Yes, except that none of those named PRNGs are currently publicly exported from rand.

Motivation is partially convenience and partially to make it more obvious how users may set up a reproducible PRNG (currently another crate must be added as a dependency).

@benjamin-lieser
Member

Ah true, I remember having to do this.

I would say exporting Xoshiro256PlusPlus would be a good idea, also under this name.

@newpavlov
Member

As argued in the linked issue, I don't think we need it and we should recommend use of a concrete PRNG crate (we could reference them in StdRng/SmallRng docs).

@hanna-kruppe

When I'm sufficiently worried about long-term reproducibility that I'd opt for a generator with such a guarantee, I generally wouldn't be satisfied if the guarantee only covered RngCore methods or something like that. What I care about is that my program overall remains reproducible, which means e.g. any sampling Rng methods my program uses (and the trait impls backing them) can't have value-breaking changes either. Even if rand was willing to guarantee that for a larger subset of its APIs, it's very difficult for me as a user to ensure that I'm only using the guaranteed-stable subset. Depending directly on a specific rand_foo PRNG crate only solves this if you can make do with only rand_core::RngCore and avoid depending on rand entirely, but that's rare in my experience.

So I think rand, as a general-purpose crate that has good reasons to make value-breaking changes from time to time, is not in a good position to try and address the need for reproducibility. Offering it only for the simple cases, but not for the other APIs that come along for the ride, will result in just as many people being mistaken about whether their rand-using program will be reproducible with future releases of rand. That's not helping anyone.
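To make the concern about sampling methods concrete, here is a small sketch showing that a frozen PRNG output stream is not enough on its own. Both function names are hypothetical and neither is rand's actual range-sampling code. They are two common ways to map a raw `u64` into `0..n`, and both ignore bias correction for brevity.

```rust
/// Hypothetical "old" mapping: reduce the raw word modulo n.
pub fn range_by_modulo(raw: u64, n: u64) -> u64 {
    assert!(n > 0, "range must be non-empty");
    raw % n
}

/// Hypothetical "new" mapping: take the high 64 bits of the
/// 128-bit product raw * n (widening multiply).
pub fn range_by_widening_mul(raw: u64, n: u64) -> u64 {
    assert!(n > 0, "range must be non-empty");
    ((raw as u128 * n as u128) >> 64) as u64
}
```

Fed the same raw word 41943041 with `n = 6`, the first function returns 5 and the second returns 0. If a library swapped one mapping for the other, every program sampling ranges through it would see different values even though the underlying generator's output is bit-for-bit unchanged.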

@dhardy
Member Author

dhardy commented Feb 14, 2025

So I think rand, as a general-purpose crate that has good reasons to make value-breaking changes from time to time, is not in a good position to try and address the need for reproducibility.

The same is true of any library offering a wide variety of random algorithms? The solution here is simple enough: use a fixed version of rand. We should not make value-breaking changes in patch releases (outside of security concerns, though this has never yet been an issue).

@hanna-kruppe

Using a fixed version is not great because it means I'll effectively be on my own with maintaining that code once upstream (quite reasonably) stops doing so. Whether value-breaking changes are made in patch releases or only in minor releases is immaterial -- eventually I'll have to choose between eating a value-breaking change or sticking with an unmaintained version of the library. This won't be an issue if my code stops being actively developed before upstream moves on, but in many cases I don't want to make assumptions about that. And if I end up having to vendor the library, I'd always prefer one that is as small and simple as possible for my specific use case over a library that does basically everything.

The only way around this is if a library is aligned with my priorities w.r.t. reproducibility: making a credible promise to avoid value-breaking changes, by only adding new APIs without changing the old ones (possibly deprecating them but ideally without the implication that they'll be removed eventually). Of course, that's undesirable for everyone who doesn't need long-term reproducibility and wants to get improvements automatically. But it's not inherently impossible for a maintainer to do that, if that's their priority.

@dhardy
Member Author

dhardy commented Feb 14, 2025

eventually I'll have to choose between eating a value-breaking change or sticking with an unmaintained version of the library. [...] And if I end up having to vendor the library, I'd always prefer one that is as small and simple as possible for my specific use case over a library that does basically everything.

If you're talking about rand (not rand_distr), then unless you care about nightly features, there isn't much to maintain — about the only thing in rand v0.8 which "broke" is that gen will soon be a reserved keyword. As for bug fixes, v0.9 includes a couple of portability fixes and one single bug fix to IteratorRandom::choose_multiple_weighted for extremely small seeds (a value-breaking change, thus this could not be back-ported).

(I'm assuming you're not talking about maintenance of security — but even here nothing of note happened in the last four years, and if it did I expect that we would release a patch.)

So I don't buy your argument that rand is not a good choice if you care about long-term reproducibility.

@hanna-kruppe

I don't know how easy or hard it would be for me to take over bugfix-only maintenance of a specific rand version. Since I'm not familiar with the code base or its history, determining that for myself would take non-trivial effort. I appreciate you sharing information about this now, but imagine if we weren't having this conversation and I'd just be looking at docs.rs/rand to make my decision. That part is just less daunting with a library that's less than, say, 1K lines of code.

In any case, if I'm happy to use a fixed version of a library then it doesn't matter if the library offers reproducibility guarantees across its releases (of course, consistent results across platforms still matter). If I'll be using rand 0.8.5 forever, then I'm not affected by value-breaking changes in later releases. Conversely, if I want to avoid pinning a specific version and instead keep updating rand, then I need reproducibility guarantees for all APIs that I'm using or might use by accident in the future, not just for the RngCore impls. That's what my first comment was about: to enable meaningful long-term reproducibility without version pinning, rand would have to make a much stronger commitment than just keeping some specific PRNG impls intact. I don't think rand can reasonably do that without unduly compromising on competing priorities.
