Cache managed Python archive downloads #12175
base: main
Part of #11834

Split the download-and-extract of managed Python interpreters into a download to the cache and an extract-and-install phase. This allows quickly re-installing Python interpreters when they are cached, and caching Python installations in tests and CI (follow-up PR).

Ideally, we would still have a combined download-and-extract step if the interpreter is available locally, but I couldn't figure out how to tee the stream with reasonable complexity. I nearly succeeded by going through a futures `Stream`, but I couldn't figure out how to pass the second writer to the `Stream::then` `FnMut` in a way the borrow checker would accept.

Locally, `cargo test -p uv -- python_install` goes from 43s to 7s for me when `UV_PYTHON_CACHE_DIR` is set.
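Teeing here means writing each downloaded chunk to both the cache file and the extractor in a single pass. The following is a minimal sketch using blocking `std::io` types, not the async `Stream` the PR works with; `TeeWriter` and `tee_copy` are hypothetical names. Because one struct owns both sinks, no second writer has to be moved into a closure. An async version would still need its own design.

```rust
use std::io::{self, Read, Write};

/// Hypothetical writer that forwards every byte to two sinks,
/// e.g. a cache file and an archive extractor (synchronous sketch only).
pub struct TeeWriter<A: Write, B: Write> {
    first: A,
    second: B,
}

impl<A: Write, B: Write> TeeWriter<A, B> {
    pub fn new(first: A, second: B) -> Self {
        Self { first, second }
    }

    /// Recover both inner writers once copying is done.
    pub fn into_inner(self) -> (A, B) {
        (self.first, self.second)
    }
}

impl<A: Write, B: Write> Write for TeeWriter<A, B> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Write the whole chunk to both sinks so they never diverge;
        // a partial write on one side would desynchronize the other.
        self.first.write_all(buf)?;
        self.second.write_all(buf)?;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.first.flush()?;
        self.second.flush()
    }
}

/// Copy a byte stream into both sinks in one pass; returns the byte count and the sinks.
pub fn tee_copy<R: Read, A: Write, B: Write>(
    reader: &mut R,
    first: A,
    second: B,
) -> io::Result<(u64, A, B)> {
    let mut tee = TeeWriter::new(first, second);
    let copied = io::copy(reader, &mut tee)?;
    tee.flush()?;
    let (a, b) = tee.into_inner();
    Ok((copied, a, b))
}
```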
```diff
@@ -975,6 +975,8 @@ pub enum CacheBucket {
     Builds,
     /// Reusable virtual environments used to invoke Python tools.
     Environments,
+    /// Download of Python Build Standalone or PyPy, still archived.
+    PythonBuilds,
```
I find it a bit surprising to call it `PythonBuilds` since we're not performing a build there. I'd prefer `Python` or `PythonVersions`.
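The name matters beyond the code. Assuming each cache bucket maps to its own subdirectory of the cache root, the bucket's name is what users see when they browse the cache. The following is a minimal sketch of that mapping using the suggested `Python` name. The enum is trimmed down, and `dir_name`, `path`, and the directory strings are hypothetical, not uv's actual values.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical, trimmed-down cache bucket enum; directory strings are illustrative.
#[derive(Debug, Clone, Copy)]
pub enum CacheBucket {
    Builds,
    Environments,
    /// Python interpreter archives, stored still compressed.
    Python,
}

impl CacheBucket {
    /// Directory name of this bucket under the cache root.
    pub fn dir_name(self) -> &'static str {
        match self {
            CacheBucket::Builds => "builds-v0",
            CacheBucket::Environments => "environments-v0",
            CacheBucket::Python => "python-v0",
        }
    }

    /// Full path of this bucket inside the given cache root.
    pub fn path(self, root: &Path) -> PathBuf {
        root.join(self.dir_name())
    }
}
```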
```rust
// We improve compatibility by using neither the URL-encoded `%2B` nor the `+` it decodes
// to.
```
As in, compatibility with file systems? If so, can you add that for clarity? I wasn't sure what we were trying to be compatible with here.
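To illustrate what the comment describes, the following is a minimal sketch that derives a cache file name from the last URL path segment and drops both the `%2B` escape and the `+` it decodes to. Replacing them with `-` is an assumption made for illustration; the replacement the PR actually uses is not shown here, and `cache_file_name` is a hypothetical helper.

```rust
/// Hypothetical helper: derive a cache file name from a download URL,
/// avoiding both `%2B` and `+` (assumed replacement: `-`).
pub fn cache_file_name(url: &str) -> String {
    // Drop any query string or fragment.
    let path = url.split(|c: char| c == '?' || c == '#').next().unwrap_or(url);
    // Keep only the last path segment, i.e. the archive's file name.
    let last = path.rsplit('/').next().unwrap_or(path);
    // Handle both escape spellings as well as the decoded form.
    last.replace("%2B", "-")
        .replace("%2b", "-")
        .replace('+', "-")
}
```

Normalizing both spellings means the cached name is the same whether the URL arrived escaped or decoded.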
```rust
    Ok(())
}

/// Extract a downloaded Python interpreter archive into a (temporary) directory.
```
Isn't this into any target directory? Why temporary here?
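One plausible reason to extract into a temporary directory is to make the install appear atomic. The archive is unpacked next to the final location and then moved into place with a single rename, so readers see either no installation or a complete one. The following is a minimal std-only sketch of that pattern. `install_via_temp_dir` is hypothetical, and the `extract` closure stands in for the real archive unpacking.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: run `extract` into a scratch directory beside `target`,
/// then rename it into place. Fails if `target` already exists and is non-empty.
pub fn install_via_temp_dir<F>(target: &Path, extract: F) -> io::Result<()>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    let parent = target.parent().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "target has no parent directory")
    })?;
    fs::create_dir_all(parent)?;

    // Same parent directory, so the final rename stays on one file system.
    let name = target
        .file_name()
        .map(|n| n.to_string_lossy().into_owned())
        .unwrap_or_default();
    let scratch: PathBuf = parent.join(format!(".tmp-{}-{}", name, std::process::id()));
    if scratch.exists() {
        fs::remove_dir_all(&scratch)?;
    }
    fs::create_dir(&scratch)?;

    if let Err(err) = extract(&scratch) {
        // Leave no half-extracted tree behind on failure.
        let _ = fs::remove_dir_all(&scratch);
        return Err(err);
    }

    // Publish the fully extracted tree in one step.
    fs::rename(&scratch, target)
}
```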
@Gankra could you give this a look too? I'm curious if you have thoughts on the streaming question.

@konstin I'm still not sure of the benefits of storing the compressed archives instead of unpacked ones and tweaking the install logic. Immediately unpacking would solve the streaming problem as well as the delayed hash check (which feels awkward here). It'd also help with the automatic install user experience (ref #12122 (comment)). I think writing a marker file isn't unreasonable, but it's harder to reason about when viewing the directories and may be annoying when we start bundling Python distributions into "virtual" environments.
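One reading of the "delayed hash check" concern is that the archive is verified only after it has been fully stored. If bytes are hashed as they stream into the consumer, such as an unpacker, verification finishes in the same pass. The following is a minimal sketch assuming a blocking reader. `HashingReader` and `copy_and_verify` are hypothetical, and FNV-1a (64-bit) is a non-cryptographic stand-in for the SHA-256 check a real installer would use.

```rust
use std::io::{self, Read, Write};

/// Hypothetical reader wrapper that hashes every byte passing through it.
/// FNV-1a is a stand-in here; it is not suitable for integrity checks.
pub struct HashingReader<R: Read> {
    inner: R,
    state: u64,
}

impl<R: Read> HashingReader<R> {
    const OFFSET: u64 = 0xcbf29ce484222325;
    const PRIME: u64 = 0x100000001b3;

    pub fn new(inner: R) -> Self {
        Self { inner, state: Self::OFFSET }
    }

    /// Digest of all bytes read so far.
    pub fn digest(&self) -> u64 {
        self.state
    }
}

impl<R: Read> Read for HashingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        // FNV-1a: xor the byte in, then multiply by the prime.
        for &byte in &buf[..n] {
            self.state ^= u64::from(byte);
            self.state = self.state.wrapping_mul(Self::PRIME);
        }
        Ok(n)
    }
}

/// Stream `reader` into `sink` and fail if the digest does not match `expected`.
pub fn copy_and_verify<R: Read, W: Write>(reader: R, sink: &mut W, expected: u64) -> io::Result<u64> {
    let mut hashing = HashingReader::new(reader);
    let n = io::copy(&mut hashing, sink)?;
    if hashing.digest() != expected {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "checksum mismatch"));
    }
    Ok(n)
}
```

A real unpacker would also need to discard its output on a mismatch, because the check only completes at end of stream.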
Re: stream teeing. It's hard to imagine exactly what's up with the borrow checker without using it in anger, but my experiences here have been similarly frustrating, so I wouldn't dwell too hard on it, especially since the wins will be marginal compared to how big a win you've already got.
Could we only have one unpacked tree instead of two unpacked trees, e.g. with (out-of-band) tracking of whether a Python interpreter was installed?

I was aiming to fix slow test runs with this change, so tackling our Python installation behavior in general would change the scope quite a bit. Moving more steps, such as decompression, out of the test path would remove test coverage of an important path from the test setup (unlike the download, which is already well-tested logic), so I'd prefer not to cache more than the download. We could also go the other way and narrow the scope of this PR: only act on the environment variable that we use in testing, and only then use the two-step split of downloading first and decompressing-and-installing later.
Are you considering one unpacked cache tree and another hard or soft linked tree as two unpacked trees?
I think we'll want at least one test case that does not use this new cache.

Sounds great, but is that for this change or a different PR, i.e., would that behavior change fix the download behavior?

Sounds good.
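The narrower scope discussed above takes the two-step path only when the test-only environment variable is set, and that comes down to a single branch. The following is a minimal sketch. `UV_PYTHON_CACHE_DIR` is the variable named in the PR description, while `InstallPlan`, `plan_from_env_value`, and `plan_from_env` are hypothetical, and treating an empty value as unset is an assumption.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical description of how a managed Python download is handled.
#[derive(Debug, PartialEq)]
pub enum InstallPlan {
    /// No cache: stream the archive straight into extraction.
    DownloadAndExtract,
    /// Download into the archive cache first, then extract from it.
    CacheThenExtract { cache_dir: PathBuf },
}

/// Pure decision function, so tests can cover both branches without
/// mutating the process environment.
pub fn plan_from_env_value(value: Option<OsString>) -> InstallPlan {
    match value {
        Some(dir) if !dir.is_empty() => InstallPlan::CacheThenExtract {
            cache_dir: PathBuf::from(dir),
        },
        _ => InstallPlan::DownloadAndExtract,
    }
}

/// Read the variable from the environment and decide.
pub fn plan_from_env() -> InstallPlan {
    plan_from_env_value(env::var_os("UV_PYTHON_CACHE_DIR"))
}
```

Keeping the decision separate from the environment read also makes it straightforward to keep at least one test on the uncached path.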