Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache managed Python archive downloads #12175

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

konstin
Copy link
Member

@konstin konstin commented Mar 14, 2025

Part of #11834

Split the download-and-extract of manged Python interpreters into a download to the cache and an extract-and-install phase. This allows quickly re-installing Python interpreters when they are cached, and caching Python installation in tests and CI (follow-up PR).

Ideally, we would still have a combined download-and-extract step if the interpreter is available locally, but I couldn't figure out how to tee the stream in reasonable complexity. I nearly succeeded going through a futures Stream, but I couldn't figure out how to pass the second writer to the Stream::then FnMut in a way the borrow checker would accept.

Locally for me, cargo test -p uv -- python_install goes from 43s to 7s when setting UV_PYTHON_CACHE_DIR.

Part of #11834

Split the download-and-extract of manged Python interpreters into a download to the cache and an extract-and-install phase. This allows quickly re-installing Python interpreters when they are cached, and caching Python installation in tests and CI (follow-up PR).

Ideally, we would still have a combined download-and-extract step if the interpreter is available locally, but I couldn't figure out how to tee the stream in reasonable complexity. I nearly succeeded going through a futures `Stream`, but I couldn't figure out how to pass the second writer to the `Stream::then` `FnMut` in a way the borrow checker would accept.

Locally for me, `cargo test -p uv -- python_install` goes from 43s to 7s when setting `UV_PYTHON_CACHE_DIR`.
@konstin konstin added the enhancement New feature or improvement to existing functionality label Mar 14, 2025
@konstin konstin marked this pull request as ready for review March 19, 2025 14:47
@konstin konstin requested a review from zanieb March 19, 2025 14:48
@konstin konstin temporarily deployed to uv-test-publish March 19, 2025 14:49 — with GitHub Actions Inactive
@@ -975,6 +975,8 @@ pub enum CacheBucket {
Builds,
/// Reusable virtual environments used to invoke Python tools.
Environments,
/// Download of Python Build Standalone or PyPy, still archived.
PythonBuilds,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a bit surprising to call it PythonBuilds since we're not performing a build there? I'd prefer Python or PythonVersions

Comment on lines +563 to +564
// We improve compatibility by using neither the URL-encoded `%2B` nor the `+` it decodes
// to.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in, compatibility with file systems? If so, can you add that for clarity? I wasn't sure what we were trying to be compatible with here.

Ok(())
}

/// Extract a downloaded Python interpreter archive into a (temporary) directory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this into any target directory? Why temporary here?

@zanieb
Copy link
Member

zanieb commented Mar 19, 2025

@Gankra could you give this a look too? I'm curious if you have thoughts on the streaming question.

@konstin I'm still not sure of the benefits of storing the compressed archives instead of an unpacked one and tweaking the install logic. Immediately unpacking would solve the streaming problem as well as the delayed hash check (which feels awkward here). It'd also help with the automatic install user experience (ref #12122 (comment)) — I think writing a marker file isn't unreasonable but it's harder to reason about when viewing the directories and may be annoying when we start bundling Python distributions into "virtual" environments.

Copy link
Contributor

@Gankra Gankra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: stream teeing -- hard to imagine what exactly is up with the borrowchecks without using it in anger but my experiences here are similarly dire frustration so I wouldn't dwell too hard on it, especially since the wins will be marginal compared to how big of a win you've got already.

@konstin
Copy link
Member Author

konstin commented Mar 26, 2025

Could we only have one unpacked tree instead of two unpacked trees, e.g. with (out of band) tracking whether a Python interpreter as installed? I was targeting fixing the test running with this change, so tackling our Python installation behavior in general would change the scope quite a bit. Moving more features such as decompression out of the test path would take test coverage of an important path in the test setup (vs. just the download, which is already a well-tested logic), so I'd prefer to not cache too much outside the case. We could also go the other way and decrease the scope of this PR by only acting on the environment variable that we only use in testing and only then use the two-step split download first, decompress-and-installer later.

@zanieb
Copy link
Member

zanieb commented Mar 26, 2025

Could we only have one unpacked tree instead of two unpacked trees,

Are you considering one unpacked cache tree and another hard or soft linked tree as two unpacked trees?

Moving more features such as decompression out of the test path would take test coverage of an important path in the test setup (vs. just the download, which is already a well-tested logic)

I think we'll want at least one test case that does not use this new cache.

@konstin
Copy link
Member Author

konstin commented Mar 27, 2025

Could we only have one unpacked tree instead of two unpacked trees,

Are you considering one unpacked cache tree and another hard or soft linked tree as two unpacked trees?

Sounds great, but is that this change or a different PR, i.e. would that behavior change fix the download behavior?

Moving more features such as decompression out of the test path would take test coverage of an important path in the test setup (vs. just the download, which is already a well-tested logic)

I think we'll want at least one test case that does not use this new cache.

Sounds good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or improvement to existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants