uv pip resolver should be able to backtrack on 403 #5260

Open
paveldikov opened this issue Jul 21, 2024 · 18 comments · May be fixed by #12255
Labels: needs-decision (Undecided if this should be done)

Comments

@paveldikov
Contributor

Many organisations employ in-band scanner tools on their internal PyPI mirrors, with the aim of preventing the ingress of compromised/non-compliant dependencies. Usually such tools will make their blocking decisions known by way of an HTTP 403 response code at download time.

One of the key pain-points with this approach is that, as new versions of packages come out, they will almost inevitably 403-upon-first-download, as they are yet to be scanned and cleared.

Currently the resolver completely gives up when encountering a 403 code, perhaps on the assumption that this is the usual 'permission denied' meaning, as opposed to a more granular 'not this one, please' meaning. This behaviour is not unique to uv; pip also does this, but uv probably hits this worse than pip thanks to supporting universal resolution etc. Regardless, it is a pretty substantial negative impact to developer experience.

I think there should be a resolver option to allow backtracking when 403 codes are encountered, so that older (but scanned + cleared) versions will be considered.

@charliermarsh
Member

I believe we do continue and backtrack on 403:

    || err.status() == Some(StatusCode::FORBIDDEN)

Or are you referring to a 403 on the archive download, and not the simple HTTP responses?

@paveldikov
Contributor Author

Yes, 403 on the wheel/sdist download.

@charliermarsh added the needs-decision label (Undecided if this should be done) on Jul 21, 2024
@notatallshaw
Collaborator

Btw, has this request ever been reported to pip? I had a search through the issue tracker and the only issues related to 403 were all SSL issues, which it makes sense not to retry on.

And do you know the mirror software your organization is using? Back when I used to use artifactory the error I got for this sort of situation was a timeout, usually for large wheels as the mirror was still downloading the wheel and wouldn't start providing data to the client until it had finished.

It may be worth asking the maintainers of the mirror software to return a 502, 503, or 504 error rather than a 403, as those are generally considered retryable, whereas 403 usually is not.

In fact, for a 403 response, the spec explicitly says that, if credentials were provided, the client SHOULD NOT automatically repeat the request with the same credentials.
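
To make that distinction concrete, here is a minimal sketch in Rust of how a client might sort status codes into "retry the same request" and "don't". The `Disposition` enum and `classify` function are hypothetical names made up for illustration; they are not part of uv, pip, or any mirror software, and the retryable set is just the three codes mentioned above.

```rust
/// Hypothetical classification of an HTTP status code seen by a download client.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Disposition {
    /// The request succeeded.
    Success,
    /// Transient gateway/availability error: repeating the same request may help.
    RetrySameRequest,
    /// Repeating the same request is not expected to help (this includes 403).
    DoNotRetry,
}

/// Assumption: only 502, 503 and 504 are treated as retryable here.
fn classify(status: u16) -> Disposition {
    match status {
        200..=299 => Disposition::Success,
        502 | 503 | 504 => Disposition::RetrySameRequest,
        _ => Disposition::DoNotRetry,
    }
}
```

Under this split, `classify(403)` is `DoNotRetry` while `classify(503)` is `RetrySameRequest`, which is why a mirror that wants clients to come back later would be better served by a 5xx code.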

@zanieb
Member

zanieb commented Jul 22, 2024

I think the request is not to retry the request but to try another package version. I think if we don't get a 403 when listing packages, it's reasonable for us to try another version after a 403 for a specific archive.

@notatallshaw
Collaborator

Ah, that makes sense, I guess I hadn't fully grokked the scenario.

@paveldikov
Contributor Author

That's exactly it. A 403 on archive download should definitely not be retried, but it may make sense to move along to the next archive on the list.

@charliermarsh
Member

Are you expecting that it would try all distributions within a version? Or that it would move on to the next version if the archive returned a 403?

@paveldikov
Contributor Author

Interesting one. I was thinking 'next version', but this is primarily because of assumptions specific to my circumstances. I'm not sure they are safe to make in the general case:

  • since the policy engine in this case is a scanner, it would likely make the same decision for other distributions of the same version
  • rather not waste time downloading incompatible distributions (not sure if the compatibility decision can be made without incurring the cost of a download?)
  • avoid consuming sdists to the extent possible (this is probably specific to my org, though, and probably belongs in a separate config item)

@charliermarsh
Member

FWIW, "next version" would be fairly easy but "next distribution" would be challenging (purely based on implementation details).

@paveldikov
Contributor Author

paveldikov commented Jul 29, 2024

Yes, I think 'next version' should do it, even if it is a somewhat presumptuous heuristic. Whilst it can still yield false negatives, it yields fewer of them than the current approach.

On a pragmatic level: assuming that the 'first' distribution to be attempted is the optimal one (closest platform/ABI tag match), then it's really 'best wheel or bust' as far as most users are concerned. Attempting a sub-optimal distribution (even an sdist) probably has diminishing returns.

And if it doesn't go far enough (which I doubt, but is a possibility) we could track this as a separate issue?
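
For what it's worth, the 'next version' heuristic discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Fetch` and `pick_version` are hypothetical names, the candidate list is assumed to be pre-sorted newest-first, and nothing here reflects how uv's resolver is actually structured.

```rust
/// Outcome of trying to download the best-matching archive for one version.
/// (Hypothetical type for illustration; not uv's internal API.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Fetch {
    /// The archive was downloaded successfully.
    Ok,
    /// HTTP 403 on the wheel/sdist download, e.g. blocked by a scanner.
    Forbidden,
    /// Any other HTTP failure, carrying the status code.
    OtherError(u16),
}

/// Walk candidate versions from newest to oldest. A 403 on a version's
/// archive moves on to the next version; any other failure aborts with
/// the offending status code. Returns `Ok(None)` if every version was blocked.
fn pick_version<'a>(
    candidates: &[&'a str],
    fetch: impl Fn(&str) -> Fetch,
) -> Result<Option<&'a str>, u16> {
    for &version in candidates {
        match fetch(version) {
            Fetch::Ok => return Ok(Some(version)),
            Fetch::Forbidden => continue,
            Fetch::OtherError(code) => return Err(code),
        }
    }
    Ok(None)
}
```

Note that the sketch never re-requests a blocked archive and never tries a second distribution of the same version; it only moves down the version list, which matches the 'best wheel or bust' framing.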

@kirici

kirici commented Aug 29, 2024

Just to give an example for the question by notatallshaw:

"And do you know the mirror software your organization is using?"

I've run into it with Sonatype Nexus in combination with Repository Firewall. The symptoms are just as paveldikov reported: the packages are visible and exist, but access is blocked when it comes to pulling the wheels. From the Sonatype documentation:

"build logs may only show a 403 error message for quarantined components"

https://help.sonatype.com/en/firewall-quarantine.html#firewall-quarantine

@charliermarsh
Member

@paveldikov -- I want to help you with this, since I get the sense it's causing a lot of trouble. In your index, is this happening when we go to fetch the metadata for the distribution?

@charliermarsh
Member

When we resolve, we have to get the metadata from the wheel. If the registry includes .whl.metadata files, we use those; otherwise, we try to use range requests, and if those too aren't supported, we download the wheel. I assume this is happening when we try to do one of those three things. Do you know which?
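
As a rough illustration of that three-step fallback, here is a small sketch. The `Registry` struct, the `MetadataSource` enum, and `choose_source` are hypothetical names invented for this example; they only model the order of preference described above and are not uv's actual types.

```rust
/// What a registry is assumed to offer for a given wheel.
/// (Hypothetical struct for illustration.)
struct Registry {
    /// The registry serves a `.whl.metadata` file next to the wheel.
    has_metadata_file: bool,
    /// The registry honours HTTP range requests on the wheel itself.
    supports_range_requests: bool,
}

/// Where the resolver would read the wheel's metadata from.
#[derive(Debug, PartialEq, Eq)]
enum MetadataSource {
    /// Fetch the standalone `.whl.metadata` file.
    MetadataFile,
    /// Read the metadata out of the remote wheel via range requests.
    RangeRequest,
    /// Download the whole wheel.
    FullDownload,
}

/// Pick the cheapest available source, in the order described above.
fn choose_source(registry: &Registry) -> MetadataSource {
    if registry.has_metadata_file {
        MetadataSource::MetadataFile
    } else if registry.supports_range_requests {
        MetadataSource::RangeRequest
    } else {
        MetadataSource::FullDownload
    }
}
```

A registry offering neither `.whl.metadata` files nor range requests falls through to `FullDownload`, so in that case the archive download itself is the request that has to succeed during resolution.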

@charliermarsh self-assigned this on Feb 16, 2025
@paveldikov
Contributor Author

I ran with debug just now and it appears that it's fetching the whole wheel.

I see a warning message about 'range requests not supported for ...; streaming wheel'

The registry doesn't appear to include .whl.metadata files. (The log doesn't have anything particularly telling on that part, so I can't tell more about its reasoning)

@charliermarsh
Member

Okay great, thanks. That should be enough information for me to go on. Are you able to help test if I put up a PR?

@paveldikov
Contributor Author

Would have a hard time retrieving an arbitrary binary. Is a PyPI pre-release feasible?

@charliermarsh
Member

Hmm, unfortunately that's not a common practice for us so it would take some work. Maybe I can just ship it and we test against the released binary.

@charliermarsh
Member

Okay, I have something up here: #12255
