unzip and zipfile disagree on CRC validity for a wheel #132526

Closed
inducer opened this issue Apr 14, 2025 · 6 comments
Labels
stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@inducer

inducer commented Apr 14, 2025

Bug report

Bug description:

This concerns the file at:

https://files.pythonhosted.org/packages/42/a7/bd659e33e10c62b4acabaa1d5da2efa496434a021f8792ab1f23f6fb5514/islpy-2025.1.3-cp313-cp313-macosx_11_0_arm64.whl

(sha256sum: d4821572531e1035727200c0fb8adabeea1d127ead69d75a3e5f5677f5510e7c)

If I download it manually, unzip seems to like the data OK:

$ unzip -t ~/Downloads/islpy-2025.1.3-cp313-cp313-macosx_11_0_arm64.whl
Archive:  /home/andreas/Downloads/islpy-2025.1.3-cp313-cp313-macosx_11_0_arm64.whl
    testing: islpy/                   OK
    testing: islpy-2025.1.3.dist-info/   OK
    testing: islpy/version.py         OK
    testing: islpy/_isl.cpython-313-darwin.so   OK
    testing: islpy/__init__.py        OK
    testing: islpy-2025.1.3.dist-info/RECORD   OK
    testing: islpy-2025.1.3.dist-info/WHEEL   OK
    testing: islpy-2025.1.3.dist-info/top_level.txt   OK
    testing: islpy-2025.1.3.dist-info/METADATA   OK
No errors detected in compressed data of /home/andreas/Downloads/islpy-2025.1.3-cp313-cp313-macosx_11_0_arm64.whl.

But zipfile does not seem to like the file:

$ python3
Python 3.13.2 (main, Mar 29 2025, 10:04:43) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import zipfile
>>> z = zipfile.ZipFile("/home/andreas/Downloads/islpy-2025.1.3-cp313-cp313-macosx_11_0_arm64.whl")
>>> z.extractall()
Traceback (most recent call last):
  File "<python-input-4>", line 1, in <module>
    z.extractall()
    ~~~~~~~~~~~~^^
  File "/usr/lib/python3.13/zipfile/__init__.py", line 1780, in extractall
    self._extract_member(zipinfo, path, pwd)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/zipfile/__init__.py", line 1842, in _extract_member
    shutil.copyfileobj(source, target)
    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/shutil.py", line 203, in copyfileobj
    while buf := fsrc_read(length):
                 ~~~~~~~~~^^^^^^^^
  File "/usr/lib/python3.13/zipfile/__init__.py", line 1015, in read
    data = self._read1(n)
  File "/usr/lib/python3.13/zipfile/__init__.py", line 1105, in _read1
    self._update_crc(data)
    ~~~~~~~~~~~~~~~~^^^^^^
  File "/usr/lib/python3.13/zipfile/__init__.py", line 1033, in _update_crc
    raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file 'islpy/_isl.cpython-313-darwin.so'
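
For anyone who wants to script this check instead of reading a traceback, here is a minimal sketch using `ZipFile.testzip()`, which reads every member and returns the name of the first one that fails zipfile's checks (or `None`). The helper name `first_bad_member` is made up for illustration. Some kinds of corruption may raise other exceptions (for example `zlib.error`) rather than being reported this way.

```python
import zipfile


def first_bad_member(path_or_file):
    """Return the first member that fails zipfile's integrity check, or None.

    `path_or_file` is a filesystem path or a seekable binary file object.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        # testzip() reads every member in full and verifies its CRC-32.
        return zf.testzip()
```

Called on the wheel above, this should presumably return the member named in the traceback, though I have not run it against that file.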

Relatedly, a pip install that tries to use this file also fails, while uv pip install succeeds; I am guessing uv uses a different (Rust?) implementation of zip's CRC checking?

x-ref: inducer/islpy#162

CPython versions tested on:

3.13

Operating systems tested on:

Linux

@inducer added the type-bug label Apr 14, 2025
@StanFromIreland
Contributor

StanFromIreland commented Apr 14, 2025

I’ll look into this later. BTW, gh uses backticks for code in titles; it displays wrong on some pages.

@danifus
Contributor

danifus commented Apr 15, 2025

I think the file may be corrupted. 7zip also says islpy/_isl.cpython-313-darwin.so is bad.

There are a few differences between the local header and the central directory entry for islpy/_isl.cpython-313-darwin.so:
Local header:
CRC: 0x57277a8c
compressed size: 2224673

Central directory:
CRC: 0x7b2e86b0
compressed size: 2561623
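
A comparison like the one above can be scripted. This is only a sketch, not the tool used to produce the numbers in this comment: it reads the fixed 30-byte local file header at each member's `header_offset` and compares its CRC and compressed size with what `zipfile` parsed from the central directory. The function name `header_mismatches` is hypothetical, and the sketch assumes no ZIP64 entries (where the 32-bit local fields hold placeholder values).

```python
import struct
import zipfile

# Fixed 30-byte part of a ZIP local file header (little-endian):
# signature, version, flags, method, time, date, CRC-32,
# compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def header_mismatches(fileobj):
    """Return (name, field, local_value, central_value) tuples.

    `fileobj` must be a seekable binary file object.
    """
    with zipfile.ZipFile(fileobj) as zf:
        infos = zf.infolist()  # values parsed from the central directory

    mismatches = []
    for info in infos:
        fileobj.seek(info.header_offset)
        (sig, _ver, flags, _method, _time, _date,
         crc, csize, _usize, _nlen, _xlen) = LOCAL_HEADER.unpack(
            fileobj.read(LOCAL_HEADER.size))
        if sig != b"PK\x03\x04":
            mismatches.append((info.filename, "signature", sig, b"PK\x03\x04"))
            continue
        if flags & 0x08:
            # Bit 3 set: CRC and sizes live in a trailing data descriptor,
            # so the local header fields may legitimately be zero.
            continue
        if crc != info.CRC:
            mismatches.append((info.filename, "crc", crc, info.CRC))
        if csize != info.compress_size:
            mismatches.append(
                (info.filename, "compress_size", csize, info.compress_size))
    return mismatches


# Usage (the path is a placeholder):
# with open("some.whl", "rb") as f:
#     for name, field, local, central in header_mismatches(f):
#         print(name, field, local, central)
```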

It looks like unzip -t uses the local header, and the check passes with that CRC and compressed size. Python's zipfile uses the central directory and fails with the CRC and compressed size listed there. (The running CRC at the point the exception is raised differs again from the expected CRC in the central directory, probably because of the incorrect compressed size.)
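
To illustrate why a tool that trusts the local header can report OK, here is a rough sketch of validating one member using only its local header fields. It is my approximation of that style of check, not unzip's actual code. The name `check_member_via_local_header` is hypothetical; the sketch handles only stored and deflated members and skips data descriptors and ZIP64.

```python
import io
import struct
import zipfile
import zlib

LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")  # fixed 30-byte local header


def check_member_via_local_header(fileobj, header_offset):
    """Return (name, crc_ok) using only values from the local header."""
    fileobj.seek(header_offset)
    (sig, _ver, flags, method, _time, _date,
     crc, csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack(
        fileobj.read(LOCAL_HEADER.size))
    if sig != b"PK\x03\x04":
        raise ValueError("no local file header at this offset")
    if flags & 0x08:
        raise ValueError("CRC/sizes are in a data descriptor; not handled")

    # Assumption: UTF-8 names; older archives may use cp437 instead.
    name = fileobj.read(name_len).decode("utf-8", "replace")
    fileobj.seek(extra_len, io.SEEK_CUR)  # skip the extra field
    raw = fileobj.read(csize)             # size taken from the LOCAL header

    if method == zipfile.ZIP_STORED:
        data = raw
    elif method == zipfile.ZIP_DEFLATED:
        data = zlib.decompress(raw, wbits=-15)  # raw deflate stream
    else:
        raise ValueError(f"unsupported compression method {method}")
    return name, zlib.crc32(data) == crc
```

If only the central directory entry is wrong, a check like this passes while zipfile, which takes the CRC and compressed size from the central directory, raises BadZipFile.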

The rest of the files in that zip look fine.

@picnixz added the stdlib label Apr 15, 2025
@inducer
Author

inducer commented Apr 15, 2025

Thanks for the detailed investigation. Based on what you found, I would not fault Python for rejecting the file. IMO, it could even reject the file just based on disagreement between local header and central directory.

I also wonder how the file got to be corrupted in the first place. It is the result of a fairly standard GitHub workflow that uses https://github.com/pypa/cibuildwheel/.

@inducer
Author

inducer commented Apr 15, 2025

> I also wonder how the file got to be corrupted in the first place. It is the result of a fairly standard GitHub workflow that uses https://github.com/pypa/cibuildwheel/.

I've identified a likely culprit:

Sorry about the noise everyone, and thanks for helping get me pointed in the right direction!

@inducer closed this as completed Apr 15, 2025
@ericvsmith closed this as not planned Apr 16, 2025