utils/archive: support for detecting archive files without proper extensions #5997

Open
clebergnu opened this issue Aug 4, 2024 · 1 comment · May be fixed by #6132
Labels: customer:QEMU (Requirements/issues raised by the QEMU project), enhancement

Comments

@clebergnu
Contributor

Is your feature request related to a problem? Please describe.
Currently, support for opening archive files relies on a table that maps file extensions to metadata, such as whether the file is a "zip" or a "tar" archive. This table does not account for the other archive types that are supported, and it also fails for files that have no extension at all.
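One way to make detection independent of the file name is to look at the file's content. The sketch below uses a hypothetical helper, `detect_archive_format()`, which is not part of avocado's API. It assumes only the standard library's `tarfile.is_tarfile()` and `zipfile.is_zipfile()` plus the well-known leading bytes of gzip (`1f 8b`), bzip2 (`BZh`) and xz (`fd 37 7a 58 5a 00`) streams.

```python
import tarfile
import zipfile

# Leading-byte signatures of the single-stream compressors.
_COMPRESSION_MAGIC = {
    "gzip": b"\x1f\x8b",
    "bzip2": b"BZh",
    "xz": b"\xfd7zXZ\x00",
}


def detect_archive_format(path):
    """Guess the archive/compression format of *path* from its content alone.

    Hypothetical helper for illustration; returns "gzip", "bzip2", "xz",
    "tar", "zip" or None. The file name and extension are never consulted.
    """
    with open(path, "rb") as f:
        head = f.read(6)
    # Compressed streams are recognised by their first bytes. A compressed
    # tarball (e.g. .tar.gz) is therefore reported by its outer compression.
    for fmt, magic in _COMPRESSION_MAGIC.items():
        if head.startswith(magic):
            return fmt
    # A plain tar has no signature at offset 0, so let tarfile parse a header.
    if tarfile.is_tarfile(path):
        return "tar"
    # zip is identified by its end-of-central-directory record.
    if zipfile.is_zipfile(path):
        return "zip"
    return None
```

Checking the compression signatures first keeps the answer cheap for the common compressed cases; `tarfile.is_tarfile()` would also accept compressed tarballs, so it is only consulted once those are ruled out.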

Describe the solution you'd like
It should be possible to open an archive file such as:

$ tar cf /tmp/tarfile /etc/issue

While is_archive() behaves as expected:

>>> archive.is_archive('/tmp/tarfile')
True

The more important extract() does not:

>>> archive.extract('/tmp/tarfile', '/tmp')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cleber/src/avocado/avocado/avocado/utils/archive.py", line 366, in uncompress
    with ArchiveFile.open(filename) as x:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cleber/src/avocado/avocado/avocado/utils/archive.py", line 225, in open
    return cls(filename, mode)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/cleber/src/avocado/avocado/avocado/utils/archive.py", line 205, in __init__
    raise ArchiveException("file is not an archive")
avocado.utils.archive.ArchiveException: file is not an archive
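For comparison, the standard library already handles this case: `tarfile.open()` in mode `"r:*"` detects plain, gzip, bzip2 and xz tarballs from the stream content rather than from the name. A content-based fallback could be sketched as below; `extract_by_content()` is a hypothetical name and is not avocado's `archive.extract()`.

```python
import tarfile
import zipfile


def extract_by_content(path, destination):
    """Extract *path* into *destination* based on its content, not its name.

    Hypothetical fallback for illustration only. Returns the format used.
    """
    if tarfile.is_tarfile(path):
        # "r:*" lets tarfile pick plain, gzip, bzip2 or xz transparently.
        with tarfile.open(path, "r:*") as tar:
            # Use the safer "data" extraction filter where this Python has it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(destination, filter="data")
            else:
                tar.extractall(destination)
        return "tar"
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            zf.extractall(destination)
        return "zip"
    raise ValueError(f"{path} is not a recognised tar or zip archive")
```

With this approach, `/tmp/tarfile` from the reproducer above would be extracted as a tar archive even though its name carries no extension.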

Describe alternatives you've considered
Requiring users to name files properly. However, this is sometimes not possible, or is simply an annoyance. Also, the extension table does not map well to the archive formats that are actually supported.

@clebergnu clebergnu added enhancement customer:QEMU Requirements/issues raised by the QEMU project labels Aug 4, 2024
@mr-avocado mr-avocado bot moved this to Triage in Default project Aug 4, 2024
@clebergnu clebergnu moved this from Triage to Short Term (Current Q) Backlog in Default project Aug 5, 2024
harvey0100 added a commit to harvey0100/avocado that referenced this issue Mar 3, 2025
This patch enhances the archive module to detect and extract archive files
without proper extensions. Previously, while is_archive() could correctly
identify archive files by examining their content, the ArchiveFile class
(used by extract()) was relying solely on file extensions.

The implementation now:

- Adds content-based detection to ArchiveFile for tar, zip, and compressed archives
- Adds a new is_bzip2_file() function to detect bzip2 files
- Improves error handling in the uncompress function
- Adds comprehensive unit tests for archives with and without extensions

This fixes the issue where extract() would fail with a "file is not an
archive" error when trying to extract an archive without a proper
extension, even though is_archive() correctly identified it.

Reference: avocado-framework#5997
Signed-off-by: Harvey Lynden <[email protected]>
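The body of the new is_bzip2_file() is not shown in this thread. A magic-number check in the same spirit can be sketched under one assumption: a bzip2 stream starts with `BZh` followed by an ASCII block-size digit from 1 through 9. The name `looks_like_bzip2()` is hypothetical and this is not the patch's code.

```python
def looks_like_bzip2(path):
    """Return True if *path* starts with a bzip2 stream header.

    Hypothetical illustration, not the function added by the patch. A bzip2
    stream begins with b"BZh" followed by an ASCII block-size digit 1-9.
    """
    with open(path, "rb") as f:
        header = f.read(4)
    # header[3] is an int; membership in a bytes object compares byte values.
    return len(header) == 4 and header[:3] == b"BZh" and header[3] in b"123456789"
```

Checking the block-size digit as well as the `BZh` prefix cuts down on false positives from arbitrary files that merely start with those three letters.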
@harvey0100 harvey0100 self-assigned this Mar 3, 2025
@harvey0100
Contributor

Pull Request: #6132

@richtja richtja moved this from Short Term (Current Q) Backlog to In progress in Default project Mar 3, 2025
@richtja richtja added this to the 110 - Codename TBD milestone Mar 3, 2025
Projects
Status: In progress

3 participants