utils/archive: Add support for detecting archives without extensions #6132

harvey0100 · 2025-03-03T13:24:58Z

This patch enhances the archive module to detect and extract archive files without proper extensions. Previously, while is_archive() could correctly identify archive files by examining their content, the ArchiveFile class (used by extract()) was relying solely on file extensions.

The implementation now:

- Adds content-based detection to ArchiveFile for tar, zip, and compressed archives
- Adds a new is_bzip2_file() function to detect bzip2 files
- Improves error handling in the uncompress function
- Adds comprehensive unit tests for archives with and without extensions

This fixes the issue where extract() would fail with "file is not an archive" error when trying to extract an archive without a proper extension, even though is_archive() correctly identified it.
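As an illustration of the behaviour described above, here is a minimal sketch of extension-first, content-second detection. It is not the code from this patch: `detect_archive_kind` and its small extension table are hypothetical names, and only the standard-library checks `zipfile.is_zipfile()` and `tarfile.is_tarfile()` are relied upon.

```python
import tarfile
import zipfile


def detect_archive_kind(path):
    """Return "zip", "tar" or None, trying the extension first, then content."""
    # Hypothetical extension table for illustration, NOT avocado's _extension_table
    by_extension = {".zip": "zip", ".tar": "tar", ".tgz": "tar", ".tar.gz": "tar"}
    lowered = path.lower()
    for ext, kind in by_extension.items():
        if lowered.endswith(ext):
            return kind
    # No known extension: fall back to inspecting the file content
    if zipfile.is_zipfile(path):
        return "zip"
    # is_tarfile() also recognizes gzip/bzip2/xz compressed tar files
    if tarfile.is_tarfile(path):
        return "tar"
    return None
```

With a check like this, a file named `rootfs` that is really a gzip-compressed tar would be reported as "tar" rather than rejected for lacking an extension.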

Reference: #5997

Signed-off-by: Harvey Lynden <[email protected]>

codecov · 2025-03-03T13:44:16Z

Codecov Report

Attention: Patch coverage is 31.25000% with 55 lines in your changes missing coverage. Please review.

Project coverage is 68.73%. Comparing base (65c967d) to head (165b045).

Files with missing lines	Patch %	Lines
avocado/utils/archive.py	31.25%	55 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6132      +/-   ##
==========================================
- Coverage   68.89%   68.73%   -0.16%     
==========================================
  Files         203      203              
  Lines       22019    22087      +68     
==========================================
+ Hits        15170    15182      +12     
- Misses       6849     6905      +56

clebergnu

Hi @harvey0100,

Thanks for this! IMO, the extra bzip2 support should come as a separate commit. Also, there are a number of comments that are more on the general code approach and/or refactor opportunities. Let me know what you think of them.

clebergnu · 2025-03-05T13:38:45Z

avocado/selftests/unit/utils/test_archive.py

+        self.tmpdir = tempfile.TemporaryDirectory(prefix="avocado_" + __name__)
+        self.base_dir = self.tmpdir.name
+
+        # Create a simple text file to archive


I don't think creating files and the archives at run time is the best choice, given that:

It consumes extra time/resources

It's error-prone, in the sense that a bug in the system tar or the like would impact the outcome of a test

There's already the precedent of shipping "golden" archives in selftests/.data

It adds extra dependencies at test run time (like tar itself)

clebergnu · 2025-03-05T13:47:03Z

avocado/selftests/unit/utils/test_archive.py

+            with archive.ArchiveFile.open(zip_file) as arch:
+                self.assertIsNotNone(arch)
+                files = arch._engine.namelist()
+                self.assertTrue(len(files) > 0, "Archive should contain files")


One idea, not necessary for this patch, but anyway: what if we had some supporting utility code that reads a metadata file associated with an archive? For instance, for the existing selftests/.data/avocado.gz, we could have a selftests/.data/avocado.gz.metadata containing:

{"members": [["avocado", "f1d2d2f924e986ac86fdf7b36c94bcdf32beec15"]]}

The point is to lean towards the idea of shipping "golden" archive files that document themselves and in theory could be tested "automatically" to a great extent.
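A rough sketch of what such supporting code might look like for a single-member gzip file. The function name `verify_gzip_metadata` is hypothetical, and the sketch assumes that the 40-character digest in the metadata is a SHA-1 hex digest of the uncompressed member content, which the comment above does not state explicitly.

```python
import gzip
import hashlib
import json


def verify_gzip_metadata(archive_path, metadata_path=None):
    """Check a single-member gzip file against its ".metadata" companion."""
    if metadata_path is None:
        metadata_path = archive_path + ".metadata"
    with open(metadata_path, encoding="utf-8") as meta_file:
        members = json.load(meta_file)["members"]
    # A plain gzip file is treated here as holding exactly one member
    if len(members) != 1:
        return False
    _name, expected_digest = members[0]
    with gzip.open(archive_path, "rb") as compressed:
        # Assumption: the digest is SHA-1 over the uncompressed content
        actual_digest = hashlib.sha1(compressed.read()).hexdigest()
    return actual_digest == expected_digest
```

A test could then loop over every archive in selftests/.data that has a `.metadata` companion and assert that the check passes, without creating any archive at run time.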

clebergnu · 2025-03-05T14:05:17Z

avocado/utils/archive.py

+        # Check for TAR file
+        elif tarfile.is_tarfile(filename):
+            # Detect compression method for tar files
+            with open(filename, "rb") as f:


Do you think the logic here could be transformed into a data structure, or even added to the existing _extension_table dict (or a modified version of it)?
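One possible shape for such a data structure, as a sketch only: a table mapping leading magic bytes to `tarfile` read modes. The names `_TAR_MODE_BY_MAGIC` and `tar_read_mode` are hypothetical and are not part of avocado; the magic values are the well-known gzip, bzip2 and xz signatures.

```python
import tarfile

# Hypothetical table: leading magic bytes -> tarfile read mode
_TAR_MODE_BY_MAGIC = (
    (b"\x1f\x8b", "r:gz"),
    (b"BZh", "r:bz2"),
    (b"\xfd7zXZ\x00", "r:xz"),
)


def tar_read_mode(filename):
    """Return the tarfile mode for filename, or None if it is not a tar file."""
    if not tarfile.is_tarfile(filename):
        return None
    longest = max(len(magic) for magic, _ in _TAR_MODE_BY_MAGIC)
    with open(filename, "rb") as f:
        head = f.read(longest)
    for magic, mode in _TAR_MODE_BY_MAGIC:
        if head.startswith(magic):
            return mode
    # No compression signature matched: treat it as an uncompressed tar
    return "r:"
```

Supporting another compression format then means adding one row to the table instead of another branch. One caveat of sniffing this way: an uncompressed tar whose first member name happens to start with one of these byte sequences would be misclassified, so letting `tarfile.open(filename, "r:*")` pick the compression is the safer fallback.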

clebergnu · 2025-03-05T14:07:02Z

avocado/utils/archive.py

+    try:
+        with open(path, "rb") as bz2_file:
+            # Check for bzip2 magic bytes (BZh)
+            return bz2_file.read(3) == b"BZh"


In some places we have constants for the magic bytes, in others, functions that actually check for them. It'd be nice to have a common approach.
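A common approach could look roughly like the sketch below: magic bytes live in module-level constants and every `is_*_file()` function delegates to one shared helper. The helper name `_starts_with_magic` is hypothetical, and returning False on `OSError` is an assumption, since the except clause of the patch is not visible in the hunk above.

```python
# Magic-byte prefixes kept in one place (well-known format signatures)
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def _starts_with_magic(path, magic):
    """Return True if the file at path begins with the given magic bytes."""
    try:
        with open(path, "rb") as candidate:
            return candidate.read(len(magic)) == magic
    except OSError:
        # Assumption: unreadable or missing files are simply "not a match"
        return False


def is_gzip_file(path):
    """Check for the gzip signature at the start of the file."""
    return _starts_with_magic(path, GZIP_MAGIC)


def is_bzip2_file(path):
    """Check for the bzip2 signature at the start of the file."""
    return _starts_with_magic(path, BZIP2_MAGIC)
```

This keeps each public predicate to a one-liner and makes the set of supported signatures visible at a glance.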

harvey0100 self-assigned this Mar 3, 2025

harvey0100 added this to the 110 - Codename TBD milestone Mar 3, 2025

harvey0100 added the customer:QEMU (Requirements/issues raised by the QEMU project) and enhancement labels Mar 3, 2025

harvey0100 requested review from richtja and clebergnu and removed request for richtja March 3, 2025 13:25

harvey0100 mentioned this pull request Mar 3, 2025

utils/archive: support for detecting archive files without proper extensions #5997

Open

richtja linked an issue Mar 3, 2025 that may be closed by this pull request

utils/archive: support for detecting archive files without proper extensions #5997

Open

clebergnu reviewed Mar 5, 2025

clebergnu modified the milestones: 110 - Codename TBD, 111 - Codename TBD Mar 17, 2025
