@fservida
Contributor

Hello @scudette,
Thanks for the feedback on the other PR. The approach of a container hash with armoring is interesting, but it is a big change to the structure, which means current tools supporting AFF4(-L) would likely be incompatible with it.

I would propose the below change as a possible alternative, which would keep full compatibility with previous archives.

  • The AFF4 container hash will be defined as the hash (MD5, SHA1 and SHA256) of the "information.turtle" file.
  • The MD5, SHA1 and SHA256 values are stored in the container in an optional "container.hashes" file, in JSON format.
  • Given that "information.turtle" contains the hashes of the segments stored, full container verification is achieved in two steps, in a "chained" configuration:
    • Verification of "information.turtle" using the hash in the "container.hashes" file.
    • Verification of the segments in the container content using the hashes in the "information.turtle" file.
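
The first step of the chain can be sketched as follows. This is a minimal illustration, not the PoC from this PR: it assumes the container is a plain ZIP with "information.turtle" at its root, and it assumes "container.hashes" is a flat JSON object keyed by algorithm name (the PR text only says the file is JSON, so that layout is a guess). The function names are hypothetical. The second step, checking each segment against the hashes in "information.turtle", is left out because it needs an RDF parser.

```python
import hashlib
import json
import zipfile

# Algorithms named in the proposal.
ALGORITHMS = ("md5", "sha1", "sha256")


def compute_container_hashes(zf):
    """Hash the raw bytes of information.turtle with each algorithm."""
    data = zf.read("information.turtle")
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}


def write_container_hashes(container):
    """Append the optional container.hashes member (assumed flat JSON layout)."""
    with zipfile.ZipFile(container, "a") as zf:
        hashes = compute_container_hashes(zf)
        zf.writestr("container.hashes", json.dumps(hashes))
    return hashes


def verify_container_hashes(container):
    """Return True/False for match/mismatch, or None if container.hashes is absent."""
    with zipfile.ZipFile(container) as zf:
        if "container.hashes" not in zf.namelist():
            return None  # the file is optional, so older containers stay valid
        stored = json.loads(zf.read("container.hashes"))
        computed = compute_container_hashes(zf)
    return all(stored.get(name) == computed[name] for name in ALGORITHMS)
```

Both functions accept a path or a seekable file object, since they pass the argument straight to `zipfile.ZipFile`. A container without "container.hashes" yields `None` rather than a failure, which matches the file being optional.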

By chaining the hashes of the segments and the hashes of the metadata this way, it is possible to fully verify the integrity of the container.
Protection against tampering is improved: with this system, the investigator can easily record in their CoC/lab notebook a single hash returned at the end of container creation. (Currently, when acquiring a folder of 100 files, an investigator must either write down all 100 hashes or manually hash the AFF4-L container afterwards.)

I've provided a simple PoC adjustment to the aff4.py script.

The one part I'm not fully sure about is that there seems to be a case where information.turtle is split over multiple files (cf. https://github.com/fservida/pyaff4/blob/faa1361b48616bad8c63c0c13cba1e4e080dee77/pyaff4/data_store.py#L365). I'm not sure how that is handled: I don't have a reference container to test with, and I can't find more information in the PDFs of the standard, which always seem to refer to a single turtle file.
(Obviously, if the container is modified by appending files, the hash will be invalidated, but I'm thinking about how to avoid compatibility issues.)

Let me know what you think

fservida added 17 commits July 20, 2023 21:03
Updated to PyYaml 5.4 to reflect latest packaged pyaff4 version on pip
…erwriting

Allow creation of containers based on ZIP_STORED
…to 0xFFFFFFFF to force reading from extra field, extra field size is now written correctly (was always 0 before)
Updated PyYaml to avoid build issues on cython 3.0 yaml/pyyaml#601
Removed version pinning for pyyaml
Added pybindgen as required for successful build of fastchunking
require latest intervaltree to avoid issues with mutableset in python >=3.10
Use source version of aff4-snappy to be able to build on ARM64