Image Size Filtering for PhotoRec #187
Open: piotrkochan wants to merge 28 commits into cgsecurity:master from piotrkochan:imgsize_filter
Fixes #167
Image Size Filtering for PhotoRec
This PR implements filtering of recovered image files by dimensions and file size, addressing the requirement to skip thumbnail-sized images during recovery. The feature currently supports JPG and PNG formats with memory-efficient buffering.
Problems Addressed
Excessive I/O for small files: PhotoRec's original architecture opens a file handle for every detected file signature, writes data to disk, then evaluates filters post-recovery. For recoveries with thousands of thumbnails (10-50KB JPG/PNG files), this meant disk I/O spent on files that were discarded afterwards.
No dimension-based filtering: There was no option to filter by image dimensions (width, height, resolution) or by image file size on request.
Solution
Pre-save filtering with memory buffering: To filter images without wasting I/O, PhotoRec needs to know both dimensions AND file size before creating files on disk. But this creates a problem: dimensions are in the image header (first few hundred bytes), while actual file size requires finding the end-of-file marker.
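The split between "dimensions in the header" and "size only known at the end marker" can be illustrated with PNG. The following is a minimal sketch, not the PR's code: the helper names png_dimensions() and png_file_size() are hypothetical, and CRCs are not verified. It relies only on the PNG layout itself (8-byte signature, IHDR as the first chunk, IEND as the last).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian integer, as used throughout the PNG format. */
static uint32_t be32(const unsigned char *p)
{
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: width and height sit at fixed offsets 16 and 20,
 * so the first 24 bytes are enough. Returns 0 on success, -1 otherwise. */
static int png_dimensions(const unsigned char *buf, size_t len,
                          uint32_t *width, uint32_t *height)
{
  static const unsigned char sig[8] = {0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'};
  if (len < 24 || memcmp(buf, sig, 8) != 0 || memcmp(buf + 12, "IHDR", 4) != 0)
    return -1;
  *width = be32(buf + 16);
  *height = be32(buf + 20);
  return 0;
}

/* Hypothetical helper: walk the chunk list until IEND to learn the file size.
 * Each chunk is length (4) + type (4) + data + CRC (4).
 * Returns the total size in bytes, or 0 if IEND is not inside the buffer yet. */
static size_t png_file_size(const unsigned char *buf, size_t len)
{
  size_t pos = 8; /* skip the signature */
  while (pos + 12 <= len) {
    uint32_t clen = be32(buf + pos);
    if (clen > len - pos - 12)
      return 0; /* chunk data runs past what has been buffered */
    if (memcmp(buf + pos + 4, "IEND", 4) == 0)
      return pos + 12 + (size_t)clen;
    pos += 12 + (size_t)clen;
  }
  return 0;
}
```

With only the first 24 bytes available, png_dimensions() already succeeds while png_file_size() still returns 0, which is the gap that buffering is meant to close.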
The solution uses memory buffering combined with a new file_check_presave() callback. Instead of writing files to disk immediately, PhotoRec holds the recovered data in a memory buffer until the filter decision is made.
This eliminates wasted disk I/O for rejected images entirely. The file_check_presave() callback operates on the memory buffer, where both dimensions and file size are known, allowing a complete filtering decision before any disk writes.
Core Changes
New filtering module (src/image_filter.c, src/image_filter.h):
- Resolution given as a pixel count (307200) or in dimension format (640x480)
- Ranges such as 800-1920, or -1080 for "no min, max 1080"
File format handlers (src/file_jpg.c, src/file_png.c):
- A file_check_presave() callback that evaluates filters on recovered file data (from the memory buffer if buffering is active, or from the initial read buffer otherwise)
- An is_image=1 flag in the file_hint structures to enable memory buffering for these formats
To enable image filtering for other formats, modify the file format handler (file_*.c) to:
- Set is_image=1 in the file_hint_t structure
- Provide a file_check_presave() callback that uses should_skip_image_by_dimensions() and should_skip_image_by_filesize() from image_filter.h
- In the header_check_*() function, set:
file_recovery_new->file_check_presave = &your_presave_callback
file_recovery_new->image_filter = file_recovery->image_filter
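As a rough illustration of the steps above, here is what a presave check for GIF might look like. This is a sketch under stated assumptions, not the PR's code: demo_image_filter_t, demo_skip_by_dimensions(), demo_skip_by_filesize() and demo_gif_presave() are hypothetical stand-ins, because the real signatures in image_filter.h and the callback's return convention are not shown in this description. Here 0 means "no limit" and a return value of 1 means "keep the file". The only format fact used is that GIF stores its canvas width and height as little-endian 16-bit values at offsets 6 and 8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the filter settings; 0 means "no limit". */
typedef struct {
  uint64_t min_size, max_size;      /* bytes */
  uint32_t min_width, max_width;    /* pixels */
  uint32_t min_height, max_height;  /* pixels */
} demo_image_filter_t;

static int out_of_range(uint64_t v, uint64_t min, uint64_t max)
{
  return (min != 0 && v < min) || (max != 0 && v > max);
}

/* Stand-ins for the should_skip_image_by_*() helpers named in the PR. */
static int demo_skip_by_dimensions(const demo_image_filter_t *f,
                                   uint32_t width, uint32_t height)
{
  return out_of_range(width, f->min_width, f->max_width) ||
         out_of_range(height, f->min_height, f->max_height);
}

static int demo_skip_by_filesize(const demo_image_filter_t *f, uint64_t size)
{
  return out_of_range(size, f->min_size, f->max_size);
}

/* Hypothetical presave check: returns 1 to keep the file, 0 to skip it.
 * buf holds the buffered file and size is its complete length. */
static int demo_gif_presave(const unsigned char *buf, size_t size,
                            const demo_image_filter_t *f)
{
  uint32_t width, height;
  if (size < 10 ||
      (memcmp(buf, "GIF87a", 6) != 0 && memcmp(buf, "GIF89a", 6) != 0))
    return 1; /* header not readable: keep the file rather than lose data */
  width = (uint32_t)buf[6] | ((uint32_t)buf[7] << 8);
  height = (uint32_t)buf[8] | ((uint32_t)buf[9] << 8);
  if (demo_skip_by_filesize(f, size))
    return 0;
  if (demo_skip_by_dimensions(f, width, height))
    return 0;
  return 1;
}
```

Keeping unreadable files is a deliberate choice in this sketch: in a recovery tool, a filter that fails open loses less data than one that fails closed.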
See file_jpg.c:jpg_maches_image_filtering() and file_png.c:png_maches_image_filtering() for reference implementations.
Memory buffering (src/filegen.c):
- calloc() instead of malloc() to avoid immediate physical memory allocation
ncurses UI (src/phrecn.c):
CLI interface (src/phcli.c, /cmd batch mode):
- imagesize,size,MIN-MAX,width,MIN-MAX,height,MIN-MAX,pixels,MIN-MAX
- 100k, 1.5m, 2g (kilobytes/megabytes/gigabytes)
- 800-1920 (range), 800- (min only), -1080 (max only)
- pixels,307200-2073600 (direct pixel values)
- pixels,640x480-1920x1080 (width×height, auto-multiplied to pixel count)
- imagesize,size,100k-,width,800-,height,600- (min 100KB, min 800×600)
- imagesize,pixels,640x480- (min 640×480 resolution = 307200 pixels)
Session persistence (src/sessionp.c):
Testing
Python test suite available at https://gist.github.com/piotrkochan/1eb15d8ecb85c866e716bd07ee48d203
The test script automates validation by running PhotoRec against a disk image with various filter configurations, then verifying that recovered files match the specified criteria using ImageMagick's identify command. It tests file size filtering with min/max/range values and unit notation (k/m/g), dimension filtering for width and height with various boundary conditions, and resolution filtering in both pixel count and WIDTHxHEIGHT format. Combined filters with multiple parameters active simultaneously are also tested. The script performs automatic baseline analysis using percentile calculations to generate realistic test ranges based on actual recovered content.
Future Work
This implementation is designed for extensibility: the filtering logic is kept in image_filter.c for easy addition of other image formats (GIF, BMP, TIFF, WebP, etc.).
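For anyone extending the option handling alongside new formats, the MIN-MAX values listed under the CLI interface can be parsed roughly as follows. This is a sketch only: demo_range_t and demo_parse_range() are hypothetical names, 0 is assumed to mean "unbounded", and the k/m/g suffixes are assumed to be powers of 1024, which the description does not state. It handles a single range token such as 800-1920, 100k-, or 640x480-1920x1080, not the full comma-separated option string.

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical range type; 0 means "unbounded" on that side. */
typedef struct {
  uint64_t min;
  uint64_t max;
} demo_range_t;

/* Parse an unsigned decimal number with an optional fraction ("1.5"). */
static int parse_number(const char **s, double *out)
{
  const char *p = *s;
  double v = 0.0, scale = 0.1;
  if (!isdigit((unsigned char)*p))
    return -1;
  while (isdigit((unsigned char)*p))
    v = v * 10.0 + (*p++ - '0');
  if (*p == '.') {
    p++;
    while (isdigit((unsigned char)*p)) {
      v += (*p++ - '0') * scale;
      scale *= 0.1;
    }
  }
  *out = v;
  *s = p;
  return 0;
}

/* Parse one bound: a plain number, a number with a k/m/g suffix,
 * or WIDTHxHEIGHT, which is multiplied out to a pixel count. */
static int parse_bound(const char **s, uint64_t *out)
{
  double v, h;
  if (parse_number(s, &v) != 0)
    return -1;
  switch (**s) {
    case 'k': case 'K': v *= 1024.0; (*s)++; break;
    case 'm': case 'M': v *= 1024.0 * 1024.0; (*s)++; break;
    case 'g': case 'G': v *= 1024.0 * 1024.0 * 1024.0; (*s)++; break;
    case 'x': case 'X':
      (*s)++;
      if (parse_number(s, &h) != 0)
        return -1;
      v *= h;
      break;
    default: break;
  }
  *out = (uint64_t)v;
  return 0;
}

/* Parse "MIN-MAX", "MIN-" or "-MAX". Returns 0 on success, -1 on bad input. */
static int demo_parse_range(const char *s, demo_range_t *r)
{
  r->min = 0;
  r->max = 0;
  if (*s != '-' && parse_bound(&s, &r->min) != 0)
    return -1;
  if (*s != '-')
    return -1; /* the separator is required */
  s++;
  if (*s != '\0' && parse_bound(&s, &r->max) != 0)
    return -1;
  return *s == '\0' ? 0 : -1;
}
```

A leading '-' is checked before any number is parsed, so -1080 is read as "max only" rather than as a negative value.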