🚀 The feature
Torchvision's read_image currently decodes JPEG images at full resolution. However, both libjpeg and libjpeg-turbo support decoding at reduced resolutions (1/2, 1/4, 1/8 of the original size).
Introducing a size_hint parameter would let users specify an approximate target size, with torchvision selecting the closest larger available scale factor and downscaling the JPEG image during decoding.
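The scale-factor selection described above can be sketched in Python. Note this is an illustrative sketch, not torchvision code: the function name `pick_scale_factor` is hypothetical, and the scale list uses the classic libjpeg ratios mentioned above (libjpeg-turbo additionally supports ratios like 3/8, 5/8, etc.).

```python
# Hypothetical sketch of choosing a libjpeg scale factor for a size_hint.
# pick_scale_factor is an illustrative name, not a torchvision API.
import math
from fractions import Fraction

# Classic libjpeg scale factors (libjpeg-turbo offers more N/8 ratios).
SCALES = [Fraction(1, 1), Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]

def pick_scale_factor(full_size, size_hint):
    """Return the smallest scale whose decoded size still covers size_hint."""
    full_h, full_w = full_size
    hint_h, hint_w = size_hint
    for scale in sorted(SCALES):  # try the smallest (cheapest) scale first
        if (math.ceil(full_h * scale) >= hint_h
                and math.ceil(full_w * scale) >= hint_w):
            return scale
    # Hint is larger than the image: decode at full size.
    return Fraction(1, 1)

print(pick_scale_factor((1080, 1920), (224, 224)))  # 1/4
```

The decoder would then produce an image at, e.g., 480x270 for a (224, 224) hint, leaving only a small final resize to the caller.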
Example Usage:
from torchvision.io.image import decode_image
tensor = decode_image("image.jpeg", size_hint=(224, 224))
Motivation, pitch
- Many ML pipelines process images at fixed sizes (e.g., 224x224 for ImageNet models). Decoding large images only to downscale them later is inefficient.
- This can improve memory usage, as we do not need to hold the full-sized image in memory.
- Pillow provides a similar feature via Image.draft, allowing for approximate size-based decoding.
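For reference, Pillow's Image.draft mentioned above works as follows. The example builds a JPEG in memory so it is self-contained; a real pipeline would open "image.jpeg" from disk.

```python
# Demonstrates Pillow's draft mode: the JPEG decoder itself produces a
# reduced-resolution image instead of decoding at full size.
import io
from PIL import Image

# Create a 1920x1080 JPEG in memory (stand-in for a file on disk).
buf = io.BytesIO()
Image.new("RGB", (1920, 1080)).save(buf, format="JPEG")
buf.seek(0)

img = Image.open(buf)
img.draft("RGB", (960, 540))  # ask the decoder for ~half resolution
img.load()                    # decoding happens here, at the reduced scale
print(img.size)               # (960, 540): decoded directly at 1/2 scale
```

Like the proposed size_hint, draft only picks among the decoder's fixed scale ratios, so the result approximates the requested size rather than matching it exactly.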
Alternatives
- Use Pillow for decoding with downscaling; however, torchvision's native decoder is typically faster than decoding with Pillow and then converting to a tensor.
- Decode at full resolution and then resize; this is inefficient, see the benchmark below.
Additional context
Benchmark
We implemented a proof of concept and ran performance tests decoding a 1920x1080 image to 960x540.
We compared the following:
- Use the existing decode_jpeg and resize afterwards.
- Patch decode_jpeg to allow libjpeg/libjpeg-turbo downscaling via the size_hint parameter.
Benchmark results (1000 iters):
9.91s call .../test_jpeg.py::test_torchvision_image_load_with_resize_960_540
4.00s call .../test_jpeg.py::test_fastjpeg_image_load_with_size_hint_960_540
~2.5x speedup.
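The proof-of-concept patch is not shown here, but the shape of the comparison can be reproduced with Pillow's draft mode standing in for the proposed size_hint path (timings will differ from the numbers above, which used the patched torchvision decoder):

```python
# Rough reproduction of the benchmark's two code paths, using Pillow's
# draft mode as a stand-in for the proposed size_hint decoding.
import io
import time
from PIL import Image

# Build a 1920x1080 JPEG in memory so the benchmark is self-contained.
buf = io.BytesIO()
Image.new("RGB", (1920, 1080)).save(buf, format="JPEG")
data = buf.getvalue()

def decode_then_resize():
    # Path 1: full-resolution decode, then resize.
    img = Image.open(io.BytesIO(data))
    return img.resize((960, 540)).size

def decode_with_draft():
    # Path 2: decoder-level downscaling (analogous to size_hint).
    img = Image.open(io.BytesIO(data))
    img.draft("RGB", (960, 540))
    img.load()
    return img.size

for fn in (decode_then_resize, decode_with_draft):
    t0 = time.perf_counter()
    for _ in range(100):
        fn()
    print(f"{fn.__name__}: {time.perf_counter() - t0:.3f}s")
```

Both paths yield a 960x540 image; the second avoids ever materializing the full-resolution pixels.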
I'm happy to contribute a patch if people consider this useful.
abhi-glitchhg