
Speed up JPEG decoding by allowing resize during decode #8986

Open
gyf304 opened this issue Mar 19, 2025 · 2 comments

gyf304 commented Mar 19, 2025

🚀 The feature

Torchvision's read_image currently decodes JPEG images at full resolution. However, both libjpeg and libjpeg-turbo support decoding at lower resolutions (1/2, 1/4, 1/8 of the original size).

Introducing a size_hint parameter would allow users to specify an approximate target size, with torchvision selecting the closest larger available scale factor and downscaling the JPEG image during decoding (a scale-selection sketch follows the example below).

Example Usage:

from torchvision.io.image import decode_image
tensor = decode_image("image.jpeg", size_hint=(224, 224))
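A minimal sketch of how the scale factor might be selected, assuming only the 1, 1/2, 1/4, and 1/8 factors mentioned above (pick_scale and its signature are illustrative, not an actual torchvision API):

def pick_scale(orig_h, orig_w, hint_h, hint_w):
    """Pick the smallest scale factor whose output still covers the size hint."""
    for num, den in [(1, 8), (1, 4), (1, 2), (1, 1)]:  # try the smallest output first
        if orig_h * num >= hint_h * den and orig_w * num >= hint_w * den:
            return num, den
    return 1, 1  # hint larger than the image: decode at full size

# e.g. a 1920x1080 image with size_hint=(224, 224) -> (1, 4), i.e. a 480x270 decode
print(pick_scale(1080, 1920, 224, 224))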

Motivation, pitch

  • Many ML pipelines process images at fixed sizes (e.g., 224x224 for ImageNet models). Decoding large images only to downscale them later is inefficient.
  • This can improve memory usage, as we do not need to hold the full-sized image in memory.
  • Pillow provides a similar feature via Image.draft, allowing for approximate size-based decoding (a short sketch follows this list).
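For reference, a minimal sketch of the Pillow equivalent, assuming a local file named image.jpeg:

from PIL import Image

img = Image.open("image.jpeg")
# Ask the JPEG decoder to target (224, 224); the decoded size will be the
# nearest available scale (1/1, 1/2, 1/4, or 1/8), not the exact request.
img.draft("RGB", (224, 224))
img.load()          # the actual decode happens here, at the reduced scale
print(img.size)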

Alternatives

  • Using Pillow for decoding with downscaling, but torchvision's native decoder is typically faster than decoding with Pillow and then converting to a tensor.
  • Decode and then resize, but this is inefficient; see the benchmark below (the baseline is sketched after this list).
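For clarity, a minimal sketch of that decode-then-resize baseline (decode_image is given a path here, as in the example above; older torchvision versions require reading the file into a bytes tensor with read_file first):

from torchvision.io import decode_image
from torchvision.transforms.v2.functional import resize

# Baseline: decode at full resolution, then downscale afterwards.
img = decode_image("image.jpeg")                # e.g. a 3 x 1080 x 1920 uint8 tensor
img = resize(img, [540, 960], antialias=True)   # 3 x 540 x 960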

Additional context

Benchmark

We implemented a proof-of-concept and ran performance tests on decoding a 1920x1080 image into 960x540.
We compared the following:

  • Use existing decode_jpeg and resize after.
  • Patch decode_jpeg to allow libjpeg / libjpeg-turbo downscaling via the size_hint parameter.

Benchmark results (1000 iters):

9.91s call     .../test_jpeg.py::test_torchvision_image_load_with_resize_960_540
4.00s call     .../test_jpeg.py::test_fastjpeg_image_load_with_size_hint_960_540

~2.5X speed up.

I'm happy to contribute a patch if people consider this useful.

@NicolasHug
Member

Thank you for the feature request @gyf304. I think this is something we'll eventually want to enable.

The main challenge here isn't to implement the feature, it's to expose it in a way that isn't going to provide users with a massive footgun.

It is very important for the resizing algorithm (bilinear vs bicubic vs nearest neighbor, with or without antialiasing) to be consistent between training and inference time. When it's not, model accuracy regresses in ways that are very difficult to debug. This has caused a lot of confusion for users over time (e.g. back when the default of the antialias parameter of torchvision's Resize wasn't consistent between PIL images and Tensors).

So, if we're going to expose a resizing mechanism outside of torchvision's Resize(), e.g. in decode_image(), we'll have to ensure that the new resizing implementation is consistent with what Resize() exposes, and we should make it hard for users to end up with inconsistent resizing parameters.
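To make the footgun concrete, here is a small illustration (not from the issue itself) of how a mismatch in the antialias setting alone changes the resized output; a random image is used purely for demonstration:

import torch
from torchvision.transforms import v2

img = torch.randint(0, 256, (3, 1080, 1920), dtype=torch.uint8)

with_aa = v2.Resize((224, 224), antialias=True)(img)
without_aa = v2.Resize((224, 224), antialias=False)(img)

# The two outputs differ; a model trained with one setting and evaluated
# with the other can silently lose accuracy.
print((with_aa.float() - without_aa.float()).abs().mean())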

gyf304 closed this as completed Mar 20, 2025
gyf304 (Author) commented Mar 20, 2025

@NicolasHug I accidentally fat-fingered and clicked "Comment and Close Issue" - GitHub unfortunately does not allow me to reopen this issue.

I think this concern can be mitigated by:

  1. Understanding and documenting how resize during decode works
  2. Understanding and documenting its intended use
  3. Designing the API to minimize potential issues

1. Understanding How Resize During Decode Works

JPEG resize during decode is performed at the IDCT level, meaning it operates in the frequency domain. The process is somewhat comparable to applying a sinc filter* (a small numerical sketch follows the footnotes).

* This isn’t entirely accurate, as JPEG processes 8x8 blocks, whereas a true sinc filter is unbounded.
* A Lanczos filter, previously referred to as Antialias filter in Pillow, can be seen as a truncated approximation of a sinc filter.
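As a rough illustration of the frequency-domain downscaling described above (not libjpeg's actual code path, which uses scaled integer IDCTs): keeping the low-frequency 4x4 corner of an 8x8 DCT block and inverting it at 4x4 yields a half-scale block.

import numpy as np
from scipy.fft import dctn, idctn

block = np.random.rand(8, 8) * 255            # one 8x8 block of pixel values

coeffs = dctn(block, norm="ortho")            # forward 8x8 DCT-II

# Keep only the 4x4 low-frequency coefficients and invert at 4x4; the division
# by 2 compensates for the orthonormal scaling when changing block size.
half = idctn(coeffs[:4, :4], norm="ortho") / 2.0
print(half.shape)                             # (4, 4), i.e. the block at 1/2 scale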

2. Understanding Its Intended Use

Since JPEG resize during decode is limited to predefined scaling factors, the final output size may not precisely match the requested size_hint.

For example, calling decode_image("image.jpg", size_hint=(224, 224)) on a JPEG image yields a decoded image that is at least (224, 224), provided the original is at least that large. If an exact size is required, users should follow up with Resize((224, 224)), as sketched below.
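A minimal sketch of that intended usage (size_hint is the proposed, not-yet-existing parameter; the sizes in the comments assume a 1920x1080 source):

from torchvision.io import decode_image
from torchvision.transforms import v2

# Decode at the nearest larger scale factor (hypothetical size_hint parameter),
# then resize to the exact target with the usual torchvision transform.
img = decode_image("image.jpg", size_hint=(224, 224))   # e.g. 3 x 270 x 480
img = v2.Resize((224, 224), antialias=True)(img)         # exactly 3 x 224 x 224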

It is not reasonable to expect that:

resize(decode_image("image.jpg", size_hint=(224, 224)), (224, 224))

will always yield the same result as:

resize(decode_image("image.jpg"), (224, 224))

However, the difference should be minimal.

3. Designing the API to Prevent Issues

This feature, as proposed, is opt-in and does not modify how Resize() functions. Additionally, its docstring can include a clear warning about its implications to help users make informed decisions.

@NicolasHug NicolasHug reopened this Mar 21, 2025