Speed up JPEG decoding by allowing resize during decode #8986
Comments
Thank you for the feature request @gyf304. I think that eventually this is something we'll want to enable. The main challenge here isn't to implement the feature, it's to expose it in a way that isn't going to provide users with a massive footgun. It is very important for the resizing algorithm (bilinear vs bicubic vs nearest neighbor, with or without antialiasing) to be consistent between training and inference time. When it's not, model accuracy regresses in ways that are very difficult to debug. This has caused a lot of confusion for users over time (e.g. back when the default of … So, if we're going to expose a resizing mechanism outside of torchvision's …
@NicolasHug I accidentally fat-fingered and clicked "Comment and Close Issue" - GitHub unfortunately does not allow me to reopen this issue. I think this concern can be mitigated by:
1. Understanding How Resize During Decode Works

JPEG resize during decode is performed at the IDCT level, meaning it operates in the frequency domain. The process is somewhat comparable to applying a sinc filter*.

* This isn’t entirely accurate, as JPEG processes 8x8 blocks, whereas a true sinc filter is unbounded.

2. Understanding Its Intended Use

Since JPEG resize during decode is limited to predefined scaling factors, the final output size may not precisely match the requested … For example, calling … It's not feasible to expect that:

resize(decode_image("image.jpg", size_hint=(224, 224)), (224, 224))

will always yield the same result as:

resize(decode_image("image.jpg"), (224, 224))

However, the difference should be minimal.

3. Designing the API to Prevent Issues

This feature, as proposed, is opt-in and does not modify how …
🚀 The feature
Torchvision's read_image currently decodes JPEG images at full resolution. However, both libjpeg and libjpeg-turbo support decoding at lower resolutions (1/2, 1/4, 1/8 of the original size).

Introducing a size_hint parameter would allow users to specify an approximate target size, with torchvision selecting the closest larger available scale factor and downscaling the JPEG image during decoding.

Example Usage:
Motivation, pitch
… Image.draft, allowing for approximate size-based decoding.

Alternatives
Additional context
Benchmark
We implemented a proof-of-concept and ran performance tests on decoding a 1920x1080 image into 960x540.
We compared the following:
- decode_jpeg and resize after.
- decode_jpeg to allow libjpeg/libjpeg-turbo downscaling via the size_hint parameters.

Benchmark results (1000 iters):
~2.5X speed up.
I'm happy to contribute a patch if people consider this useful.
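For reference, the "closest larger available scale factor" rule from the feature description can be sketched in a few lines of plain Python. This is only an illustration, not the proof-of-concept patch: the names `SCALES`, `scaled_dim` and `pick_scale` are hypothetical, sizes are assumed to be (height, width), the factor list is the one quoted in this issue plus 1/1, and rounding scaled dimensions up is an assumption about how the decoder computes its output size.

```python
import math
from fractions import Fraction

# Scale factors quoted in this issue, plus 1/1 meaning "no downscaling".
# Listed in ascending order so the first match is the smallest usable one.
SCALES = (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2), Fraction(1, 1))


def scaled_dim(dim, scale):
    """Size of one dimension after decode-time scaling (assumed to round up)."""
    return math.ceil(dim * scale)


def pick_scale(orig_size, size_hint):
    """Pick the smallest factor whose output still covers size_hint.

    Both arguments are assumed to be (height, width) tuples. If even the
    full-resolution image is smaller than the hint, fall back to 1/1.
    """
    for scale in SCALES:
        if all(scaled_dim(o, scale) >= h for o, h in zip(orig_size, size_hint)):
            return scale
    return Fraction(1, 1)


if __name__ == "__main__":
    orig = (1080, 1920)
    for hint in [(540, 960), (224, 224), (2000, 2000)]:
        scale = pick_scale(orig, hint)
        out = tuple(scaled_dim(o, scale) for o in orig)
        print(hint, "->", scale, out)
```

With the 1920x1080 benchmark image and a (224, 224) hint, this picks 1/4 and the decoder would hand back a 270x480 image, so a regular resize to the exact target size is still needed afterwards. That is the "approximate" part of size_hint.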