-
Notifications
You must be signed in to change notification settings - Fork 70
Add performance tips tutorial #1065
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
NicolasHug
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a first pass, thanks @mollyxu , it looks great!
| # If you need to decode multiple frames at once, it is faster when using the batch methods. TorchCodec's batch APIs reduce overhead and can leverage | ||
| # internal optimizations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: here it might be useful to explicitly say that the batch methods are faster than the single-frame decoding methods - e.g. get_frames_at() is faster than calling get_frame_at() multiple times.
| # - If you care about exactness of frame seeking, use “exact”. | ||
| # - If you can sacrifice exactness of seeking for speed, which is usually the case when doing clip sampling, use “approximate”. | ||
| # - If your videos don’t have variable framerate and their metadata is correct, then “approximate” mode is a net win: it will be just as accurate as the “exact” mode while still being significantly faster. | ||
| # - If your size is small enough and we’re decoding a lot of frames, there’s a chance exact mode is actually faster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above:
This is a good description. I think we can be more nuanced about when to recommend approximate, e.g. we should try to clearly articulate the last 3 bullet points which are currently slightly overlapping and contradictory (we now know that approximate won't always be "a net win").
That's on me: I need to first have a clear understanding of why approximate mode is sometimes slower, and I'll need to update the approximate mode tutorial with more detailed recommendations.
I won't be able to do that in the next few days, so to unblock yourself I think you can just remove the claims about approximate being strictly superior ( bullet points 2 and 3), and the more generic reco could be something like
If the video is long and you're only decoding a small amount of frames, approximate mode should be faster.
It's not super actionable for users but I hope the dedicated tutorial I'll edit will be more precise.
| # | ||
| # **Performance impact:** CUDA decoding can significantly outperform CPU decoding, | ||
| # especially for high-resolution videos and when combined with GPU-based transforms. | ||
| # Actual speedup varies by hardware, resolution, and codec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good to have those bullet points here. They overlap with what is already in the CUDA decoding tutorial, and I think we'll want to remove them from there and have them here instead.
Eventually we'll also want to update the CUDA tutorial to explain to users how to check whether they're falling back to the CPU.
Mainly here in this tutorial, I think we should insist on one thing (as the main point): users should be using the Beta interface with
with set_cuda_backend("beta"):
dec = VideoDecoder("file.mp4", device="cuda")|
Thanks for the feedback! |
| # - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_at` for specific indices | ||
| # - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_in_range` for ranges | ||
| # - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_at` for timestamps | ||
| # - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_in_range` for time ranges |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be more clear to group the similar functions here?
For example, we could add two small headers to group index based vs timestamp based retrieval:
For index based frame retrieval:
get_frames_atget_frames_in_range
For timestamp based frame retrieval:
...
|
Let's update |
| # | ||
| # - You need bit-exact results | ||
| # - Small resolution videos and the PCI-e transfer latency is large | ||
| # - GPU is already busy and CPU is idle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: this is a personal writing style preference, but within a section lets consistently use either active or passive voice. For example, we could remove "you" from the first bullet point, and instead use the passive voice: "bit-exact results are needed"
Consolidate performance tips in docs