Skip to content

Conversation

@mollyxu
Copy link
Contributor

@mollyxu mollyxu commented Nov 20, 2025

Consolidate performance tips in docs

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 20, 2025
@meta-pytorch meta-pytorch deleted a comment from meta-codesync bot Nov 20, 2025
Copy link
Contributor

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a first pass, thanks @mollyxu , it looks great!

Comment on lines 36 to 37
# If you need to decode multiple frames at once, it is faster when using the batch methods. TorchCodec's batch APIs reduce overhead and can leverage
# internal optimizations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: here it might be useful to explicitly say that the batch methods are faster than the single-frame decoding methods - e.g. get_frames_at() is faster than calling get_frame_at() multiple times.

# - If you care about exactness of frame seeking, use “exact”.
# - If you can sacrifice exactness of seeking for speed, which is usually the case when doing clip sampling, use “approximate”.
# - If your videos don’t have variable framerate and their metadata is correct, then “approximate” mode is a net win: it will be just as accurate as the “exact” mode while still being significantly faster.
# - If your size is small enough and we’re decoding a lot of frames, there’s a chance exact mode is actually faster.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above:

This is a good description. I think we can be more nuanced about when to recommend approximate, e.g. we should try to clearly articulate the last 3 bullet points which are currently slightly overlapping and contradictory (we now know that approximate won't always be "a net win").

That's on me: I need to first have a clear understanding of why approximate mode is sometimes slower, and I'll need to update the approximate mode tutorial with more detailed recommendations.

I won't be able to do that in the next few days, so to unblock yourself I think you can just remove the claims about approximate being strictly superior ( bullet points 2 and 3), and the more generic reco could be something like

If the video is long and you're only decoding a small amount of frames, approximate mode should be faster.

It's not super actionable for users but I hope the dedicated tutorial I'll edit will be more precise.

#
# **Performance impact:** CUDA decoding can significantly outperform CPU decoding,
# especially for high-resolution videos and when combined with GPU-based transforms.
# Actual speedup varies by hardware, resolution, and codec.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good to have those bullet points here. They overlap with what is already in the CUDA decoding tutorial, and I think we'll want to remove them from there and have them here instead.

Eventually we'll also want to update the CUDA tutorial to explain to users how to check whether they're falling back to the CPU.

Mainly here in this tutorial, I think we should insist on one thing (as the main point): users should be using the Beta interface with

with set_cuda_backend("beta"):
    dec = VideoDecoder("file.mp4", device="cuda")

@mollyxu
Copy link
Contributor Author

mollyxu commented Nov 21, 2025

Thanks for the feedback!

# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_at` for specific indices
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_in_range` for ranges
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_at` for timestamps
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_in_range` for time ranges
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be more clear to group the similar functions here?
For example, we could add two small headers to group index based vs timestamp based retrieval:

For index based frame retrieval:

  • get_frames_at
  • get_frames_in_range

For timestamp based frame retrieval:
...

@Dan-Flores
Copy link
Contributor

Let's update docs/source/index.rst so this tutorial appears on the main index.html page (similar to these changes)

#
# - You need bit-exact results
# - Small resolution videos and the PCI-e transfer latency is large
# - GPU is already busy and CPU is idle
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: this is a personal writing style preference, but within a section lets consistently use either active or passive voice. For example, we could remove "you" from the first bullet point, and instead use the passive voice: "bit-exact results are needed"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants