1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -81,6 +81,7 @@ def __call__(self, filename):
"approximate_mode.py",
"sampling.py",
"parallel_decoding.py",
"performance_tips.py",
"custom_frame_mappings.py",
]
else:
170 changes: 170 additions & 0 deletions examples/decoding/performance_tips.py
@@ -0,0 +1,170 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""
====================================
Performance Tips and Best Practices
====================================

This tutorial consolidates performance optimization techniques for video
decoding with TorchCodec. Learn when and how to apply each strategy to
speed up decoding.
"""


# %%
# Overview
# --------
#
# When decoding videos with TorchCodec, several techniques can significantly
# improve performance depending on your use case. This guide covers:
#
# 1. **Batch APIs** - Decode multiple frames at once
# 2. **Approximate Mode & Keyframe Mappings** - Trade accuracy for speed
# 3. **Multi-threading** - Parallelize decoding across videos or chunks
# 4. **CUDA Acceleration** - Use GPU decoding for supported formats
#
# We'll explore each technique and when to use it.

# %%
# 1. Use Batch APIs When Possible
# --------------------------------
#
# If you need to decode multiple frames at once, the batch methods are faster
# than calling single-frame decoding methods multiple times. For example,
# :meth:`~torchcodec.decoders.VideoDecoder.get_frames_at` is faster than calling
# :meth:`~torchcodec.decoders.VideoDecoder.get_frame_at` in a loop.
# TorchCodec's batch APIs reduce per-call overhead and can leverage internal
# optimizations.
#
# **Key Methods:**
#
# For index-based frame retrieval:
#
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_at` for specific indices
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_in_range` for ranges
#
# For timestamp-based frame retrieval:
#
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_at` for timestamps
# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_in_range` for time ranges
#
# **When to use:**
#
# - You need more than one frame from the same video
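
# %%
# As a minimal sketch, assuming a local ``video.mp4`` file exists (the path is
# a placeholder), one batched call replaces a loop of single-frame calls:
#
# .. code-block:: python
#
#     from torchcodec.decoders import VideoDecoder
#
#     decoder = VideoDecoder("video.mp4")
#
#     # One batched call decodes all three frames:
#     frames = decoder.get_frames_at(indices=[0, 50, 100])
#
#     # Slower equivalent, one decode call per frame:
#     # frames = [decoder.get_frame_at(i) for i in [0, 50, 100]]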

# %%
# .. note::
#
#    For complete examples with runnable code demonstrating batch decoding,
#    iteration, and frame retrieval, see:
#
#    - :ref:`sphx_glr_generated_examples_decoding_basic_example.py`

# %%
# 2. Approximate Mode & Keyframe Mappings
# ----------------------------------------
#
# By default, TorchCodec uses ``seek_mode="exact"``, which performs a :term:`scan` when
# the decoder is created to build an accurate internal index of frames. This
# ensures frame-accurate seeking but takes longer for decoder initialization,
# especially on long videos.

# %%
# **Approximate Mode**
# ~~~~~~~~~~~~~~~~~~~~
#
# Setting ``seek_mode="approximate"`` skips the initial :term:`scan` and relies on the
# video file's metadata headers. This dramatically speeds up
# :class:`~torchcodec.decoders.VideoDecoder` creation, particularly for long
# videos, but may result in slightly less accurate seeking in some cases.
#
#
# **Which mode should you use?**
#
# - If you need frame-accurate seeking, use ``seek_mode="exact"``.
# - If the video is long and you're only decoding a small number of frames,
#   ``seek_mode="approximate"`` should be faster.
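
# %%
# A minimal sketch of approximate mode, assuming a local ``video.mp4`` file
# (the path is a placeholder):
#
# .. code-block:: python
#
#     from torchcodec.decoders import VideoDecoder
#
#     # Skips the initial scan, so decoder creation is much faster on
#     # long videos
#     decoder = VideoDecoder("video.mp4", seek_mode="approximate")
#     frame = decoder.get_frame_at(0)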

# %%
# **Custom Frame Mappings**
# ~~~~~~~~~~~~~~~~~~~~~~~~~
#
# For advanced use cases, you can pre-compute a custom mapping between desired
# frame indices and actual keyframe locations. This speeds up
# :class:`~torchcodec.decoders.VideoDecoder` instantiation while maintaining
# the frame-seeking accuracy of ``seek_mode="exact"``.
#
# **When to use:**
#
# - Frame accuracy is critical, so approximate mode cannot be used
# - Videos can be preprocessed once and then decoded many times
#
# **Performance impact:** Enables consistent, predictable performance for repeated
# random access without the overhead of exact mode's scanning.

# %%
# .. note::
#
#    For complete benchmarks showing actual speedup numbers, accuracy
#    comparisons, and implementation examples, see:
#
#    - :ref:`sphx_glr_generated_examples_decoding_approximate_mode.py`
#
#    - :ref:`sphx_glr_generated_examples_decoding_custom_frame_mappings.py`

# %%
# 3. Multi-threading for Parallel Decoding
# -----------------------------------------
#
# When decoding multiple videos, or a large number of frames from a single
# video, a few parallelization strategies can speed up decoding:
#
# - **FFmpeg-based parallelism** - Using FFmpeg's internal threading
#   capabilities for intra-frame parallelism, where parallelization happens
#   within individual frames rather than across frames
# - **Multiprocessing** - Distributing work across multiple processes
# - **Multithreading** - Using multiple threads within a single process
#
# Both multiprocessing and multithreading can be used to decode multiple videos
# in parallel, or to decode a single long video in parallel by splitting it
# into chunks.
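
# %%
# As a sketch of the multithreading approach, assuming the listed paths exist
# (they are placeholders), a thread pool can decode several videos in parallel:
#
# .. code-block:: python
#
#     from concurrent.futures import ThreadPoolExecutor
#
#     from torchcodec.decoders import VideoDecoder
#
#     paths = ["a.mp4", "b.mp4", "c.mp4"]
#
#     def decode_first_frames(path):
#         # num_ffmpeg_threads=1 avoids oversubscribing cores when
#         # parallelizing at the Python level
#         decoder = VideoDecoder(path, num_ffmpeg_threads=1)
#         return decoder.get_frames_in_range(start=0, stop=10)
#
#     with ThreadPoolExecutor(max_workers=3) as pool:
#         results = list(pool.map(decode_first_frames, paths))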

# %%
# .. note::
#
#    For complete examples comparing sequential, FFmpeg-based parallelism,
#    multi-process, and multi-threaded approaches, see:
#
#    - :ref:`sphx_glr_generated_examples_decoding_parallel_decoding.py`

# %%
# 4. CUDA Acceleration
# --------------------
#
# TorchCodec supports GPU-accelerated decoding using NVIDIA's hardware decoder
# (NVDEC) on supported hardware. This keeps decoded tensors in GPU memory,
# avoiding expensive CPU-GPU transfers for downstream GPU operations.
#
# **When to use:**
#
# - You're decoding high-resolution videos
# - You're decoding large batches of videos that saturate the CPU
# - Your pipeline applies GPU-based transforms like scaling and cropping
# - Your CPU is saturated and you want to free it up for other work
#
# **When NOT to use:**
#
# - You need bit-exact results
# - You're decoding low-resolution videos where the PCIe transfer latency
#   outweighs the decoding speedup
# - Your GPU is already busy while the CPU is idle
#
# **Performance impact:** CUDA decoding can significantly outperform CPU decoding,
# especially for high-resolution videos and when combined with GPU-based transforms.
# Actual speedup varies by hardware, resolution, and codec.

# %%
# **Recommended Usage for Beta Interface**
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# .. code-block:: python
#
#     with set_cuda_backend("beta"):
#         decoder = VideoDecoder("file.mp4", device="cuda")
#

# %%
# .. note::
#
#    For installation instructions, detailed examples, and visual comparisons
#    between CPU and CUDA decoding, see:
#
#    - :ref:`sphx_glr_generated_examples_decoding_basic_cuda_example.py`