Skip to content

Conversation

@Dan-Flores
Copy link
Contributor

No description provided.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 19, 2025
Copy link
Contributor

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great tutorial @Dan-Flores , thank you! I left a few minor comments. Let's also edit the docstring of the VideoEncoder methods, and link to the relevant sections here in this tutorial.

I think that since we'll now have to ship these large decoded video frames with our docs, we are potentially increasing the "docs" size by a few dozens of MB. But I think that's OK.


# %%
# First, we'll download a video and decode some frames to tensors.
# These will be the input to the VideoEncoder. For more details on decoding,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to use :class:`~torchcodec.encoders.VideoEncoder` everywhere.

It should link to the docstring page. Right now it doesn't, because you need to add it to https://github.com/meta-pytorch/torchcodec/blob/main/docs/source/api_ref_encoders.rst?plain=1



# Video source: https://www.pexels.com/video/adorable-cats-on-the-lawn-4977395/
# License: CC0. Author: Altaf Shah.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can still use it, but I don't see the license being explicitly CC0.

Suggested change
# License: CC0. Author: Altaf Shah.
# Author: Altaf Shah.

raw_video_bytes = response.content

decoder = VideoDecoder(raw_video_bytes)
frames = decoder[:60] # Get first 60 frames
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use get_frames_in_range instead, it's more efficient, and we want users to use the most efficient decoding methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slicing actually calls get_frames_in_range():

def _getitem_slice(self, key: slice) -> Tensor:
assert isinstance(key, slice)
start, stop, step = key.indices(len(self))
frame_data, *_ = core.get_frames_in_range(
self._decoder,
start=start,
stop=stop,
step=step,
)
return frame_data

# round-trip encode/decode process works as expected:

decoder_verify = VideoDecoder(encoded_frames)
decoded_frames = decoder_verify[:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

capture_output=True,
text=True,
)
print(f"Codec used in {output}: {result.stdout.strip()}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great section above. The only issue is that it pollutes the codespace with libx264_encoded.mp4 and hevc_encoded.mp4. Let's use temporary files instead, with e.g. https://docs.python.org/3/library/tempfile.html

# Codec Selection
# ---------------
#
# The ``codec`` parameter specifies which video codec to use for encoding.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could start this section by indicating that by default, the codec is selected automatically based on the container format, for example "mp4" tends to default to h264 (I think? Please check me on this)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NicolasHug, made a similar comment to what I did above. :) Doing it here or in the previous section are both great to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an intro here explaining the default behavior, this way all codec related text is under the same header. I added the mp4 -> h264 example, as it is often the case in my experience.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this works well. My one follow-up suggestion is to connect the sentence about codec selection with the default. Something like, "If you want a codec other than the default, use the codec parameter." Followed by the explanation of what it is.

# %%
# Low quality (high CRF)
low_quality_output = encoder.to_tensor(format="mp4", codec="libx264", crf=50)
play_video(low_quality_output)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well done, it's really cool to visually see the effect it has on quality!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I came here to say the same thing. :)

# to check available options for your selected codec.
#

import os
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scotts
Copy link
Contributor

scotts commented Nov 19, 2025

Let's add a reference to this tutorial in index.rst in the "Encoding" section.

#
# :class:`~torchcodec.encoders.VideoEncoder` supports encoding frames into a
# file via the :meth:`~torchcodec.encoders.VideoEncoder.to_file` method, to
# file-like objects via the :meth:`~torchcodec.encoders.VideoEncoder.to_filelike`
Copy link
Contributor

@scotts scotts Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# file-like objects via the :meth:`~torchcodec.encoders.VideoEncoder.to_filelike`
# file-like objects via the :meth:`~torchcodec.encoders.VideoEncoder.to_file_like`


print(f"Re-decoded video: {decoded_frames.shape = }")
print(f"Original frames: {frames.shape = }")

Copy link
Contributor

@scotts scotts Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an excellent place to explain that the format parameter selects the default codec - we can also briefly explain the difference between, say, an mp4 video file and the actual codec used to decode and encode the video streams in that file. If this is well explained in any externall FFmpeg docs, we can link to those as well.

That then sets us up for the next section, as the natural next question a reader may have is, what if I don't want the default codec?

At the end of the "Codec Selection" section, we should give some guidance on when to just use format and when to specify codec as well. Nothing elaborate, just a sentence or two. I think that will go a long way to informing our about the relationship between these two options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions, I added some brief guidance on codec vs format at the end.

I drafted text to explain the difference between container-format and codec, but I am worried it dilutes the "Codec Selection" section with text that is not specific to the API. I would be happy to add a link, but I was not able to find useful FFmpeg docs on this subject.

# control of encoding settings beyond the common parameters.
#
# For example, some potential extra options for the commonly used H.264 codec, ``libx264`` include:
# For example, with , ``libx264``:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the second "For example" is from before adding the first.

extra_options (dict[str, Any], optional): A dictionary of additional
encoder options to pass, e.g. ``{"qp": 5, "tune": "film"}``.
Values will be converted to strings before passing to the encoder.
See :ref:`extra_options` for details.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not edit the docstrings, as I expect they can be useful as a quick reference to valid values. I am open to suggestions on this, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstrings look good, I left minor suggestions above. I agree it is super valuable to have short descriptions of valid values in there.

@Dan-Flores Dan-Flores marked this pull request as ready for review November 19, 2025 20:22
Copy link
Contributor

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you! Left some minor comments that are easy to address but LGTM

"yuv420p", "yuv444p"). If not specified, uses codec's default format.
See :ref:`pixel_format` for details.
crf (int or float, optional): Constant Rate Factor for encoding quality. Lower values
mean better quality. Valid range depends on the encoder (commonly 0-51).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and for the other docstrings, I think we should rather say

Suggested change
mean better quality. Valid range depends on the encoder (commonly 0-51).
mean better quality. Valid range depends on the encoder (e.g. 0-51 for libx264).

Defaults to None (which will use encoder's default).
See :ref:`crf` for details.
preset (str or int, optional): Encoder option that controls the tradeoff between
encoding speed and compression. Valid values depend on the encoder (commonly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and in the other methods. I suggest this because it wasn't immediately obvious to me what "compression" meant in this case.

Suggested change
encoding speed and compression. Valid values depend on the encoder (commonly
encoding speed and compression (output size). Valid values depend on the encoder (commonly

See :ref:`preset` for details.
extra_options (dict[str, Any], optional): A dictionary of additional
encoder options to pass, e.g. ``{"qp": 5, "tune": "film"}``.
Values will be converted to strings before passing to the encoder.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this, I think this is an implementation detail.

extra_options (dict[str, Any], optional): A dictionary of additional
encoder options to pass, e.g. ``{"qp": 5, "tune": "film"}``.
Values will be converted to strings before passing to the encoder.
See :ref:`extra_options` for details.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstrings look good, I left minor suggestions above. I agree it is super valuable to have short descriptions of valid values in there.

Comment on lines 158 to 161
# %%
# For most cases, you can simply specify the format parameter and let the FFmpeg select the default codec.
# However, specifying the codec parameter is useful to select a particular codec implementation
# (``libx264`` vs ``libx265``) or to have more control over the encoding behavior.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinion but I think this last part could be left out. The intro to this section (starting with "By default, the codec ...") is really good and covers all what is necessary IMO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading it back it is a bit repetitive. Since its the only text after a code block, I'll remove it for consistency.

# For example, with the commonly used H.264 codec, ``libx264``:
#
# - Values range from 0 (lossless) to 51 (worst quality)
# - Values 17 or 18 are conisdered visually lossless, and the default is 23.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# - Values 17 or 18 are conisdered visually lossless, and the default is 23.
# - Values 17 or 18 are considered visually lossless, and the default is 23.

Comment on lines 231 to 234
# For example, with the commonly used H.264 codec, ``libx264`` presets include:
#
# - ``"ultrafast"`` (fastest), ``"fast"``, ``"medium"`` (default), ``"slow"``, ``"veryslow"`` (slowest, best compression).
# - See additional details in the `H.264 Video Encoding Guide <https://trac.ffmpeg.org/wiki/Encode/H.264#a2.Chooseapresetandtune>`_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely a nitpick, but I think we don't need bullet points here.

@Dan-Flores Dan-Flores merged commit 559b27e into meta-pytorch:main Nov 20, 2025
77 of 78 checks passed
@Dan-Flores Dan-Flores deleted the encoding_tutorial branch November 20, 2025 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants