Skip to content

Conversation

@Dan-Flores
Copy link
Contributor

@Dan-Flores Dan-Flores commented Nov 18, 2025

As noted in #1057, frame_rate often is a float, so this PR updates the VideoEncoder API to reflect that.

  • test_video_encoder_against_ffmpeg_cli is updated to parametrize frame_rate and compare frame timestamp values, such as pts.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 18, 2025
avCodecContext_->framerate = {inFrameRate_, 1};
// TODO-VideoEncoder: Add and utilize output frame_rate option
AVRational frameRate = av_d2q(inFrameRate_, INT_MAX);
avCodecContext_->time_base = av_inv_q(frameRate);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions are defined here, and do exactly what we need.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably OK but do see https://fburl.com/metamate/mcunp6zn, we might want to use a higher-resolution time base (I'm not sure). I added a comment below about checking the int pts and float pts (seconds) against he FFmpeg cli. They might slightly differ from ours, and if they do it probably means we should increase the resolution of our timebase here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing! I updated these lines to use the recommended call order.

In terms of actually using the timebase when setting frame pts, I believe its handled by the call to av_packet_rescale_ts - https://ffmpeg.org/doxygen/5.1/group__lavc__packet.html#gae5c86e4d93f6e7aa62ef2c60763ea67e

@Dan-Flores Dan-Flores marked this pull request as ready for review November 18, 2025 22:29
avCodecContext_->framerate = {inFrameRate_, 1};
// TODO-VideoEncoder: Add and utilize output frame_rate option
AVRational frameRate = av_d2q(inFrameRate_, INT_MAX);
avCodecContext_->time_base = av_inv_q(frameRate);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably OK but do see https://fburl.com/metamate/mcunp6zn, we might want to use a higher-resolution time base (I'm not sure). I added a comment below about checking the int pts and float pts (seconds) against he FFmpeg cli. They might slightly differ from ours, and if they do it probably means we should increase the resolution of our timebase here.

fields=fields,
)
assert ffmpeg_metadata == encoder_metadata
# mkv container format handles frame_rate differently, so we skip it here
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that mkv does not store frame_rate but estimates it, so these assertions fails.
https://video.stackexchange.com/questions/18403/matroska-formats-framerate-is-not-accurate-with-certain-values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update on this: webm format had the same error which led me to dig deeper, and I was able to resolve both by setting avStream_->r_frame_rate.

class TestVideoEncoder:
def decode(self, source=None) -> torch.Tensor:
return VideoDecoder(source).get_frames_in_range(start=0, stop=60)
return VideoDecoder(source).get_frames_in_range(start=0, stop=30).data
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reduced the frames we decode from 60 -> 30.
CI was failing with No space left on device, so this change alleviates that for now.
See the error at: https://github.com/meta-pytorch/torchcodec/runs/55868185381

Copy link
Contributor

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Dan-Flores , this looks good. I left a big-ish comment below but it should be easy to address, I'll approve to unblock

for key in ffmpeg_frame.keys():
assert key in encoder_frame
assert ffmpeg_frame[key] == encoder_frame[key]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good, but we can improve a bit:

  • we don't need key-by-key comparison. We can condense all of the above into just dict comparison, i.e. assert ffmpeg_frames_info == encoder_frames_info
  • we should also validate the float values for duration and pts. For now we're only comparing the integer values. But the int values can be equal and still lead to different float values if the timebase is different. So it's safer to also assert the float values. And to be even safer, we could also assert the timebase in the stream info!
  • In both _get_frames_info and _get_video_metadata, we should validate that we actually get back the fields we are requesting! Otherwise, we might think we're validating all these fields, but if FFmpeg doesn't find them we're not actually validating anything. I realized that pkt_pts,dts,pkt_duration are often missing (but we may not actually need to validate those).

I did all this locally to check a few things and validate my undertanding, so you don't need to re-do it yourself. Here's a diff with my suggested changes

diff --git a/test/test_encoders.py b/test/test_encoders.py
index 0be163a..8d106f9 100644
--- a/test/test_encoders.py
+++ b/test/test_encoders.py
@@ -603,9 +603,10 @@ class TestVideoEncoder:
             if "=" in line:
                 key, value = line.split("=", 1)
                 metadata[key] = value
+        assert all(field in metadata for field in fields)
         return metadata
 
-    def _get_frames_info(self, file_path):
+    def _get_frames_info(self, file_path, fields):
         """Helper function to get frame info (pts, dts, etc.) using ffprobe."""
         result = subprocess.run(
             [
@@ -615,7 +616,7 @@ class TestVideoEncoder:
                 "-select_streams",
                 "v:0",
                 "-show_entries",
-                "frame=pts,pkt_pts,dts,pkt_duration,duration",
+                f"frame={','.join(fields)}",
                 "-of",
                 "json",
                 str(file_path),
@@ -624,7 +625,9 @@ class TestVideoEncoder:
             check=True,
             text=True,
         )
-        return json.loads(result.stdout)["frames"]
+        frames = json.loads(result.stdout)["frames"]
+        assert all(field in frame for field in fields for frame in frames)
+        return frames
 
     @pytest.mark.parametrize("method", ("to_file", "to_tensor", "to_file_like"))
     def test_bad_input_parameterized(self, tmp_path, method):
@@ -1047,7 +1050,13 @@ class TestVideoEncoder:
 
         # Check that video metadata is the same
         if method == "to_file":
-            fields = ["duration", "duration_ts", "r_frame_rate", "nb_frames"]
+            fields = [
+                "duration",
+                "duration_ts",
+                "r_frame_rate",
+                "time_base",
+                "nb_frames",
+            ]
             ffmpeg_metadata = self._get_video_metadata(
                 ffmpeg_encoded_path,
                 fields=fields,
@@ -1059,16 +1068,17 @@ class TestVideoEncoder:
             assert ffmpeg_metadata == encoder_metadata
 
             # Check that frame timestamps and duration are the same
-            ffmpeg_frames_info = self._get_frames_info(ffmpeg_encoded_path)
-            encoder_frames_info = self._get_frames_info(encoder_output_path)
-
-            assert len(ffmpeg_frames_info) == len(encoder_frames_info)
-            for ffmpeg_frame, encoder_frame in zip(
-                ffmpeg_frames_info, encoder_frames_info
-            ):
-                for key in ffmpeg_frame.keys():
-                    assert key in encoder_frame
-                    assert ffmpeg_frame[key] == encoder_frame[key]
+            fields = ("pts", "pts_time")
+            if format != "flv":
+                fields += ("duration", "duration_time")
+            ffmpeg_frames_info = self._get_frames_info(
+                ffmpeg_encoded_path, fields=fields
+            )
+            encoder_frames_info = self._get_frames_info(
+                encoder_output_path, fields=fields
+            )
+
+            assert ffmpeg_frames_info == encoder_frames_info
 
     def test_to_file_like_custom_file_object(self):
         """Test to_file_like with a custom file-like object that implements write and seek."""

You can just copy-paste it into a file and then do git apply the_file. It passes locally, but I know ffmpeg encodes metadata fields differently depending on the versions, you might have do do some minor edits for it to pass on all our CI, let's see. The most important path of the diff is the assertions that we're getting the fields we expect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've applied the patch so CI can test across different FFmpeg versions.

we should validate that we actually get back the fields we are requesting! Otherwise, we might think we're validating all these fields, but if FFmpeg doesn't find them we're not actually validating anything

My thinking here was different - if our encoded video and FFmpeg's encoded video have the same metadata for the present fields, and any missing fields are missing from both, it seems likely we encoded the video correctly.

I do see the problem case of all fields being missing, in which case no validation was done.

@Dan-Flores Dan-Flores merged commit cac99ae into meta-pytorch:main Nov 20, 2025
70 checks passed
@Dan-Flores Dan-Flores deleted the float_input_frame_rate branch November 20, 2025 16:06
@Dan-Flores Dan-Flores mentioned this pull request Nov 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants