-
Notifications
You must be signed in to change notification settings - Fork 70
Accept float frame_rate in VideoEncoder #1061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accept float frame_rate in VideoEncoder #1061
Conversation
src/torchcodec/_core/Encoder.cpp
Outdated
| avCodecContext_->framerate = {inFrameRate_, 1}; | ||
| // TODO-VideoEncoder: Add and utilize output frame_rate option | ||
| AVRational frameRate = av_d2q(inFrameRate_, INT_MAX); | ||
| avCodecContext_->time_base = av_inv_q(frameRate); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These functions are defined here, and do exactly what we need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's probably OK but do see https://fburl.com/metamate/mcunp6zn, we might want to use a higher-resolution time base (I'm not sure). I added a comment below about checking the int pts and float pts (seconds) against he FFmpeg cli. They might slightly differ from ours, and if they do it probably means we should increase the resolution of our timebase here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sharing! I updated these lines to use the recommended call order.
In terms of actually using the timebase when setting frame pts, I believe its handled by the call to av_packet_rescale_ts - https://ffmpeg.org/doxygen/5.1/group__lavc__packet.html#gae5c86e4d93f6e7aa62ef2c60763ea67e
src/torchcodec/_core/Encoder.cpp
Outdated
| avCodecContext_->framerate = {inFrameRate_, 1}; | ||
| // TODO-VideoEncoder: Add and utilize output frame_rate option | ||
| AVRational frameRate = av_d2q(inFrameRate_, INT_MAX); | ||
| avCodecContext_->time_base = av_inv_q(frameRate); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's probably OK but do see https://fburl.com/metamate/mcunp6zn, we might want to use a higher-resolution time base (I'm not sure). I added a comment below about checking the int pts and float pts (seconds) against he FFmpeg cli. They might slightly differ from ours, and if they do it probably means we should increase the resolution of our timebase here.
test/test_encoders.py
Outdated
| fields=fields, | ||
| ) | ||
| assert ffmpeg_metadata == encoder_metadata | ||
| # mkv container format handles frame_rate differently, so we skip it here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that mkv does not store frame_rate but estimates it, so these assertions fails.
https://video.stackexchange.com/questions/18403/matroska-formats-framerate-is-not-accurate-with-certain-values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update on this: webm format had the same error which led me to dig deeper, and I was able to resolve both by setting avStream_->r_frame_rate.
| class TestVideoEncoder: | ||
| def decode(self, source=None) -> torch.Tensor: | ||
| return VideoDecoder(source).get_frames_in_range(start=0, stop=60) | ||
| return VideoDecoder(source).get_frames_in_range(start=0, stop=30).data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reduced the frames we decode from 60 -> 30.
CI was failing with No space left on device, so this change alleviates that for now.
See the error at: https://github.com/meta-pytorch/torchcodec/runs/55868185381
NicolasHug
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Dan-Flores , this looks good. I left a big-ish comment below but it should be easy to address, I'll approve to unblock
| for key in ffmpeg_frame.keys(): | ||
| assert key in encoder_frame | ||
| assert ffmpeg_frame[key] == encoder_frame[key] | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good, but we can improve a bit:
- we don't need key-by-key comparison. We can condense all of the above into just dict comparison, i.e.
assert ffmpeg_frames_info == encoder_frames_info - we should also validate the float values for duration and pts. For now we're only comparing the integer values. But the int values can be equal and still lead to different float values if the timebase is different. So it's safer to also assert the float values. And to be even safer, we could also assert the timebase in the stream info!
- In both
_get_frames_infoand_get_video_metadata, we should validate that we actually get back the fields we are requesting! Otherwise, we might think we're validating all these fields, but if FFmpeg doesn't find them we're not actually validating anything. I realized thatpkt_pts,dts,pkt_durationare often missing (but we may not actually need to validate those).
I did all this locally to check a few things and validate my undertanding, so you don't need to re-do it yourself. Here's a diff with my suggested changes
diff --git a/test/test_encoders.py b/test/test_encoders.py
index 0be163a..8d106f9 100644
--- a/test/test_encoders.py
+++ b/test/test_encoders.py
@@ -603,9 +603,10 @@ class TestVideoEncoder:
if "=" in line:
key, value = line.split("=", 1)
metadata[key] = value
+ assert all(field in metadata for field in fields)
return metadata
- def _get_frames_info(self, file_path):
+ def _get_frames_info(self, file_path, fields):
"""Helper function to get frame info (pts, dts, etc.) using ffprobe."""
result = subprocess.run(
[
@@ -615,7 +616,7 @@ class TestVideoEncoder:
"-select_streams",
"v:0",
"-show_entries",
- "frame=pts,pkt_pts,dts,pkt_duration,duration",
+ f"frame={','.join(fields)}",
"-of",
"json",
str(file_path),
@@ -624,7 +625,9 @@ class TestVideoEncoder:
check=True,
text=True,
)
- return json.loads(result.stdout)["frames"]
+ frames = json.loads(result.stdout)["frames"]
+ assert all(field in frame for field in fields for frame in frames)
+ return frames
@pytest.mark.parametrize("method", ("to_file", "to_tensor", "to_file_like"))
def test_bad_input_parameterized(self, tmp_path, method):
@@ -1047,7 +1050,13 @@ class TestVideoEncoder:
# Check that video metadata is the same
if method == "to_file":
- fields = ["duration", "duration_ts", "r_frame_rate", "nb_frames"]
+ fields = [
+ "duration",
+ "duration_ts",
+ "r_frame_rate",
+ "time_base",
+ "nb_frames",
+ ]
ffmpeg_metadata = self._get_video_metadata(
ffmpeg_encoded_path,
fields=fields,
@@ -1059,16 +1068,17 @@ class TestVideoEncoder:
assert ffmpeg_metadata == encoder_metadata
# Check that frame timestamps and duration are the same
- ffmpeg_frames_info = self._get_frames_info(ffmpeg_encoded_path)
- encoder_frames_info = self._get_frames_info(encoder_output_path)
-
- assert len(ffmpeg_frames_info) == len(encoder_frames_info)
- for ffmpeg_frame, encoder_frame in zip(
- ffmpeg_frames_info, encoder_frames_info
- ):
- for key in ffmpeg_frame.keys():
- assert key in encoder_frame
- assert ffmpeg_frame[key] == encoder_frame[key]
+ fields = ("pts", "pts_time")
+ if format != "flv":
+ fields += ("duration", "duration_time")
+ ffmpeg_frames_info = self._get_frames_info(
+ ffmpeg_encoded_path, fields=fields
+ )
+ encoder_frames_info = self._get_frames_info(
+ encoder_output_path, fields=fields
+ )
+
+ assert ffmpeg_frames_info == encoder_frames_info
def test_to_file_like_custom_file_object(self):
"""Test to_file_like with a custom file-like object that implements write and seek."""You can just copy-paste it into a file and then do git apply the_file. It passes locally, but I know ffmpeg encodes metadata fields differently depending on the versions, you might have do do some minor edits for it to pass on all our CI, let's see. The most important path of the diff is the assertions that we're getting the fields we expect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've applied the patch so CI can test across different FFmpeg versions.
we should validate that we actually get back the fields we are requesting! Otherwise, we might think we're validating all these fields, but if FFmpeg doesn't find them we're not actually validating anything
My thinking here was different - if our encoded video and FFmpeg's encoded video have the same metadata for the present fields, and any missing fields are missing from both, it seems likely we encoded the video correctly.
I do see the problem case of all fields being missing, in which case no validation was done.
As noted in #1057,
frame_rateoften is afloat, so this PR updates the VideoEncoder API to reflect that.test_video_encoder_against_ffmpeg_cliis updated to parametrizeframe_rateand compare frame timestamp values, such aspts.