Refactor order of getting metadata and adding a stream #1060

scotts · 2025-11-18T02:27:14Z

I've thought this was strange for a long time: on main, the public VideoDecoder and AudioDecoder add a stream before getting the metadata. This was not the originally intended order, as some of our error checking shows:

torchcodec/src/torchcodec/decoders/_video_decoder.py

Lines 409 to 414 in 22bcf4d

    
    if stream_index is None:
        if (stream_index := container_metadata.best_video_stream_index) is None:
            raise ValueError(
                "The best video stream is unknown and there is no specified stream. "
                + ERROR_REPORTING_INSTRUCTIONS
            )

We should never hit that error condition, because by the time this check runs we have already added the stream. And if the video file has no best video stream, the C++ layer would have thrown before we ever reached this check. I feel it's more natural to do things in the order used in this PR: first get the metadata from the file, then add the stream if the metadata is valid.
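A minimal Python sketch of the metadata-first ordering, assuming simplified stand-ins: `ContainerMetadata`, its `stream_indices` field, `open_video_stream`, and the `add_stream` callback are all hypothetical names for illustration, not torchcodec's real classes or the C++ `addStream` signature. The point it shows is that the best-stream check becomes reachable once validation happens before any stream is added.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ContainerMetadata:
    # Hypothetical stand-in: only the fields this sketch needs.
    best_video_stream_index: Optional[int]
    stream_indices: List[int] = field(default_factory=list)


def open_video_stream(
    container_metadata: ContainerMetadata,
    stream_index: Optional[int],
    add_stream: Callable[[int], None],
) -> int:
    """Read and validate metadata first; add the stream only if it is valid."""
    if stream_index is None:
        if (stream_index := container_metadata.best_video_stream_index) is None:
            # With this ordering the check is reachable: nothing has been added
            # yet, so the C++ layer has not had a chance to throw first.
            raise ValueError(
                "The best video stream is unknown and there is no specified stream."
            )
    if stream_index not in container_metadata.stream_indices:
        raise ValueError(f"Stream index {stream_index} is not in the container.")
    add_stream(stream_index)  # hypothetical stand-in for the C++ addStream
    return stream_index


added: List[int] = []
meta = ContainerMetadata(best_video_stream_index=0, stream_indices=[0, 1])
print(open_video_stream(meta, None, added.append))  # 0
```

If validation fails, `add_stream` is never called, so no half-initialized stream is left behind.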

I'm making this change now because it should simplify the decoder-native transforms. When pre-processing the transforms, we'll need to know the video stream's height and width before adding the stream, which means getting that metadata first. In the C++ layer, this does mean that initializeDecoder() now reads header values through AVCodecParameters that it didn't read before.
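To illustrate why height and width are needed up front, here is a hedged sketch of planning a transform's output size from metadata alone. `StreamDims`, `ResizeToHeight`, and `plan_output_dims` are hypothetical names, not torchcodec's decoder-native transform API; the aspect-ratio-preserving resize is just one example of a transform whose output shape depends on the source dimensions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StreamDims:
    # Hypothetical: the header-level height/width read before addStream.
    height: int
    width: int


@dataclass
class ResizeToHeight:
    # Hypothetical transform, not a torchcodec class.
    target_height: int

    def output_dims(self, src: StreamDims) -> StreamDims:
        if self.target_height <= 0:
            raise ValueError("target_height must be positive")
        # Keeping the aspect ratio needs the *source* height and width.
        new_width = max(1, round(src.width * self.target_height / src.height))
        return StreamDims(self.target_height, new_width)


def plan_output_dims(src: StreamDims, transforms: List[ResizeToHeight]) -> StreamDims:
    # Walk the transforms using metadata only; no stream or decoding needed.
    dims = src
    for t in transforms:
        dims = t.output_dims(dims)
    return dims


print(plan_output_dims(StreamDims(1080, 1920), [ResizeToHeight(540), ResizeToHeight(270)]))
# StreamDims(height=270, width=480)
```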

NicolasHug · 2025-11-18T10:30:17Z

src/torchcodec/_core/SingleStreamDecoder.cpp

+  // This metadata was already set in initializeDecoder() from the
+  // AVCodecParameters that are part of the AVStream. But we consider the
+  // AVCodecContext to be more authoritative, so we use that for our decoding
+  // stream.


From what I understand, the AVCodecContext fields were already set from the AVCodecParameters when we called avcodec_parameters_to_context just above, in addStream:

torchcodec/src/torchcodec/_core/SingleStreamDecoder.cpp

Lines 462 to 463 in 22bcf4d

    int retVal = avcodec_parameters_to_context(
        streamInfo.codecContext.get(), streamInfo.stream->codecpar);

I think it's best to remove the lines below and trust that avcodec_parameters_to_context does what we expect. Right now, we set the streamMetadata in a lot of different places, which makes it harder to reason about.

scotts · Nov 18, 2025

Oh, good call! Yup, I'm happy to remove more code. :)

src/torchcodec/_core/FFMPEGCommon.h

NicolasHug · 2025-11-18T16:27:07Z

Oh, I approved, but I wonder whether the docs failure is real 🤔

      File "/__w/_temp/conda_environment_19470203429/lib/python3.10/site-packages/torchcodec/decoders/_video_decoder.py", line 423, in _get_and_validate_stream_metadata
        raise ValueError(
    ValueError: The minimum pts value in seconds is unknown.
    This should never happen. Please report an issue following the steps in
NicolasHug · 2025-11-19T10:14:24Z

src/torchcodec/_core/custom_ops.cpp

+    //       not the constructor because we need to know the stream index. If we
+    //       can encode the relevant stream indices into custom frame mappings
+    //       itself, then we can put it in the constructor.
+    writeFallbackBasedMetadata(map, streamMetadata, SeekMode::approximate);


I understand these workarounds are needed right now, but I have a really hard time cleanly reasoning about all the comments above.

Should we eventually revisit whether addStream needs to exist at all? Maybe we should just have a constructor, as we do in Python. I think all the "add stream" logic is mainly a relic of when the decoder was expected to potentially become a multi-stream decoder, and it seems to be hurting us now.

We still want to support the existing use case of users getting metadata without having to scan, but I'm pretty sure we can do that by passing approximate mode (which is what we do from the public Python APIs).
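A toy model of the exact vs. approximate trade-off mentioned above: exact mode pays for a scan over every packet, while approximate mode estimates from header values and never scans. torchcodec's public `VideoDecoder` does take `seek_mode="exact"` or `seek_mode="approximate"`, but everything below (`HeaderInfo`, `num_frames`, and the `scan_packet_pts` callable) is a hypothetical illustration, not torchcodec's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class HeaderInfo:
    # Hypothetical header fields: cheap to read, but may be missing or inexact.
    duration_seconds: Optional[float]
    average_fps: Optional[float]


def num_frames(
    header: HeaderInfo,
    scan_packet_pts: Callable[[], List[float]],
    seek_mode: str,
) -> Optional[int]:
    if seek_mode == "exact":
        # Exact: pay for a full pass over the file's packets.
        return len(scan_packet_pts())
    if seek_mode == "approximate":
        # Approximate: estimate from header values; the scan is never called.
        if header.duration_seconds is None or header.average_fps is None:
            return None
        return round(header.duration_seconds * header.average_fps)
    raise ValueError(f"Unknown seek_mode: {seek_mode!r}")


scans = []


def scan() -> List[float]:
    scans.append(1)  # record that a (costly) scan happened
    return [i / 30 for i in range(59)]


header = HeaderInfo(duration_seconds=2.0, average_fps=30.0)
print(num_frames(header, scan, "approximate"), len(scans))  # 60 0
print(num_frames(header, scan, "exact"), len(scans))  # 59 1
```

The two answers can disagree, which is the price approximate mode pays for skipping the scan.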

scotts · Nov 19, 2025

@NicolasHug, you're right, that's actually the cleanest resolution here: there's no longer any value in differentiating the constructor from adding a stream. Created #1064 as a follow-up.

mollyxu · Nov 19, 2025

To double-check my understanding, the expected behaviour for custom frame mappings is:

- like exact mode for the active stream
- like approximate mode for non-active streams

We have to make this distinction because we don't know the active stream index during construction (addStream is called separately from the constructor). Once addStream is consolidated into the constructor, we can get rid of all these conditions and just call:

    writeFallbackBasedMetadata(map, streamMetadata, seekMode);

NicolasHug · Nov 21, 2025

Your understanding is correct, @mollyxu!
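A toy Python model of why consolidating addStream into the constructor removes the conditions: when the active stream index is only known later, per-stream modes must be patched in a second step; when it is a constructor argument, they can be computed once. `Mode`, `TwoPhaseDecoder`, and `OnePhaseDecoder` are hypothetical names, and this is not how the current C++ code actually splits the work between the constructor and addStream; it only mirrors the behaviour confirmed above (exact-like for the active stream, approximate-like for the rest).

```python
from enum import Enum
from typing import Dict, List


class Mode(Enum):
    # Hypothetical mirror of the two behaviours discussed in the thread.
    EXACT = "exact"
    APPROXIMATE = "approximate"


class TwoPhaseDecoder:
    """Today's shape: the constructor cannot know the active stream."""

    def __init__(self, stream_indices: List[int]):
        # No active stream yet, so every stream starts out approximate.
        self.modes: Dict[int, Mode] = {i: Mode.APPROXIMATE for i in stream_indices}

    def add_stream(self, active_stream_index: int) -> None:
        # Only now can the active stream be treated like exact mode.
        self.modes[active_stream_index] = Mode.EXACT


class OnePhaseDecoder:
    """Consolidated shape: the active stream is a constructor argument."""

    def __init__(self, stream_indices: List[int], active_stream_index: int):
        self.modes: Dict[int, Mode] = {
            i: Mode.EXACT if i == active_stream_index else Mode.APPROXIMATE
            for i in stream_indices
        }


two = TwoPhaseDecoder([0, 1, 2])
two.add_stream(1)
print(two.modes == OnePhaseDecoder([0, 1, 2], active_stream_index=1).modes)  # True
```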

Refactor order of getting metadata and adding a stream

84fd7a5

meta-cla bot added the CLA Signed label Nov 18, 2025

scotts marked this pull request as ready for review November 18, 2025 03:00

NicolasHug reviewed Nov 18, 2025

Remove re-setting of metadata

8e0b756

NicolasHug approved these changes Nov 18, 2025

src/torchcodec/_core/FFMPEGCommon.h

scotts added 2 commits November 18, 2025 10:51

Merge branch 'main' of github.com:pytorch/torchcodec into refactor_me…

7832834

…tadata_order

Deal with custom frame mapping

9b14d17

NicolasHug approved these changes Nov 19, 2025

scotts mentioned this pull request Nov 19, 2025

Refactor C++ SingleStreamDecoder to consolidate addStream into constructor #1064

Open

scotts merged commit 04b02b9 into meta-pytorch:main Nov 19, 2025
77 of 78 checks passed

scotts deleted the refactor_metadata_order branch November 19, 2025 15:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Refactor order of getting metadata and adding a stream #1060

Refactor order of getting metadata and adding a stream #1060

scotts commented Nov 18, 2025 •

edited

Loading

Uh oh!

NicolasHug Nov 18, 2025

Uh oh!

scotts Nov 18, 2025

Uh oh!

Uh oh!

NicolasHug commented Nov 18, 2025

Uh oh!

NicolasHug Nov 19, 2025

Uh oh!

scotts Nov 19, 2025

Uh oh!

mollyxu Nov 19, 2025

Uh oh!

NicolasHug Nov 21, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	if stream_index is None:
	if (stream_index := container_metadata.best_video_stream_index) is None:
	raise ValueError(
	"The best video stream is unknown and there is no specified stream. "
	+ ERROR_REPORTING_INSTRUCTIONS
	)

	int retVal = avcodec_parameters_to_context(
	streamInfo.codecContext.get(), streamInfo.stream->codecpar);

Refactor order of getting metadata and adding a stream #1060

Refactor order of getting metadata and adding a stream #1060

Conversation

scotts commented Nov 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NicolasHug Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

scotts Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

NicolasHug commented Nov 18, 2025

Uh oh!

NicolasHug Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

scotts Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

mollyxu Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

NicolasHug Nov 21, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

scotts commented Nov 18, 2025 •

edited

Loading