Refactor order of getting metadata and adding a stream #1060
Conversation
// This metadata was already set in initializeDecoder() from the
// AVCodecParameters that are part of the AVStream. But we consider the
// AVCodecContext to be more authoritative, so we use that for our decoding
// stream.
From what I understand, the AVCodecContext fields were set to those of the AVCodecParameters when we called avcodec_parameters_to_context just above in addStream:
torchcodec/src/torchcodec/_core/SingleStreamDecoder.cpp
Lines 462 to 463 in 22bcf4d
int retVal = avcodec_parameters_to_context(
    streamInfo.codecContext.get(), streamInfo.stream->codecpar);
I think it's best to remove the lines below and trust that avcodec_parameters_to_context is doing what we expect it to do. Right now, we are setting the streamMetadata in a lot of different places, which makes it harder to reason about.
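As a sanity check, here is a minimal sketch of why the manual metadata assignment is redundant. The FFmpeg calls and struct fields are the real API, but the surrounding helper is illustrative, not the actual addStream code: avcodec_parameters_to_context copies the stream's AVCodecParameters into the AVCodecContext, so the context ends up holding the same values the headers already provided.

```cpp
// Illustrative helper, not the real addStream(): after
// avcodec_parameters_to_context() succeeds, the AVCodecContext mirrors the
// stream's AVCodecParameters, so writing those values back into stream
// metadata duplicates what initializeDecoder() already recorded.
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
}
#include <stdexcept>

void configureCodecContext(AVStream* stream, AVCodecContext* codecContext) {
  // Copy codec parameters (width, height, sample rate, etc.) from the
  // demuxer's per-stream header data into the decoding context.
  int retVal = avcodec_parameters_to_context(codecContext, stream->codecpar);
  if (retVal < 0) {
    throw std::runtime_error("avcodec_parameters_to_context failed");
  }
  // At this point codecContext->width == stream->codecpar->width, and the
  // same holds for the other copied fields, so no extra metadata writes are
  // needed here.
}
```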
Oh, good call! Yup, I'm happy to remove more code. :)
oh, I approved but I wonder if the docs failure is real 🤔
// not the constructor because we need to know the stream index. If we
// can encode the relevant stream indices into custom frame mappings
// itself, then we can put it in the constructor.
writeFallbackBasedMetadata(map, streamMetadata, SeekMode::approximate);
I understand these workarounds are needed right now, but I have a really hard time cleanly reasoning about all the comments above.
We might eventually want to revisit the existence of addStream? Maybe we should just have a constructor, just like we do in Python. I think all the "add stream" logic is mainly a relic of when the decoder was envisioned as a multi-stream decoder, but it seems like it's hurting us now.
We still want to enable existing use-cases of users getting metadata without having to scan, but I'm pretty sure we can support that by passing approximate mode (that's what we do from the public Python APIs).
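A purely hypothetical sketch of the constructor-only shape being suggested here; SingleStreamDecoder and SeekMode are existing names, but this exact signature and the body comments are assumptions, not the current API:

```cpp
#include <string>

enum class SeekMode { exact, approximate };  // mirrors the existing modes

class SingleStreamDecoder {
 public:
  // Hypothetical constructor: stream selection happens up front, so there is
  // no separate addStream() step to order relative to metadata retrieval.
  SingleStreamDecoder(
      const std::string& videoPath, int streamIndex, SeekMode seekMode)
      : streamIndex_(streamIndex), seekMode_(seekMode) {
    // 1. Open the container and read header-level metadata.
    // 2. Validate streamIndex against that metadata.
    // 3. Set up the codec context for the single selected stream.
    // 4. Scan the file only when seekMode == SeekMode::exact, so
    //    metadata-only use-cases stay cheap via approximate mode.
  }

 private:
  int streamIndex_;
  SeekMode seekMode_;
};
```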
@NicolasHug, you're right, that's actually the cleanest resolution here: there's no value any more in differentiating the constructor from adding a stream. Created #1064 for follow-up.
To double-check my understanding, the expected behaviour for custom frame mappings is:
- like exact mode for the active stream
- like approximate mode for non active streams
The reason we have to make this distinction with these conditions is that we don't know the active stream index during construction (since we call addStream separately from the constructor). Once we consolidate addStream into the constructor, we would be able to get rid of all the conditions and just call:
writeFallbackBasedMetadata(map, streamMetadata, seekMode);
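To make the exact-vs-approximate distinction concrete, here is an illustrative sketch; SeekMode and writeFallbackBasedMetadata follow the existing names, but the other types and the writeFrameMappingBasedMetadata helper are stand-ins assumed purely for illustration:

```cpp
#include <map>
#include <string>

enum class SeekMode { exact, approximate };
struct StreamMetadata { /* fields elided */ };
using MetadataMap = std::map<std::string, std::string>;

// Assumed declarations, for illustration only.
void writeFrameMappingBasedMetadata(MetadataMap& map, StreamMetadata& meta);
void writeFallbackBasedMetadata(
    MetadataMap& map, StreamMetadata& meta, SeekMode seekMode);

void writeMetadataForStream(
    int streamIndex,
    int activeStreamIndex,
    MetadataMap& map,
    StreamMetadata& streamMetadata) {
  if (streamIndex == activeStreamIndex) {
    // Exact-mode-like: the custom frame mappings describe every frame of the
    // active stream, so its metadata does not need the fallback.
    writeFrameMappingBasedMetadata(map, streamMetadata);
  } else {
    // Approximate-mode-like: non-active streams only have header-derived
    // metadata, so they always take the fallback path.
    writeFallbackBasedMetadata(map, streamMetadata, SeekMode::approximate);
  }
}
```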
Your understanding is correct, @mollyxu!
I've thought this was strange for a long time now - on main, in the public VideoDecoder and AudioDecoder, we add a stream before getting the metadata. This was not the originally intended order, as evidenced by some of the error checking we do:
torchcodec/src/torchcodec/decoders/_video_decoder.py
Lines 409 to 414 in 22bcf4d
We should never hit that error condition, because we add the stream before we ever call it. And if the video file has no best video stream, the C++ layer would have thrown before we ever had a chance to reach this condition. I feel it's more natural to do things in the order used in this PR: first get the metadata from the file, then add the stream if the metadata is valid.
The reason I'm doing this now is that it should simplify the decoder-native transforms. We'll want to know a video stream's height and width when pre-processing the transforms before adding a stream, and that means getting that metadata before adding a stream. In the C++ layer, this does mean accessing values in the headers in initializeDecoder() through AVCodecParameters that we didn't access before.
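For illustration, a minimal sketch of what reading header values through AVCodecParameters looks like; the FFmpeg calls are real, but the helper function and its signature are assumptions rather than the actual initializeDecoder() code:

```cpp
extern "C" {
#include <libavformat/avformat.h>
}
#include <stdexcept>

// Width and height come straight from the demuxed stream headers, so they
// are available before any AVCodecContext exists, i.e. before a stream is
// added for decoding. That is what lets transform pre-processing (and
// metadata in general) happen ahead of addStream.
void readVideoDimensionsFromHeaders(
    AVFormatContext* formatContext, int streamIndex, int& width, int& height) {
  if (avformat_find_stream_info(formatContext, nullptr) < 0) {
    throw std::runtime_error("avformat_find_stream_info failed");
  }
  if (streamIndex < 0 ||
      static_cast<unsigned int>(streamIndex) >= formatContext->nb_streams) {
    throw std::runtime_error("invalid stream index");
  }
  const AVCodecParameters* codecpar =
      formatContext->streams[streamIndex]->codecpar;
  width = codecpar->width;
  height = codecpar->height;
}
```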