Description
Right now, the C++ SingleStreamDecoder class has two constructors:
torchcodec/src/torchcodec/_core/SingleStreamDecoder.h (lines 34 to 45 in 1ea235a):

```cpp
// Creates a SingleStreamDecoder from the video at videoFilePath.
explicit SingleStreamDecoder(
    const std::string& videoFilePath,
    SeekMode seekMode = SeekMode::exact);
// Creates a SingleStreamDecoder using the provided AVIOContext inside the
// AVIOContextHolder. The AVIOContextHolder is the base class, and the
// derived class will have specialized how the custom read, seek and writes
// work.
explicit SingleStreamDecoder(
    std::unique_ptr<AVIOContextHolder> context,
    SeekMode seekMode = SeekMode::exact);
```
Separately, there is a public API for adding streams of different media types:
torchcodec/src/torchcodec/_core/SingleStreamDecoder.h (lines 90 to 97 in 1ea235a):

```cpp
void addVideoStream(
    int streamIndex,
    std::vector<Transform*>& transforms,
    const VideoStreamOptions& videoStreamOptions = VideoStreamOptions(),
    std::optional<FrameMappings> customFrameMappings = std::nullopt);
void addAudioStream(
    int streamIndex,
    const AudioStreamOptions& audioStreamOptions = AudioStreamOptions());
```
Note that there is also a private member function that both of those call:
torchcodec/src/torchcodec/_core/SingleStreamDecoder.h (lines 304 to 309 in 1ea235a):

```cpp
void addStream(
    int streamIndex,
    AVMediaType mediaType,
    const torch::Device& device = torch::kCPU,
    const std::string_view deviceVariant = "ffmpeg",
    std::optional<int> ffmpegThreadCount = std::nullopt);
```
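The two-phase shape described above (construct from a source, then add a stream through a per-media-type method that forwards to one private helper) can be reduced to a small, self-contained toy model. This is only an illustration under assumptions: `ToyDecoder`, `ToyVideoOptions`, `ToyAudioOptions` and `ToyMediaType` are hypothetical stand-ins, not torchcodec types, and no real decoding happens. What it shows is that after the constructor returns, the object exists in a state with no active stream, so every decoding method has to check for that state.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for the real option structs; not torchcodec types.
struct ToyVideoOptions {
  int ffmpegThreadCount = 0;
};
struct ToyAudioOptions {
  int sampleRate = 0;
};

enum class ToyMediaType { video, audio };

// Toy model of two-phase initialization: the constructor only records the
// source, and a stream must be added afterwards before decoding works.
class ToyDecoder {
 public:
  explicit ToyDecoder(std::string path) : path_(std::move(path)) {}

  // Public per-media-type entry points, both forwarding to one private helper.
  void addVideoStream(int streamIndex,
                      const ToyVideoOptions& /*options*/ = ToyVideoOptions()) {
    addStream(streamIndex, ToyMediaType::video);
  }
  void addAudioStream(int streamIndex,
                      const ToyAudioOptions& /*options*/ = ToyAudioOptions()) {
    addStream(streamIndex, ToyMediaType::audio);
  }

  // Every decoding method has to guard against the "no stream yet" state.
  // Returns the active stream index as a placeholder for a decoded frame.
  int decodeNextFrame() const {
    if (!activeStreamIndex_) {
      throw std::runtime_error("no stream has been added");
    }
    return *activeStreamIndex_;
  }

  bool hasActiveStream() const { return activeStreamIndex_.has_value(); }
  ToyMediaType mediaType() const { return mediaType_; }
  const std::string& path() const { return path_; }

 private:
  // Shared setup, playing the role of the private addStream() helper.
  void addStream(int streamIndex, ToyMediaType mediaType) {
    activeStreamIndex_ = streamIndex;
    mediaType_ = mediaType;
  }

  std::string path_;
  std::optional<int> activeStreamIndex_;
  ToyMediaType mediaType_ = ToyMediaType::video;
};
```

In this toy, calling `decodeNextFrame()` on a freshly constructed `ToyDecoder` throws; it only succeeds after `addVideoStream()` or `addAudioStream()` has been called.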
This separation is a relic of the time when SingleStreamDecoder tried to be more than a single-stream decoder. The resulting C++ API does not match the public Python API, and that mismatch causes some awkwardness, in particular in how we handle custom frame mappings; see the linked comment on PR #1060 for details.
One could argue that the current separation makes it possible to retrieve metadata without adding a decoding stream, but that use case should already be well supported by SeekMode::approximate.
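For contrast, here is a minimal sketch of a single-phase shape, in which the stream is chosen in the constructor, roughly in the spirit of the Python decoders, which take the stream index as a constructor argument. This is an illustration under assumptions, not a proposed torchcodec signature: every name is hypothetical, `std::variant` is just one possible way to select video versus audio options, and the seek mode is carried along only to show that a caller who wants metadata could still pass an approximate mode at construction time.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-ins; not torchcodec types.
enum class ToySeekMode { exact, approximate };
struct VideoStreamOpts {
  int ffmpegThreadCount = 0;
};
struct AudioStreamOpts {
  std::optional<int> sampleRate;
};

// Exactly one kind of stream options is chosen at construction time.
using StreamOpts = std::variant<VideoStreamOpts, AudioStreamOpts>;

// Toy model of single-phase initialization: the stream is fixed by the
// constructor, so a constructed object always has an active stream and
// no method needs a "no stream yet" check.
class ToySingleStreamDecoder {
 public:
  ToySingleStreamDecoder(std::string path,
                         int streamIndex,
                         StreamOpts options,
                         ToySeekMode seekMode = ToySeekMode::exact)
      : path_(std::move(path)),
        streamIndex_(streamIndex),
        options_(std::move(options)),
        seekMode_(seekMode) {}

  bool isVideo() const {
    return std::holds_alternative<VideoStreamOpts>(options_);
  }
  int streamIndex() const { return streamIndex_; }
  ToySeekMode seekMode() const { return seekMode_; }
  const std::string& path() const { return path_; }

 private:
  std::string path_;
  int streamIndex_;
  StreamOpts options_;
  ToySeekMode seekMode_;
};
```

The trade-off in this toy is that choosing the media type moves from the method name (`addVideoStream` vs. `addAudioStream`) into the type of the options argument, which keeps a single constructor while still allowing media-specific options.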