Video Support #8

dhansmair · 2022-11-12T11:21:02Z

In principle, FlamingoModel expects the visual_features input to have shape [b N T v d], where b = batch size, N = number of media (images/videos), T = number of frames (T=1 for images, T>1 for videos), v = number of visual features (= number of patches), and d = visual feature dimensionality.
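As a quick illustration of this layout, the sketch below builds dummy tensors for an image batch and a video batch. The sizes are arbitrary placeholders, and `describe` is a hypothetical helper written only for this example, not part of the repository.

```python
import torch

# Placeholder sizes for illustration only (not taken from the repo).
b, N, v, d = 2, 3, 64, 1024
T_video = 8

image_features = torch.randn(b, N, 1, v, d)        # images: T = 1
video_features = torch.randn(b, N, T_video, v, d)  # videos: T > 1

# Per-image features without a frame axis ([b, N, v, d]) get T = 1 via unsqueeze.
per_image = torch.randn(b, N, v, d)
assert per_image.unsqueeze(2).shape == image_features.shape


def describe(visual_features: torch.Tensor) -> str:
    """Hypothetical helper: spell out the axes of a [b N T v d] tensor."""
    if visual_features.ndim != 5:
        raise ValueError(f"expected [b N T v d], got shape {tuple(visual_features.shape)}")
    batch, media, frames, patches, dim = visual_features.shape
    kind = "images" if frames == 1 else "videos"
    return f"{kind}: b={batch}, N={media}, T={frames}, v={patches}, d={dim}"


print(describe(image_features))  # images: b=2, N=3, T=1, v=64, d=1024
print(describe(video_features))  # videos: b=2, N=3, T=8, v=64, d=1024
```

The only difference between the image and video cases is the size of the T axis; everything else about the layout stays the same.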
So while the model should in principle be able to digest videos, the perceiver resampler is based on lucidrains' implementation https://github.com/lucidrains/flamingo-pytorch/blob/10913abbc8b2ceabb2320560d7d9b85fcb85eee3/flamingo_pytorch/flamingo_pytorch.py#L74 and I haven't checked whether it works for video inputs (T>1).
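For reference, the Flamingo paper describes handling video in the Perceiver Resampler by adding a learned temporal embedding to each frame's features and flattening all frames into one sequence, which a fixed set of latents then cross-attends to (keys/values are the media features concatenated with the latents). The following is a minimal sketch of that idea, assuming PyTorch's `nn.MultiheadAttention` in place of a custom attention block. `TemporalPerceiverResampler` and its hyperparameters are hypothetical; this is neither the repository's code nor lucidrains' implementation, which may treat the frame axis differently.

```python
import torch
from torch import nn


class TemporalPerceiverResampler(nn.Module):
    """Hypothetical sketch: resample [b, T, v, d] frame features to [b, num_latents, d]."""

    def __init__(self, dim: int, num_latents: int = 64, max_frames: int = 8,
                 heads: int = 8, depth: int = 2):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        # One learned embedding per frame index, broadcast over the v patches.
        self.time_emb = nn.Parameter(torch.zeros(max_frames, 1, dim))
        self.layers = nn.ModuleList(
            nn.ModuleDict({
                "norm_media": nn.LayerNorm(dim),
                "norm_latents": nn.LayerNorm(dim),
                "attn": nn.MultiheadAttention(dim, heads, batch_first=True),
                "ff": nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                                    nn.GELU(), nn.Linear(4 * dim, dim)),
            })
            for _ in range(depth)
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, T, v, d = x.shape
        if T > self.time_emb.shape[0]:
            raise ValueError(f"got {T} frames, but max_frames={self.time_emb.shape[0]}")
        x = x + self.time_emb[:T]    # add a temporal embedding to every frame
        x = x.reshape(b, T * v, d)   # flatten all frames into one sequence
        latents = self.latents.unsqueeze(0).expand(b, -1, -1)
        for layer in self.layers:
            q = layer["norm_latents"](latents)
            # Keys/values: media features concatenated with the latents.
            kv = torch.cat([layer["norm_media"](x), q], dim=1)
            attn_out, _ = layer["attn"](q, kv, kv, need_weights=False)
            latents = latents + attn_out
            latents = latents + layer["ff"](latents)
        return self.norm(latents)


resampler = TemporalPerceiverResampler(dim=32, num_latents=16, max_frames=8, heads=4)
feats = torch.randn(2, 3, 8, 10, 32)   # [b N T v d]
b, N = feats.shape[:2]
out = resampler(feats.flatten(0, 1))   # fold N into the batch axis: [b*N, T, v, d]
out = out.view(b, N, 16, 32)           # [b N num_latents d]
print(out.shape)  # torch.Size([2, 3, 16, 32])
```

Because the output size depends only on num_latents, images (T=1) and videos (T>1) both come out as the same fixed number of visual tokens per medium, which is what lets the downstream cross-attention layers stay unchanged.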
Also, FlamingoProcessor currently has no functionality to preprocess videos, and I won't add it anytime soon.
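Since FlamingoProcessor has no video path, anyone experimenting would have to build the per-video [T v d] features themselves. A minimal sketch, assuming frames have already been decoded into a [num_frames, C, H, W] tensor and that `encode_frames` stands in for whatever vision encoder produces per-frame patch features. Both helpers and the toy encoder are hypothetical, and uniform segment sampling is just one common choice of frame selection.

```python
from typing import Callable, List

import torch


def sample_frame_indices(num_frames: int, T: int) -> List[int]:
    """Hypothetical helper: pick T indices at the centres of T equal segments of the clip."""
    if num_frames < 1 or T < 1:
        raise ValueError("num_frames and T must be positive")
    # Short clips (num_frames < T) simply repeat frames.
    return [min(num_frames - 1, int((i + 0.5) * num_frames / T)) for i in range(T)]


def video_to_visual_features(
    frames: torch.Tensor,
    T: int,
    encode_frames: Callable[[torch.Tensor], torch.Tensor],
) -> torch.Tensor:
    """Hypothetical helper: frames [num_frames, C, H, W] -> visual features [T, v, d]."""
    idx = sample_frame_indices(frames.shape[0], T)
    return encode_frames(frames[idx])  # encoder receives [T, C, H, W]


def toy_encoder(x: torch.Tensor) -> torch.Tensor:
    """Toy stand-in for a vision encoder: each pixel is a "patch", so v = H*W and d = C."""
    return x.flatten(2).transpose(1, 2)


print(sample_frame_indices(100, 4))  # [12, 37, 62, 87]
video = torch.randn(30, 3, 4, 4)     # 30 decoded RGB frames of 4x4 pixels
feats = video_to_visual_features(video, T=8, encode_frames=toy_encoder)
print(feats.shape)                   # torch.Size([8, 16, 3])
```

Stacking N such outputs with `torch.stack` gives [N, T, v, d], and stacking again over the batch gives the [b N T v d] layout FlamingoModel expects.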

dhansmair added the wontfix label ("This will not be worked on") on Nov 12, 2022
