In principle, the FlamingoModel expects the visual_features input to have shape [b N T v d], where b = batch size, N = number of media items (images/videos), T = number of frames (T=1 for images, T>1 for videos), v = number of visual features (= number of patches), and d = visual feature dimensionality.
So while in principle it should be able to digest videos, the perceiver resampler is based on lucidrains' implementation (https://github.com/lucidrains/flamingo-pytorch/blob/10913abbc8b2ceabb2320560d7d9b85fcb85eee3/flamingo_pytorch/flamingo_pytorch.py#L74), and I haven't checked whether it actually works for video input.
Also, FlamingoProcessor currently has no functionality for preprocessing videos, and I won't be adding it anytime soon.
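To make the [b N T v d] layout concrete, here is a minimal sketch of how image and video features would be arranged. It uses NumPy arrays as a stand-in for tensors, and the helper name `as_visual_features` and all sizes are hypothetical, not part of the repository. It only illustrates the shape convention and says nothing about whether the model handles T>1 correctly.

```python
import numpy as np


def as_visual_features(feats):
    """Return features in [b, N, T, v, d] layout.

    Hypothetical helper: 4-D image features [b, N, v, d] get a frame
    axis of size T=1 inserted; 5-D features are returned unchanged.
    """
    feats = np.asarray(feats)
    if feats.ndim == 4:
        # Images: insert the frame axis so that T == 1.
        feats = feats[:, :, np.newaxis, :, :]
    if feats.ndim != 5:
        raise ValueError(f"expected 4 or 5 dimensions, got {feats.ndim}")
    return feats


# Hypothetical sizes, chosen small for illustration only.
b, N, v, d = 2, 3, 4, 8

# Images: one frame per media item (T=1).
image_features = as_visual_features(np.zeros((b, N, v, d), dtype=np.float32))

# Videos: T frames per media item (T>1).
T = 5
video_features = as_visual_features(np.zeros((b, N, T, v, d), dtype=np.float32))

print(image_features.shape)
print(video_features.shape)
```

Since FlamingoProcessor does not preprocess videos, frame features in this layout would have to be produced and stacked outside of it.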