Video Support #8

Open
dhansmair opened this issue Nov 12, 2022 · 0 comments
Labels
wontfix This will not be worked on

Comments

@dhansmair
Owner

In principle, the FlamingoModel expects the visual_features input to have shape [b N T v d], where b = batch size, N = number of media items (images or videos), T = number of frames (T = 1 for images, T > 1 for videos), v = number of visual features (i.e. the number of patches), and d = visual feature dimensionality.
So in principle it should be able to digest videos. However, the perceiver resampler is based on lucidrains' implementation (https://github.com/lucidrains/flamingo-pytorch/blob/10913abbc8b2ceabb2320560d7d9b85fcb85eee3/flamingo_pytorch/flamingo_pytorch.py#L74), and I haven't checked whether it works with videos.
Also, FlamingoProcessor currently has no functionality for preprocessing videos, and I won't add it anytime soon.
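
To make the [b N T v d] convention concrete, here is a minimal sketch in plain Python that only works on shape tuples. The helper names `describe_visual_features` and `add_frame_axis` are hypothetical and not part of this repository, and the sizes used (257 patches, 1024 dimensions, 8 frames) are purely illustrative assumptions. It does not touch the model or the perceiver resampler, so it says nothing about whether video input actually works.

```python
from typing import Dict, Sequence, Tuple, Union

# Axis names as used in the comment above: batch, media, frames, patches, feature dim.
AXES = ("b", "N", "T", "v", "d")


def describe_visual_features(shape: Sequence[int]) -> Dict[str, Union[int, str]]:
    """Label each axis of a [b, N, T, v, d] shape and classify it as image or video.

    Hypothetical helper for illustration only.
    """
    if len(shape) != len(AXES):
        raise ValueError(f"expected {len(AXES)} axes {AXES}, got shape {tuple(shape)}")
    if any(size < 1 for size in shape):
        raise ValueError(f"all axes must be >= 1, got shape {tuple(shape)}")
    info: Dict[str, Union[int, str]] = dict(zip(AXES, shape))
    # T == 1 means every media item is a single image; T > 1 means video frames.
    info["media"] = "image" if info["T"] == 1 else "video"
    return info


def add_frame_axis(shape: Sequence[int]) -> Tuple[int, int, int, int, int]:
    """Turn an image-only [b, N, v, d] shape into [b, N, 1, v, d] by inserting T = 1."""
    if len(shape) != 4:
        raise ValueError(f"expected 4 axes (b, N, v, d), got shape {tuple(shape)}")
    b, n, v, d = shape
    return (b, n, 1, v, d)


# Illustrative sizes only: 2 samples, 3 images each, 257 patches, 1024-dim features.
image_shape = add_frame_axis((2, 3, 257, 1024))
# Illustrative sizes only: 2 samples, 1 video each, 8 frames per video.
video_shape = (2, 1, 8, 257, 1024)

print(describe_visual_features(image_shape))
print(describe_visual_features(video_shape))
```

With a real tensor, one would pass `tuple(visual_features.shape)` to the checker before calling the model.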

@dhansmair dhansmair added the wontfix This will not be worked on label Nov 12, 2022