https://arxiv.org/abs/2109.14084

VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding (Hu Xu, Gargi Ghosh, Po-Yao Huang, Dmytro Okhonko, Armen Aghajanyan, Florian Metze, Luke Zettlemoyer, Christoph Feichtenhofer)

As the title says, this is CLIP for video. It performs well even zero-shot, and with fine-tuning it outperforms previous methods, which is an intuitive result.

OpenAI also seems to be quite interested in multimodal work, multimodal generation in particular, so I'm curious what progress will come next.

#video_transformer #retrieval #multimodal
