arxiv:2501.05453

An Empirical Study of Autoregressive Pre-training from Videos

Published on Jan 9 · Submitted by brjathu on Jan 10
#2 Paper of the day

Abstract

We empirically study autoregressive pre-training from videos. To perform our study, we construct a series of autoregressive video models, called Toto. We treat videos as sequences of visual tokens and train transformer models to autoregressively predict future tokens. Our models are pre-trained on a diverse dataset of videos and images comprising over 1 trillion visual tokens. We explore different architectural, training, and inference design choices. We evaluate the learned visual representations on a range of downstream tasks including image recognition, video classification, object tracking, and robotics. Our results demonstrate that, despite minimal inductive biases, autoregressive pre-training leads to competitive performance across all benchmarks. Finally, we find that scaling our video models results in similar scaling curves to those seen in language models, albeit with a different rate. More details at https://brjathu.github.io/toto/
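As a rough illustration of the objective the abstract describes (treating a video as a sequence of discrete visual tokens and predicting each next token), here is a minimal sketch of a next-token cross-entropy loss. It is a generic formulation, not the authors' implementation: the function name `next_token_loss`, the toy vocabulary size, and the plain-list representation of logits are all assumptions made for illustration, and the tokenizer and transformer that would produce the tokens and logits are left out.

```python
import math


def next_token_loss(logits, tokens):
    """Average cross-entropy of predicting tokens[t + 1] from the logits at position t.

    logits: one list of per-vocabulary scores for each position in `tokens`
            (hypothetical model outputs).
    tokens: list of integer token ids (hypothetical discrete visual tokens).
    """
    total = 0.0
    count = 0
    for t in range(len(tokens) - 1):
        scores = logits[t]
        target = tokens[t + 1]
        # Numerically stable log of the softmax normaliser.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability assigned to the true next token.
        total += log_z - scores[target]
        count += 1
    return total / count


# Toy example: 4 tokens from a vocabulary of 3, with uniform (all-zero) logits.
tokens = [0, 2, 1, 1]
uniform_logits = [[0.0, 0.0, 0.0] for _ in tokens]
loss = next_token_loss(uniform_logits, tokens)
```

With uniform logits the model spreads probability equally over the vocabulary, so the loss equals the log of the vocabulary size; training would aim to push it below that baseline.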

