-
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
Paper • 2412.09593 • Published • 18 -
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
Paper • 2412.16112 • Published • 21 -
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Paper • 2412.14171 • Published • 24 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 45
Collections
Collections including paper arxiv:2412.16112
-
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
Paper • 2501.02576 • Published • 6 -
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Paper • 2412.09626 • Published • 20 -
Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion
Paper • 2412.13389 • Published • 6 -
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
Paper • 2412.16112 • Published • 21
-
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 91 -
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Paper • 2501.01257 • Published • 45 -
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
Paper • 2501.01423 • Published • 34 -
REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents
Paper • 2411.13552 • Published
-
GenEx: Generating an Explorable World
Paper • 2412.09624 • Published • 88 -
IamCreateAI/Ruyi-Mini-7B
Image-to-Video • Updated • 17.1k • 576 -
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
Paper • 2412.06016 • Published • 20 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 85
-
Causal Diffusion Transformers for Generative Modeling
Paper • 2412.12095 • Published • 23 -
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
Paper • 2412.09619 • Published • 20 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 45 -
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
Paper • 2412.15213 • Published • 25