SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
Authors: J. Chen, Y. Zhao, J. Yu*, R. Chu, J. Chen, S. Yang, X. Wang, Y. Pan, D. Zhou, H. Ling, H. Liu, H. Yi, H. Zhang, M. Li, Y. Chen, H. Cai, S. Fidler, P. Luo, S. Han, E. Xie
Status: Submitted to ICLR 2026.
Preprint: arXiv:2509.24695
SANA-Video introduces a block linear diffusion transformer tailored to video generation. By linearizing attention within temporal blocks, it replaces the cost of softmax attention, which grows quadratically with the number of tokens, with a cost that grows linearly, while retaining temporal coherence across frames. The design aims to deliver high-quality, long-horizon samples at substantially lower compute cost, making diffusion models more practical for video workloads.
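To make the idea of linearized attention over temporal blocks concrete, here is a minimal sketch. It is not SANA-Video's implementation. It assumes a ReLU feature map phi, as in common kernelized linear attention, and block-causal visibility, where each token sees its own block and all earlier blocks. The function names `block_linear_attention` and `reference_attention` are hypothetical. The point the sketch illustrates is that the running sums S = Σ phi(k_j) v_jᵀ and z = Σ phi(k_j) have a fixed size, so processing more blocks does not enlarge the state.

```python
from typing import List

Matrix = List[List[float]]


def relu(vec: List[float]) -> List[float]:
    """Assumed non-negative feature map phi(x) = max(x, 0), elementwise."""
    return [max(x, 0.0) for x in vec]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def block_linear_attention(q: Matrix, k: Matrix, v: Matrix,
                           block_size: int, eps: float = 1e-6) -> Matrix:
    """Block-causal linear attention using a constant-size running state.

    Tokens in block b attend to every token in blocks 0..b. The state
    S (d x d_v) and normalizer z (d) do not grow with sequence length.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    d, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d)]  # S = sum_j phi(k_j) v_j^T
    norm = [0.0] * d                         # z = sum_j phi(k_j)
    out: Matrix = []
    for start in range(0, len(q), block_size):
        stop = min(start + block_size, len(q))
        # 1) Fold this block's keys and values into the running state.
        for j in range(start, stop):
            fk = relu(k[j])
            for a in range(d):
                norm[a] += fk[a]
                for c in range(d_v):
                    state[a][c] += fk[a] * v[j][c]
        # 2) Each query in the block reads the state: phi(q)^T S / (phi(q)^T z).
        for i in range(start, stop):
            fq = relu(q[i])
            denom = dot(fq, norm) + eps
            out.append([sum(fq[a] * state[a][c] for a in range(d)) / denom
                        for c in range(d_v)])
    return out


def reference_attention(q: Matrix, k: Matrix, v: Matrix,
                        block_size: int, eps: float = 1e-6) -> Matrix:
    """Quadratic-cost version of the same block-causal attention, for checking."""
    out: Matrix = []
    for i in range(len(q)):
        # Keys visible to token i: everything up to the end of its own block.
        visible = min((i // block_size + 1) * block_size, len(k))
        fq = relu(q[i])
        weights = [dot(fq, relu(k[j])) for j in range(visible)]
        denom = sum(weights) + eps
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) / denom
                    for c in range(len(v[0]))])
    return out
```

Both functions compute the same outputs. The block-wise version, however, only keeps the d × d_v state between blocks, while the reference version revisits every visible key for every query. This constant-size state is the property that makes linearized attention attractive for long video sequences.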