---
license: apache-2.0
datasets:
- FastVideo/Wan-Syn_77x448x832_600k
base_model:
- Wan-AI/Wan2.1-T2V-14B-Diffusers
---
# FastVideo FastWan2.1-T2V-14B-480P-Diffusers
<p align="center">
  <img src="https://raw.githubusercontent.com/hao-ai-lab/FastVideo/main/assets/logo.jpg" width="200"/>
</p>
<div>
  <div align="center">
    <a href="https://github.com/hao-ai-lab/FastVideo" target="_blank">FastVideo Team</a>
  </div>
  <div align="center">
    <a href="https://arxiv.org/pdf/2505.13389">Paper</a> |
    <a href="https://github.com/hao-ai-lab/FastVideo">GitHub</a>
  </div>
</div>

## Introduction
This model is jointly finetuned with [DMD](https://arxiv.org/pdf/2405.14867) and [VSA](https://arxiv.org/pdf/2505.13389), based on [Wan-AI/Wan2.1-T2V-14B-Diffusers](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B-Diffusers). It supports efficient 3-step inference and generates high-quality videos at **61×448×832** resolution. It is trained on the [FastVideo 480P Synthetic Wan dataset](https://huggingface.co/datasets/FastVideo/Wan-Syn_77x448x832_600k), which consists of 600k synthetic latents.

---

## Model Overview
- Supports 3-step inference, achieving up to a **50× speedup** on a single **H100** GPU.
- Generates videos at **61×448×832** resolution.
- Finetuning and inference scripts are available in the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository:
  - [Finetuning script](https://github.com/hao-ai-lab/FastVideo/blob/main/scripts/distill/v1_distill_dmd_wan_VSA.sh)
  - [Inference script](https://github.com/hao-ai-lab/FastVideo/blob/main/scripts/inference/v1_inference_wan_dmd.sh)
- Try it out on **FastVideo**: we support a wide range of GPUs, from **H100** to **4090**, as well as **Mac** users!
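
To make the **61×448×832** figure concrete, the following sketch estimates the latent grid and the transformer sequence length for one clip. It assumes the shape reads as frames × height × width, that the Wan2.1 VAE is causal with 4× temporal and 8× spatial compression, and that the transformer uses a 1×2×2 patch embedding. None of these factors are stated in this card, and the helper names `latent_shape` and `token_count` are purely illustrative.

```python
def latent_shape(frames, height, width, t_stride=4, s_stride=8):
    """Latent (frames, height, width) under an ASSUMED causal video VAE."""
    # A causal VAE keeps the first frame and compresses the remaining
    # frames in groups of t_stride.
    latent_frames = (frames - 1) // t_stride + 1
    return latent_frames, height // s_stride, width // s_stride


def token_count(latent, patch=(1, 2, 2)):
    """Number of transformer tokens under an ASSUMED patch embedding."""
    t, h, w = latent
    pt, ph, pw = patch
    return (t // pt) * (h // ph) * (w // pw)


latent = latent_shape(61, 448, 832)
tokens = token_count(latent)
print(latent, tokens)  # (16, 56, 104) 23296
```

Under the same assumptions, the 77-frame clips in the training dataset would map to 20 latent frames rather than 16.
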

### Training Infrastructure
Training was conducted on **8 nodes with 64 H200 GPUs** in total, using a `global batch size = 64`.
We enable `gradient checkpointing`, set `HSDP_shard_dim = 8`, `sequence_parallel_size = 4`, and use `learning rate = 1e-5`.
We set **VSA attention sparsity** to 0.9, and training runs for **3000 steps (~52 hours)**.
The detailed **training example script** is available [here](https://github.com/hao-ai-lab/FastVideo/blob/main/examples/distill/Wan-Syn-480P/distill_dmd_VSA_t2v_14B_480P.slurm).
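
The settings above imply a few derived quantities that may be useful when adapting the recipe to a different cluster. The sketch below is plain arithmetic on the stated values. It assumes that the ranks in a sequence-parallel group share one sample (so the data-parallel degree is the GPU count divided by `sequence_parallel_size`), that HSDP spans all 64 GPUs, and that each step consumes exactly one global batch. The variable names are illustrative and are not FastVideo configuration keys.

```python
# Values stated in the text.
num_gpus = 64
global_batch_size = 64
sequence_parallel_size = 4
hsdp_shard_dim = 8
train_steps = 3000
wall_clock_hours = 52      # approximate
dataset_size = 600_000     # synthetic latents

# ASSUMPTION: GPUs in one sequence-parallel group process the same sample.
data_parallel_groups = num_gpus // sequence_parallel_size
# Samples per data-parallel group per optimizer step (whether this comes
# from the micro-batch size or gradient accumulation is not stated).
samples_per_group = global_batch_size // data_parallel_groups
# ASSUMPTION: parameters are sharded across 8 GPUs and replicated across the rest.
hsdp_replicas = num_gpus // hsdp_shard_dim

seconds_per_step = wall_clock_hours * 3600 / train_steps
samples_seen = train_steps * global_batch_size
dataset_fraction = samples_seen / dataset_size

print(data_parallel_groups, samples_per_group, hsdp_replicas)  # 16 4 8
print(seconds_per_step, samples_seen, dataset_fraction)        # 62.4 192000 0.32
```

Read this way, the run averages roughly one minute per step and touches about a third of the 600k-sample dataset.
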

If you use the FastWan2.1-T2V-14B-480P-Diffusers model in your research, please cite our papers:
```
@article{zhang2025vsa,
  title={VSA: Faster Video Diffusion with Trainable Sparse Attention},
  author={Zhang, Peiyuan and Huang, Haofeng and Chen, Yongqi and Lin, Will and Liu, Zhengzhong and Stoica, Ion and Xing, Eric and Zhang, Hao},
  journal={arXiv preprint arXiv:2505.13389},
  year={2025}
}
@article{zhang2025fast,
  title={Fast video generation with sliding tile attention},
  author={Zhang, Peiyuan and Chen, Yongqi and Su, Runlong and Ding, Hangliang and Stoica, Ion and Liu, Zhengzhong and Zhang, Hao},
  journal={arXiv preprint arXiv:2502.04507},
  year={2025}
}
```