This guide explains how to implement a custom diffusion pipeline in FastVideo, leveraging the framework's modular architecture for high-performance video generation.
## Implementation Process Overview
1. **Port Required Modules** - Identify and implement necessary model components
2. **Create Directory Structure** - Set up pipeline files and folders
3. **Implement Pipeline Class** - Build the pipeline using existing or custom stages
4. **Register Your Pipeline** - Make it discoverable by the framework
5. **Configure Your Pipeline** - (Coming soon)
Need help? Join our [Slack community](https://join.slack.com/t/fastvideo/shared_invite/zt-2zf6ru791-sRwI9lPIUJQq1mIeB_yjJg).
## Step 1: Pipeline Modules
### Identifying Required Modules
FastVideo uses the Hugging Face Diffusers format for model organization:
1. Examine the `model_index.json` in the HF model repository:
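As a rough, illustrative way to see which components a pipeline declares, you can inspect the file directly. In the sketch below, the path and the example values are placeholders, not a specific model repository:

```python
import json

# Illustrative only: list the components declared in a repo's model_index.json.
# The path and the example values below are placeholders.
with open("path/to/model_repo/model_index.json") as f:
    model_index = json.load(f)

for name, spec in model_index.items():
    if not name.startswith("_"):   # skip metadata keys such as "_class_name"
        print(name, spec)          # e.g. vae ['diffusers', 'AutoencoderKL']
```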
After SSH'ing into your pod, you'll find the `fastvideo-dev` Conda environment already activated.
To pull in the latest changes from the GitHub repo:
```bash
cd /FastVideo
git pull
```
If you have a persistent volume and want to keep your code changes, you can move `/FastVideo` to `/workspace/FastVideo`, or simply clone the repository there.
Thank you for your interest in contributing to FastVideo. We want to make the process as smooth as possible, and this guide will help you get started!
Our community is open to everyone and welcomes any contributions no matter how large or small.
# Developer Environment
Make sure you have CUDA 12.4 installed and supported by your GPU. FastVideo currently only supports Linux and CUDA GPUs, but we hope to support other platforms in the future.
We recommend using a fresh Python 3.10 Conda environment to develop FastVideo:
Create and activate a Conda environment for FastVideo:
```bash
conda create -n fastvideo python=3.10 -y
conda activate fastvideo
```
Clone the FastVideo repository and go to the FastVideo directory:
```bash
git clone https://github.com/hao-ai-lab/FastVideo.git && cd FastVideo
```
Now you can install FastVideo and set up the git hooks for linting. With `pre-commit`, the linters will run and must pass before you can make a commit.
This document outlines FastVideo's architecture for developers interested in framework internals or contributions. It serves as an onboarding guide for new contributors by providing an overview of the most important directories and files within the `fastvideo/v1/` codebase.
## V1 Directory Structure and Files
FastVideo separates model components from execution logic with these principles:
- **Component Isolation**: Models (encoders, VAEs, transformers) are isolated from execution (pipelines, stages, distributed processing)
- **Modular Design**: Components can be independently replaced
- **Distributed Execution**: Supports various parallelism strategies (Tensor, Sequence)
- **Custom Attention Backends**: Components can support and use different Attention implementations
- **Pipeline Abstraction**: Consistent interface across diffusion models
(design-fastvideo-args)=
## FastVideoArgs
The `FastVideoArgs` class in `fastvideo/v1/fastvideo_args.py` serves as the central configuration system for FastVideo. It contains all parameters needed to control model loading, inference configuration, performance optimization settings, and more.
Key features include:
- **Command-line Interface**: Automatic conversion between CLI arguments and dataclass fields
- **Configuration Groups**: Organized by functional areas (model loading, video params, optimization settings)
- **Context Management**: Global access to current settings via `get_current_fastvideo_args()`
- **Parameter Validation**: Ensures valid combinations of settings
Common configuration areas:
- **Model paths and loading options**: `model_path`, `trust_remote_code`, `revision`
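As a rough sketch of this pattern, a dataclass-backed configuration object with a global accessor can look like the following. The field set and the setter name below are illustrative placeholders, not the full `FastVideoArgs` definition:

```python
# Illustrative sketch only: a dataclass-backed config with a global accessor,
# as described above. Fields and the setter are placeholders.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToyFastVideoArgs:
    model_path: str
    trust_remote_code: bool = False
    revision: Optional[str] = None

_current_args: Optional[ToyFastVideoArgs] = None

def set_current_fastvideo_args(args: ToyFastVideoArgs) -> None:
    global _current_args
    _current_args = args

def get_current_fastvideo_args() -> ToyFastVideoArgs:
    assert _current_args is not None, "args not initialized"
    return _current_args

# Configure once at startup, then read the settings anywhere in the pipeline.
set_current_fastvideo_args(ToyFastVideoArgs(model_path="org/model"))
print(get_current_fastvideo_args().model_path)
```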
Defined in `fastvideo/v1/pipelines/pipeline_batch_info.py`, `ForwardBatch` encapsulates the data payload passed between pipeline stages. It typically holds:
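The exact fields are defined in that file; purely to illustrate the pattern of a payload handed from stage to stage, a minimal sketch (with placeholder field names) might look like:

```python
# Illustrative placeholder only: a dataclass payload of the kind ForwardBatch
# represents. The real fields live in fastvideo/v1/pipelines/pipeline_batch_info.py.
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class ToyForwardBatch:
    prompt: str
    num_inference_steps: int = 50
    prompt_embeds: Optional[Any] = None   # written by a text-encoding stage
    latents: Optional[Any] = None         # written by a latent-preparation stage
    extra: dict = field(default_factory=dict)

# Each stage reads the fields it needs and writes its outputs back onto the batch.
batch = ToyForwardBatch(prompt="a red panda surfing")
```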
Defined in `fastvideo/v1/forward_context.py`, `ForwardContext` manages execution-specific state *within* a forward pass, particularly for low-level optimizations. It is accessed via `get_forward_context()`.
- **Attention Metadata**: Configuration for optimized attention kernels (`attn_metadata`)
- **Profiling Data**: Potential hooks for performance metrics collection
This context-based approach enables:
- Dynamic optimization based on execution state (e.g., attention backend selection)
- Step-specific customizations within model components
```python
# During this forward pass, components can access context
# through get_forward_context()
output = model(inputs)
```
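For intuition, here is a minimal sketch of how a per-forward-pass context can be maintained. The class and setter names are illustrative placeholders, not FastVideo's actual implementation:

```python
# Illustrative sketch only: a context-manager-based forward context of the kind
# described above. ToyForwardContext and set_forward_context are placeholders.
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ToyForwardContext:
    attn_metadata: Any = None
    current_timestep: Optional[int] = None

_forward_context: Optional[ToyForwardContext] = None

def get_forward_context() -> ToyForwardContext:
    assert _forward_context is not None, "not inside a forward pass"
    return _forward_context

@contextmanager
def set_forward_context(attn_metadata: Any, current_timestep: int):
    global _forward_context
    prev = _forward_context
    _forward_context = ToyForwardContext(attn_metadata, current_timestep)
    try:
        yield
    finally:
        _forward_context = prev
```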
(design-executor-and-worker-abstractions)=
## Executor and Worker System
The `fastvideo/v1/worker/` directory contains the distributed execution framework:
### Executor Abstraction
FastVideo implements a flexible execution model for distributed processing:
- **Executor Base Class**: An abstract base class defining the interface for all executors
- **MultiProcExecutor**: Primary implementation that spawns and manages worker processes
- **GPU Workers**: Handle actual model execution on individual GPUs
The MultiProcExecutor implementation:
1. Spawns worker processes for each GPU
2. Establishes communication channels via pipes
3. Coordinates distributed operations across workers
4. Handles graceful startup and shutdown of the process group
Each GPU worker:
1. Initializes the distributed environment
2. Builds the pipeline for the specified model
3. Executes requested operations on its assigned GPU
4. Manages local resources and communicates results back to the executor
This design allows FastVideo to efficiently utilize multiple GPUs while providing a simple, unified interface for model execution.
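As a minimal, self-contained sketch of this pattern (toy class and method names, not FastVideo's actual worker API), an executor that spawns one process per GPU and communicates over pipes might look like:

```python
# Illustrative sketch only: one worker process per GPU, driven over pipes.
import multiprocessing as mp

def _worker_loop(rank: int, conn) -> None:
    # In FastVideo, the worker would initialize the distributed environment
    # and build the pipeline for its assigned GPU here.
    while True:
        method, args = conn.recv()
        if method == "shutdown":
            conn.send("ok")
            break
        # Execute the requested operation and return the result.
        conn.send(f"rank {rank} ran {method} with {args}")

class ToyMultiProcExecutor:
    def __init__(self, num_gpus: int) -> None:
        self.conns, self.procs = [], []
        for rank in range(num_gpus):
            parent, child = mp.Pipe()
            proc = mp.Process(target=_worker_loop, args=(rank, child), daemon=True)
            proc.start()
            self.conns.append(parent)
            self.procs.append(proc)

    def collective_rpc(self, method: str, *args):
        # Broadcast the request to every worker, then gather the replies.
        for conn in self.conns:
            conn.send((method, args))
        return [conn.recv() for conn in self.conns]

    def shutdown(self) -> None:
        self.collective_rpc("shutdown")
        for proc in self.procs:
            proc.join()

if __name__ == "__main__":
    executor = ToyMultiProcExecutor(num_gpus=2)
    print(executor.collective_rpc("generate", "a cat playing piano"))
    executor.shutdown()
```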
(design-platforms)=
## Platforms
The `fastvideo/v1/platforms/` directory provides hardware platform abstractions that enable FastVideo to run efficiently on different hardware configurations:
### Platform Abstraction
FastVideo's platform abstraction layer enables:
- **Hardware Detection**: Automatic detection of available hardware
- **Backend Selection**: Appropriate selection of compute kernels
- **Memory Management**: Efficient utilization of hardware-specific memory features
The primary components include:
- **Platform Interface**: Defines the common API for all platform implementations
- **CUDA Platform**: Optimized implementation for NVIDIA GPUs
- **Backend Enum**: Used throughout the codebase for feature selection
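A minimal sketch of that abstraction follows; the enum values and method names are illustrative, not FastVideo's actual interface:

```python
# Illustrative sketch only: a platform interface plus a backend enum of the
# kind described above. Names are placeholders, not FastVideo's actual API.
import enum
from abc import ABC, abstractmethod

class AttentionBackend(enum.Enum):
    TORCH_SDPA = "torch_sdpa"
    FLASH_ATTN = "flash_attn"

class Platform(ABC):
    @abstractmethod
    def get_device_name(self, device_id: int = 0) -> str:
        ...

    @abstractmethod
    def get_attn_backend(self) -> AttentionBackend:
        ...

class CudaPlatform(Platform):
    def get_device_name(self, device_id: int = 0) -> str:
        import torch  # assumes PyTorch with CUDA support is installed
        return torch.cuda.get_device_name(device_id)

    def get_attn_backend(self) -> AttentionBackend:
        return AttentionBackend.FLASH_ATTN
```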
```python
    generate_main_index (bool): Whether to generate the main examples index.
        If False, only category-specific indices will be generated.
    """
    # Create empty indices with dynamic paths
    main_index_dir = ROOT_DIR / "docs/source/examples"
    if not main_index_dir.exists():
        main_index_dir.mkdir(parents=True)

    # Create the main examples index only if requested
    examples_index = None
    if generate_main_index:
        examples_index = Index(
            path=main_index_dir / "examples_index.md",
            title="💡 Examples",
            description=
            "A collection of examples demonstrating usage of FastVideo.\nAll documented examples are autogenerated using <gh-file:docs/source/generate_examples.py> from examples found in <gh-file:examples>.",  # noqa: E501
            caption="Examples",
            maxdepth=2)

    # Category indices with dynamic paths based on category names
    "Inference examples demonstrate how to use FastVideo in an offline setting, where the model is queried for predictions in batches. We recommend starting with <project:basic.md>.",  # noqa: E501
```
FastVideo currently only supports Linux and NVIDIA CUDA GPUs.
FastVideo has been tested on the following GPUs, but it should work on any GPU that supports CUDA 12.4+. Please create an issue if you run into problems:
- RTX 4090
- A40
- L40S
- A100
- H100
## Requirements
- OS: Linux
- Python: 3.10-3.12
- CUDA 12.4+
## Installation Options
### Option 1: Quick Install
```bash
pip install fastvideo
```
### Option 2: Installation from Source
We recommend using a Python environment such as Conda.
#### 1. [Optional] Install Miniconda (if not already installed)
Use the following scripts to run inference for StepVideo. When using STA for inference, the generated videos will have dimensions of 204×768×768 (currently, this is the only supported shape).
```bash
sh scripts/inference/inference_stepvideo_STA.sh # Inference stepvideo with STA
sh scripts/inference/inference_stepvideo.sh # Inference original stepvideo
```
## Inference HunyuanVideo with Sliding Tile Attention
We provide two examples in the following script to run inference with STA + [TeaCache](https://github.com/ali-vilab/TeaCache) and STA only.
```bash
sh scripts/inference/inference_hunyuan_STA.sh
```
## Video Demos using STA + TeaCache
Visit our [demo website](https://fast-video.github.io/) to explore our complete collection of examples. With STA + TeaCache, a single video generation is shortened from 945s to 317s on an H100.
## Inference FastHunyuan on single RTX4090
We now support NF4 and LLM-INT8 quantized inference using BitsAndBytes for FastHunyuan. With NF4 quantization, inference can be performed on a single RTX 4090 GPU, requiring just 20GB of VRAM.
For improved quality in generated videos, we recommend using a GPU with 80GB of memory to run the BF16 model with the original Hunyuan pipeline. To execute the inference, use the following section: