# FastVideo VideoGenerator Gradio Demo

This is a Gradio-based web interface for generating videos using the FastVideo framework. The demo allows users to create videos from text prompts with various customization options.

## Overview

The demo uses the FastVideo framework to generate videos based on text prompts. It provides a simple web interface built with Gradio that allows users to:

- Enter text prompts to generate videos
- Customize video parameters (dimensions, number of frames, etc.)
- Use negative prompts to guide the generation process
- Set or randomize seeds for reproducibility

---

## Requirements

- Linux-based OS
- Python 3.10
- CUDA 12.4
- FastVideo

## Installation

```bash
pip install fastvideo
```

## Usage

Run the demo with:

```bash
python fastvideo/v1/examples/inference/gradio/gradio_demo.py
```

This starts a Gradio server bound to `0.0.0.0:7860`; open `http://localhost:7860` in a browser to access the interface.

---

## Model Initialization

```python
args = FastVideoArgs(model_path="FastVideo/FastHunyuan-Diffusers", num_gpus=2)

generator = VideoGenerator.from_pretrained(
    model_path=args.model_path,
    num_gpus=args.num_gpus
)
```

This demo initializes a `VideoGenerator` with the minimum required arguments for inference. Users can seamlessly adjust inference options between generations, including prompts, resolution, video length, or even the number of inference steps, *without ever needing to reload the model*.

## Video Generation

The core functionality is in the `generate_video` function, which:
1. Processes user inputs
2. Uses the `VideoGenerator` initialized above to run inference (`generator.generate_video()`)
3. Returns an output path that Gradio uses to display the generated video
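The first step, input processing, typically amounts to normalizing the raw UI values before they reach the generator. A minimal stdlib sketch of what that might look like; the helper name and the snap-to-multiples-of-16 rule are illustrative assumptions, not FastVideo requirements:

```python
def process_inputs(height: int, width: int, num_frames: int) -> tuple[int, int, int]:
    """Normalize raw slider values into safe generation parameters.

    Hypothetical helper: many video diffusion models expect spatial
    dimensions divisible by a fixed factor (16 is assumed here).
    """
    height = max(16, (int(height) // 16) * 16)  # snap down to a multiple of 16
    width = max(16, (int(width) // 16) * 16)
    num_frames = max(1, int(num_frames))        # at least one frame
    return height, width, num_frames
```

The normalized values would then be forwarded to `generator.generate_video()` along with the prompt, and the resulting file path handed back to Gradio for display.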

## Gradio Interface

The interface is built with several components:
- A text input for the prompt
- A video display for the result
- Inference options in a collapsible accordion:
  - Height and width sliders
  - Number of frames slider
  - Guidance scale slider
  - Inference steps slider
  - Negative prompt options
  - Seed controls

### Inference Options

- **Height/Width**: Control the resolution of the generated video
- **Number of Frames**: Set how many frames to generate
- **Guidance Scale**: Control how closely the generation follows the prompt
- **Inference Steps**: Set the number of denoising steps; more steps can improve quality but take longer
- **Negative Prompt**: Specify what you don't want to see in the video
- **Seed**: Control randomness for reproducible results
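A common way to implement the seed control is to draw a fresh random seed when randomization is requested and otherwise clamp the user's value into the valid range. A small sketch; the helper name and the 32-bit seed range are assumptions:

```python
import random

MAX_SEED = 2**32 - 1  # assumed 32-bit seed range

def resolve_seed(seed: int, randomize: bool) -> int:
    """Return the seed to use: a fresh random one, or the user's value
    wrapped into [0, MAX_SEED]."""
    if randomize:
        return random.randint(0, MAX_SEED)
    return int(seed) % (MAX_SEED + 1)
```

Reusing the returned seed in a later generation reproduces the same video for identical settings, which is the point of exposing it in the UI.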