# Variable Resolution Inference

## Overview

Variable resolution inference is a strategy for speeding up the denoising process. It reduces computation while maintaining generation quality by running different denoising stages at different resolutions. The core idea is to use a lower resolution for coarse denoising in the early stages, then switch to the original resolution for fine processing in the later stages.
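
To get a feel for the potential savings, the sketch below estimates the relative cost of a schedule. It rests on two assumptions that are not stated in this document: the scale factor applies to both height and width, and the per-step cost is proportional to the number of spatial positions. Attention layers scale faster than linearly, so real savings may differ. The function name `relative_cost` is hypothetical.

```python
def relative_cost(scales):
    """Estimate cost relative to running every step at full resolution.

    Assumption: each step costs scale**2 units, i.e. cost is proportional
    to the number of spatial positions (height * width).
    """
    if not scales:
        raise ValueError("scales must contain at least one step")
    return sum(scale ** 2 for scale in scales) / len(scales)


# Illustrative schedule: 25 steps at 0.75x followed by 25 steps at 1.0x.
example_scales = [0.75] * 25 + [1.0] * 25
example_ratio = relative_cost(example_scales)
print(f"estimated relative cost: {example_ratio:.5f}")
```

Under these assumptions, a step at 0.75x processes 0.5625 times as many positions as a full-resolution step, so a schedule that is half low-resolution costs roughly 78% of the baseline.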

## Technical Principles

### Multi-stage Denoising Strategy

Variable resolution inference is based on the following observations:

- **Early-stage denoising**: Mainly removes coarse noise and establishes the overall structure, so it needs less fine detail
- **Late-stage denoising**: Focuses on refining details and recovering high-frequency information, so it needs the full resolution

### Resolution Switching Mechanism

1. **Low-resolution stage** (early stage)
   - Downsample the input to a lower resolution (e.g., 0.75x the original size)
   - Execute initial denoising steps
   - Quickly remove most noise and establish basic structure

2. **Normal resolution stage** (late stage)
   - Upsample the denoising result from the low-resolution stage back to the original resolution
   - Continue with the remaining denoising steps
   - Restore detailed information and complete fine processing
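
The two stages above can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not the LightX2V implementation: the latent is a plain 2D list, resizing uses nearest-neighbor sampling, and `toy_denoise_step` is a placeholder for a real model call. All function names here are hypothetical.

```python
def resize_nearest(grid, new_h, new_w):
    """Resize a 2D list with nearest-neighbor sampling."""
    old_h, old_w = len(grid), len(grid[0])
    return [
        [grid[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def toy_denoise_step(grid):
    """Placeholder for a model call: shrinks every value toward zero."""
    return [[0.9 * value for value in row] for row in grid]


def two_stage_denoise(latent, total_steps, switch_step, rate):
    """Run early steps at reduced size, then upsample and finish at full size."""
    full_h, full_w = len(latent), len(latent[0])
    low_h = max(1, round(full_h * rate))
    low_w = max(1, round(full_w * rate))

    # Low-resolution stage: start from a downsampled input.
    x = resize_nearest(latent, low_h, low_w)
    shapes = []
    for step in range(total_steps):
        if step == switch_step:
            # Normal-resolution stage: upsample back to the original size.
            x = resize_nearest(x, full_h, full_w)
        x = toy_denoise_step(x)
        shapes.append((len(x), len(x[0])))
    return x, shapes


latent = [[1.0] * 8 for _ in range(8)]
result, shapes = two_stage_denoise(latent, total_steps=4, switch_step=2, rate=0.75)
print(shapes)
```

A real pipeline would operate on latent tensors and would likely use a smoother interpolation mode; the sketch only shows where the resolution switch sits relative to the denoising steps.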

### U-shaped Resolution Strategy

If the resolution is reduced at the very beginning of denoising, the final video may differ significantly from the one produced by normal inference. To limit this, a U-shaped resolution strategy can be adopted: keep the original resolution for the first few steps, reduce it for the middle steps, and then return to the original resolution for the final steps.

## Usage

The config files for variable resolution inference are located [here](https://github.com/ModelTC/LightX2V/tree/main/configs/changing_resolution).

To test variable resolution inference, point `--config_json` at one of these config files.

You can refer to the scripts [here](https://github.com/ModelTC/LightX2V/blob/main/scripts/changing_resolution) for how to run it.

### Example 1:
```json
{
    "infer_steps": 50,
    "changing_resolution": true,
    "resolution_rate": [0.75],
    "changing_resolution_steps": [25]
}
```

This config runs 50 steps in total: steps 0 to 25 use 0.75x the original resolution, and steps 26 through the final step use the original resolution.

### Example 2:
```json
{
    "infer_steps": 50,
    "changing_resolution": true,
    "resolution_rate": [1.0, 0.75],
    "changing_resolution_steps": [10, 35]
}
```

This config runs 50 steps in total: steps 0 to 10 use the original resolution, steps 11 to 35 use 0.75x the original resolution, and steps 36 through the final step use the original resolution.
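
To make the mapping from config fields to per-step resolution concrete, the sketch below expands the two example configs into a list of scale factors. It follows the step ranges as worded in the examples (each boundary step is treated as inclusive, and every step after the last boundary runs at the original resolution). The function name `expand_schedule` is hypothetical, and the actual implementation may handle the boundary step differently.

```python
def expand_schedule(infer_steps, resolution_rate, changing_resolution_steps):
    """Return one scale factor per denoising step.

    Assumption: segment i uses resolution_rate[i] up to and including step
    changing_resolution_steps[i]; later steps use the original resolution.
    """
    if len(resolution_rate) != len(changing_resolution_steps):
        raise ValueError("resolution_rate and changing_resolution_steps must have equal length")

    scales = []
    for step in range(infer_steps):
        scale = 1.0  # default once the last boundary has been passed
        for rate, boundary in zip(resolution_rate, changing_resolution_steps):
            if step <= boundary:
                scale = rate
                break
        scales.append(scale)
    return scales


# Example 1: low resolution first, then original resolution.
schedule_1 = expand_schedule(50, [0.75], [25])

# Example 2: U-shaped schedule (original, then 0.75x, then original).
schedule_2 = expand_schedule(50, [1.0, 0.75], [10, 35])

print(schedule_1.count(0.75), schedule_2.count(0.75))
```

Expanding the schedule this way is a quick check that a config produces the intended step ranges before launching a full inference run.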