# Parallel Inference

LightX2V supports distributed parallel inference across multiple GPUs. The DiT component supports two parallel attention mechanisms, **Ulysses** and **Ring**, and the VAE also supports **parallel inference**. Parallel inference significantly reduces inference time and lowers the memory overhead on each GPU.

## DiT Parallel Configuration

DiT parallelism is controlled by the `parallel_attn_type` parameter, which selects one of two parallel attention mechanisms:

### 1. Ulysses Parallel

Ulysses-style sequence parallelism shards the input sequence across GPUs and uses all-to-all communication to redistribute attention heads, so each GPU computes full attention for a subset of heads.

**Configuration:**
```json
{
    "parallel_attn_type": "ulysses"
}
```

### 2. Ring Parallel

Ring attention shards the input sequence across GPUs and circulates key/value blocks around a ring, overlapping communication with the attention computation.

**Configuration:**
```json
{
    "parallel_attn_type": "ring"
}
```

## VAE Parallel Configuration

VAE parallelism is controlled by the `parallel_vae` parameter:

```json
{
    "parallel_vae": true
}
```

**Configuration Description:**
- `parallel_vae: true`: Enable parallel VAE inference (recommended)
- `parallel_vae: false`: Disable parallel VAE inference and run the VAE on a single GPU

**Usage Recommendations:**
- In multi-GPU environments, it is recommended to keep VAE parallelism enabled
- VAE parallelism can be combined with either attention parallel method (Ulysses or Ring); see the combined configuration sketch below
- In memory-constrained scenarios, VAE parallelism significantly reduces per-GPU memory usage
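
As an illustration, the minimal sketch below combines Ulysses attention parallelism with VAE parallelism. It shows only the two parallel-related fields and assumes both options are set in the same config file passed via `--config_json`; any other fields your config requires are omitted here.

```json
{
    "parallel_attn_type": "ulysses",
    "parallel_vae": true
}
```

Swapping `"ulysses"` for `"ring"` selects Ring attention instead.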

## Usage

The configuration files for parallel inference are available [here](https://github.com/ModelTC/lightx2v/tree/main/configs/dist_infer).

By pointing `--config_json` at one of these config files, you can test parallel inference.

Ready-to-use launch scripts are also available [here](https://github.com/ModelTC/lightx2v/tree/main/scripts/dist_infer).