# Quantization

lightx2v supports quantized inference for the linear layers in the **DiT** (Diffusion Transformer) model, enabling `w8a8-int8` and `w8a8-fp8` matrix multiplication (8-bit weights and 8-bit activations, in INT8 or FP8).
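
To make the `w8a8-int8` naming concrete, here is a minimal pure-Python sketch of symmetric int8 quantization with one scale per row, applied to both weights and activations before an integer matmul. It only illustrates the general technique: the function names are hypothetical, and lightx2v's actual kernels, scale granularity, and rounding may differ.

```python
def quantize_rows_sym_int8(rows):
    """Symmetric int8 quantization with one scale per row (illustrative helper)."""
    q_rows, scales = [], []
    for row in rows:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # Map each value to an integer in [-127, 127].
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def w8a8_linear(x, w):
    """Approximate x @ w.T using int8 operands.

    x: activations, shape [tokens, in_features]
    w: weights, shape [out_features, in_features]
    """
    qx, sx = quantize_rows_sym_int8(x)  # "dynamic": scales come from the live activations
    qw, sw = quantize_rows_sym_int8(w)  # weights would normally be quantized once, up front
    out = []
    for qx_row, sx_i in zip(qx, sx):
        out_row = []
        for qw_row, sw_j in zip(qw, sw):
            acc = sum(a * b for a, b in zip(qx_row, qw_row))  # integer accumulation
            out_row.append(acc * sx_i * sw_j)  # rescale back to floating point
        out.append(out_row)
    return out


x = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.3]]
w = [[0.2, 0.4, -0.6], [-1.5, 0.3, 0.9]]

approx = w8a8_linear(x, w)
exact = [[sum(a * b for a, b in zip(xr, wr)) for wr in w] for xr in x]
print("int8 result: ", approx)
print("float result:", exact)
```

The two printed matrices should agree closely; the small gap between them is the quantization error that the 8-bit formats trade for faster matrix multiplication.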

## Generating Quantized Models

### Automatic Quantization

lightx2v supports automatic weight quantization during inference. Refer to the [configuration file](https://github.com/ModelTC/lightx2v/tree/main/configs/quantization/wan_i2v_quant_auto.json).

**Key configuration**:

Set `"mm_config": {"mm_type": "W-int8-channel-sym-A-int8-channel-sym-dynamic-Vllm", "weight_auto_quant": true}`.

- `mm_type`: Specifies the quantized matrix multiplication operator to use
- `weight_auto_quant: true`: Enables automatic weight quantization at inference time
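
As a rough illustration of the setting above, the following Python sketch builds just the `mm_config` fragment and renders it as JSON. The values are copied from the key configuration; everything else a real config file contains (model and sampling fields) is omitted, so treat this as a fragment rather than a complete config.

```python
import json

# Only the quantization-related fragment; the linked configuration file
# contains additional fields that are not reproduced here.
auto_quant_fragment = {
    "mm_config": {
        "mm_type": "W-int8-channel-sym-A-int8-channel-sym-dynamic-Vllm",
        "weight_auto_quant": True,  # serialized as JSON `true`
    }
}

rendered = json.dumps(auto_quant_fragment, indent=4)
print(rendered)
```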

## Quantized Inference

### Offline Quantization

lightx2v also supports loading pre-quantized weights directly. To quantize a model offline, refer to the [documentation](https://github.com/ModelTC/lightx2v/tree/main/tools/convert/readme.md).

Then configure the [quantization file](https://github.com/ModelTC/lightx2v/tree/main/configs/quantization/wan_i2v_quant_offline.json):

1. Set `dit_quantized_ckpt` to the converted weight path
2. Set `weight_auto_quant` to `false` in `mm_config`
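
The two steps above can be sketched as a small Python helper that derives an offline config from an automatic one. The helper name `make_offline_config` and the placeholder checkpoint path are hypothetical, and placing `dit_quantized_ckpt` at the top level of the JSON is an assumption; check the linked `wan_i2v_quant_offline.json` for the actual layout.

```python
import copy
import json


def make_offline_config(auto_config, ckpt_path):
    """Return a copy of auto_config adjusted to load pre-quantized weights."""
    config = copy.deepcopy(auto_config)  # leave the caller's dict untouched
    # Step 1. Assumption: dit_quantized_ckpt is a top-level key.
    config["dit_quantized_ckpt"] = ckpt_path
    # Step 2. Disable on-the-fly quantization inside mm_config.
    config.setdefault("mm_config", {})["weight_auto_quant"] = False
    return config


auto_config = {
    "mm_config": {
        "mm_type": "W-int8-channel-sym-A-int8-channel-sym-dynamic-Vllm",
        "weight_auto_quant": True,
    }
}

# Placeholder path; substitute the output directory of the conversion tool.
offline_config = make_offline_config(auto_config, "/path/to/converted/weights")
print(json.dumps(offline_config, indent=4))
```

Copying the config instead of editing it in place keeps the automatic and offline variants side by side, which helps when comparing the two modes.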

### Automatic Quantization
```shell
bash scripts/run_wan_i2v_quant_auto.sh
```

### Offline Quantization
```shell
bash scripts/run_wan_i2v_quant_offline.sh
```

## Launching Quantization Service


After offline quantization, point `--config_json` to the offline quantization JSON file.

Example modification in `scripts/start_server.sh`:

```shell
export RUNNING_FLAG=infer

python -m lightx2v.api_server \
    --model_cls wan2.1 \
    --task t2v \
    --model_path $model_path \
    --config_json ${lightx2v_path}/configs/quantization/wan_i2v_quant_offline.json \
    --port 8000
```
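
Before starting the service, it can help to sanity-check the JSON file that `--config_json` points to. The sketch below is a hypothetical pre-flight check, not part of lightx2v: the function name `check_offline_config` is made up for illustration, and it assumes `dit_quantized_ckpt` is a top-level key holding a filesystem path.

```python
import json
import os
import tempfile


def check_offline_config(config_path):
    """Return a list of problems found in an offline quantization config."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)

    problems = []
    if config.get("mm_config", {}).get("weight_auto_quant") is not False:
        problems.append("mm_config.weight_auto_quant should be false")

    ckpt = config.get("dit_quantized_ckpt")  # assumed to be a top-level key
    if not ckpt:
        problems.append("dit_quantized_ckpt is not set")
    elif not os.path.exists(ckpt):
        problems.append(f"dit_quantized_ckpt does not exist: {ckpt}")
    return problems


# Demo against a throwaway config and checkpoint directory.
with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = os.path.join(tmp, "quantized_ckpt")
    os.mkdir(ckpt_dir)
    config_path = os.path.join(tmp, "offline.json")
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(
            {
                "mm_config": {
                    "mm_type": "W-int8-channel-sym-A-int8-channel-sym-dynamic-Vllm",
                    "weight_auto_quant": False,
                },
                "dit_quantized_ckpt": ckpt_dir,
            },
            f,
        )
    problems = check_offline_config(config_path)

print(problems or "config looks consistent")
```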

## Advanced Quantization Features

Refer to the quantization tool [LLMC documentation](https://github.com/ModelTC/llmc/blob/main/docs/en/source/backend/lightx2v.md) for details.