# Model Quantization

lightx2v supports quantized inference for the linear layers in **DiT**, enabling `w8a8-int8` and `w8a8-fp8` matrix multiplication.

## Generating Quantized Models

### Automatic Quantization

lightx2v supports automatic weight quantization during inference. Refer to the [configuration file](https://github.com/ModelTC/lightx2v/tree/main/configs/quantization/wan_i2v_quant_auto.json).

**Key configuration**: set `"mm_config": {"mm_type": "W-int8-channel-sym-A-int8-channel-sym-dynamic-Vllm", "weight_auto_quant": true}`.

- `mm_type`: specifies the quantized matrix-multiplication operator
- `weight_auto_quant: true`: enables automatic quantization of the weights (see the config sketch below)
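
A minimal sketch of the quantization-relevant fragment (all other fields in the linked `wan_i2v_quant_auto.json` are omitted here):

```json
{
  "mm_config": {
    "mm_type": "W-int8-channel-sym-A-int8-channel-sym-dynamic-Vllm",
    "weight_auto_quant": true
  }
}
```

With this setting, the original (unquantized) checkpoint is loaded and its weights are quantized on the fly, so no offline conversion step is needed.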

### Offline Quantization

lightx2v also supports directly loading pre-quantized weights. For offline model quantization, refer to the [documentation](https://github.com/ModelTC/lightx2v/tree/main/tools/convert/readme.md).
Configure the [quantization file](https://github.com/ModelTC/lightx2v/tree/main/configs/quantization/wan_i2v_quant_offline.json) as sketched below:
1. Set `dit_quantized_ckpt` to the path of the converted weights
2. Set `weight_auto_quant` to `false` in `mm_config`
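
A minimal sketch of these two settings (the checkpoint path is a placeholder, and the `mm_type` value is carried over from the automatic example; the linked `wan_i2v_quant_offline.json` is authoritative):

```json
{
  "dit_quantized_ckpt": "/path/to/converted/quantized/ckpt",
  "mm_config": {
    "mm_type": "W-int8-channel-sym-A-int8-channel-sym-dynamic-Vllm",
    "weight_auto_quant": false
  }
}
```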


## Quantized Inference

### Automatic Quantization
```shell
bash scripts/run_wan_i2v_quant_auto.sh
```

### Offline Quantization
```shell
bash scripts/run_wan_i2v_quant_offline.sh
```

## Launching the Quantization Service


After offline quantization, point `--config_json` to the offline quantization JSON file.

Example modification in `scripts/start_server.sh`:

```shell
export RUNNING_FLAG=infer

python -m lightx2v.api_server \
--model_cls wan2.1 \
--task t2v \
--model_path $model_path \
--config_json ${lightx2v_path}/configs/quantization/wan_i2v_quant_offline.json \
--port 8000
```

## Advanced Quantization Features

For details, refer to the documentation of the quantization tool, [LLMC](https://github.com/ModelTC/llmc/blob/main/docs/en/source/backend/lightx2v.md).