# Model Quantization
lightx2v supports quantized inference for the linear layers in **DiT**, enabling `w8a8-int8`, `w8a8-fp8`, `w8a8-fp8block`, `w8a8-mxfp8`, and `w4a4-nvfp4` matrix multiplication.
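The `w8a8` schemes above share one pattern: both the weights and the activations of a linear layer are quantized to 8 bits, the matrix multiplication is accumulated in low precision, and the result is rescaled back to floating point. Below is a minimal NumPy sketch of per-tensor symmetric `w8a8-int8` matmul, purely for illustration — it is not lightx2v's actual kernel:

```python
# Illustrative sketch of per-tensor symmetric w8a8-int8 matmul
# (educational only; not lightx2v's implementation).
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: returns (q, scale) with x ~ q * scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def w8a8_int8_matmul(a, w):
    """Quantize activation and weight to int8, accumulate in int32, then dequantize."""
    qa, sa = quantize_int8(a)
    qw, sw = quantize_int8(w)
    acc = qa.astype(np.int32) @ qw.astype(np.int32)  # low-precision multiply
    return acc.astype(np.float32) * (sa * sw)        # rescale back to float

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)).astype(np.float32)   # activations
w = rng.standard_normal((8, 16)).astype(np.float32)  # linear-layer weight
out = w8a8_int8_matmul(a, w)
err = np.max(np.abs(out - a @ w))  # quantization error vs. the exact matmul
```

Finer-grained variants (e.g. per-channel or block-wise scales, as in `w8a8-fp8block`) use one scale per output channel or per block instead of a single scale per tensor, which reduces the quantization error.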
## Generating Quantized Models
### Offline Quantization
lightx2v also supports loading pre-quantized weights directly. To quantize a model offline, refer to the [conversion documentation](https://github.com/ModelTC/lightx2v/tree/main/tools/convert/readme.md).
Then configure the [quantization config file](https://github.com/ModelTC/lightx2v/tree/main/configs/quantization/wan_i2v_quant_offline.json):
1. Set `dit_quantized_ckpt` to the path of the converted weights
2. Set `weight_auto_quant` to `false` in `mm_type`
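Under these assumptions, the relevant fragment of the config might look like the following. The path is a placeholder and the exact field nesting is a guess based on the step names above — consult the linked config file for the authoritative layout:

```json
{
  "dit_quantized_ckpt": "/path/to/converted/quantized/weights",
  "mm_type": {
    "weight_auto_quant": false
  }
}
```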
## Quantized Inference
### Automatic Quantization

To quantize the weights automatically at load time, run:

```shell
bash scripts/run_wan_i2v_quant_auto.sh
```

### Offline Quantization

To run inference with weights quantized offline, run:

```shell
bash scripts/run_wan_i2v_quant_offline.sh
```

## Launching the Quantization Service

After offline quantization, point `--config_json` to the offline quantization JSON file.
Example modification in `scripts/start_server.sh`:
```shell
export RUNNING_FLAG=infer

python -m lightx2v.api_server \
    --model_cls wan2.1 \
    --task t2v \
    --model_path $model_path \
    --config_json ${lightx2v_path}/configs/quantization/wan_i2v_quant_offline.json \
    --port 8000
```

## Advanced Quantization Features
Refer to the [LLMC documentation](https://github.com/ModelTC/llmc/blob/main/docs/en/source/backend/lightx2v.md) of the quantization toolkit for details.