# Quantization

lightx2v supports quantized inference for linear layers, covering both w8a8 (int8 weights and activations) and fp8 matrix multiplication. (For a sketch of what w8a8 means in practice, see the example at the end of this section.)

### Run Quantized Inference

```shell
# Modify the path in the script
bash scripts/run_wan_t2v_save_quant.sh
```

The script contains two execution commands:

#### Save Quantized Weights

Set the `RUNNING_FLAG` environment variable to `save_naive_quant`, and set `--config_json` to the corresponding `json` file: `${lightx2v_path}/configs/wan_t2v_save_quant.json`. In this file, `quant_model_path` specifies the path where the quantized model is saved.

#### Load Quantized Weights and Run Inference

Set the `RUNNING_FLAG` environment variable to `infer`, and set `--config_json` to the same `json` file as in the previous step.

### Start the Quantization Service

After saving the quantized weights, configure the service the same way as in the loading step above: set the `RUNNING_FLAG` environment variable to `infer`, and set `--config_json` to the `json` file from the first step. For example, modify the `scripts/start_server.sh` script as follows:

```shell
export RUNNING_FLAG=infer

python -m lightx2v.api_server \
    --model_cls wan2.1 \
    --task t2v \
    --model_path $model_path \
    --config_json ${lightx2v_path}/configs/wan_t2v_save_quant.json \
    --port 8000
```
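Once the server is up, you can submit a generation request over HTTP. The route and payload below are illustrative assumptions only; lightx2v's actual endpoint and field names may differ, so consult the project's API documentation for the real schema:

```python
import requests

# NOTE: "/v1/tasks", "prompt", and "save_video_path" are hypothetical names
# used for illustration; check lightx2v's API docs for the actual schema.
resp = requests.post(
    "http://localhost:8000/v1/tasks",
    json={
        "prompt": "A cat walking in the rain",
        "save_video_path": "./output.mp4",
    },
    timeout=600,  # video generation can take a while
)
print(resp.status_code, resp.text)
```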
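For background, here is a minimal, framework-agnostic sketch of what w8a8 inference means for a linear layer: both the weight and the activation are quantized to int8 with per-tensor scales, the matmul accumulates in int32, and the result is dequantized back to float. This illustrates the general technique, not lightx2v's actual kernels:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: returns (int8 values, float scale)."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def w8a8_linear(activation: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """int8 x int8 matmul with int32 accumulation, then dequantize to float."""
    a_q, a_scale = quantize_int8(activation)
    w_q, w_scale = quantize_int8(weight)
    acc = a_q.astype(np.int32) @ w_q.T.astype(np.int32)  # int32 avoids overflow
    return acc.astype(np.float32) * (a_scale * w_scale)

# The quantized result should closely track the full-precision matmul.
a = np.random.randn(4, 64).astype(np.float32)
w = np.random.randn(128, 64).astype(np.float32)
err = np.max(np.abs(w8a8_linear(a, w) - a @ w.T))
print(f"max abs error: {err:.4f}")
```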