Quantization can effectively reduce memory and bandwidth usage, accelerate computation and improve
throughput while with minimal accuracy loss. vLLM can leverage [Quark](https://quark.docs.amd.com/latest/),
the flexible and powerful quantization toolkit, to produce performant quantized models to run on AMD GPUs. Quark has specialized support for quantizing large language models with weight,
activation and kv-cache quantization and cutting-edge quantization algorithms like
AWQ, GPTQ, Rotation and SmoothQuant.
## Quark Installation
Before quantizing models, you need to install Quark. The latest release of Quark can be installed with pip:
```console
pip install amd-quark
```
You can refer to [Quark installation guide](https://quark.docs.amd.com/latest/install.html)
for more installation details.
## Quantization Process
After installing Quark, we will use an example to illustrate how to use Quark.
The Quark quantization process can be listed for 5 steps as below: