(quantization-index)=

# Quantization

Quantization trades model precision for a smaller memory footprint, allowing large models to run on a wider range of devices.
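
As a rough illustration of the memory side of that trade-off, the sketch below estimates the storage needed for model weights at a few bit widths. It is a back-of-the-envelope calculation, not a vLLM API: the helper `weight_memory_gib` and the 7-billion-parameter count are hypothetical, and the estimate ignores quantization metadata (scales, zero points), activations, and the KV cache.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate storage for the weights alone, in GiB (hypothetical helper)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


# Assumed model size for illustration only.
NUM_PARAMS = 7e9

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Halving the bits per weight halves the weight storage, which is why 8-bit and 4-bit formats can fit a model on devices where the 16-bit version does not.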

:::{toctree}
:caption: Contents
:maxdepth: 1

supported_hardware
auto_awq
bnb
gguf
int4
int8
fp8
quantized_kvcache
:::