(quantization-index)=

# Quantization

Quantization trades model precision for a smaller memory footprint, allowing large models to run on a wider range of devices.

:::{toctree}
:caption: Contents
:maxdepth: 1

supported_hardware
auto_awq
bnb
gguf
int4
int8
fp8
quantized_kvcache
:::