# LLM.int8()
[LLM.int8()](https://hf.co/papers/2208.07339) is a quantization method that aims to make large language model inference more accessible without significant degradation in model quality. Unlike naive 8-bit quantization, which can lose critical information and accuracy, LLM.int8() adapts dynamically so that the sensitive components of the computation retain higher precision when needed. The key is to extract the outliers from the inputs and weights and multiply them in 16-bit. All other values are multiplied in 8-bit and then dequantized back to 16-bit. The outputs of the 16-bit and 8-bit multiplications are combined to produce the final output.
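
The decomposition described above can be illustrated with a toy sketch in plain Python. This is not the bitsandbytes implementation: ordinary Python floats stand in for 16-bit values, integers in [-127, 127] stand in for int8, and the function names and the threshold value are assumptions chosen for illustration only.

```python
THRESHOLD = 6.0  # assumed outlier cutoff on input magnitude, for illustration


def matmul(a, b, n_cols):
    """Plain matrix product; yields zeros of the right shape if b has no rows."""
    return [[sum(row[k] * b[k][j] for k in range(len(b))) for j in range(n_cols)]
            for row in a]


def absmax_quantize(vec):
    """Map a vector to integers in [-127, 127]; return (ints, scale)."""
    absmax = max((abs(v) for v in vec), default=0.0)
    scale = absmax / 127.0 if absmax > 0 else 1.0
    return [round(v / scale) for v in vec], scale


def llm_int8_matmul(x, w, threshold=THRESHOLD):
    """Compute x @ w, keeping outlier feature dimensions in high precision."""
    n_features, n_cols = len(w), len(w[0])

    # 1. A feature dimension is an outlier if any input in that column is large.
    outlier_dims = [k for k in range(n_features)
                    if any(abs(row[k]) >= threshold for row in x)]
    regular_dims = [k for k in range(n_features) if k not in outlier_dims]

    # 2. Outlier part: multiply without quantizing (stand-in for 16-bit).
    x_hi = [[row[k] for k in outlier_dims] for row in x]
    w_hi = [w[k] for k in outlier_dims]
    out_hi = matmul(x_hi, w_hi, n_cols)

    # 3. Regular part: quantize each row of x and each column of w separately.
    x_q, x_scales = zip(*(absmax_quantize([row[k] for k in regular_dims])
                          for row in x))
    w_cols = [[w[k][j] for k in regular_dims] for j in range(n_cols)]
    w_q_cols, w_scales = zip(*(absmax_quantize(col) for col in w_cols))
    w_q_rows = [list(r) for r in zip(*w_q_cols)]
    acc = matmul(x_q, w_q_rows, n_cols)  # integer accumulation

    # 4. Dequantize the integer result and add the high-precision part.
    out = [[out_hi[i][j] + acc[i][j] * x_scales[i] * w_scales[j]
            for j in range(n_cols)]
           for i in range(len(x))]
    return out, outlier_dims


x = [[0.5, -1.0, 20.0, 0.25], [1.5, 0.75, -18.0, -0.5]]
w = [[0.2, -0.4], [0.1, 0.3], [1.0, -2.0], [-0.3, 0.5]]
result, outliers = llm_int8_matmul(x, w)
print(outliers)  # feature dimensions kept in high precision
print(result)    # close to the unquantized product of x and w
```

Separating the outlier dimensions before quantizing keeps the scale of the 8-bit path small, so the remaining values are not crushed into a handful of integer levels by one large entry.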

[Further Resources](../../explanations/resources#llm-int8)

## Linear8bitLt

[[autodoc]] bitsandbytes.nn.Linear8bitLt
    - __init__

## Int8Params

[[autodoc]] bitsandbytes.nn.Int8Params
    - __init__