"vscode:/vscode.git/clone" did not exist on "bd2b52fc2dd09b401991835c8a2a6f2ef940b2e4"
---
title: BitBLAS
---

vLLM now supports [BitBLAS](https://github.com/microsoft/BitBLAS) for more efficient and flexible model inference. Compared to other quantization frameworks, BitBLAS provides more precision combinations.

!!! note
    Ensure your hardware supports the selected `dtype` (`torch.bfloat16` or `torch.float16`).
    Most recent NVIDIA GPUs support `float16`, while `bfloat16` requires a newer architecture such as Ampere or Hopper.
    For details see [supported hardware](https://docs.vllm.ai/en/latest/features/quantization/supported_hardware.html).
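
As a rough illustration of the note above, the sketch below picks a `dtype` name from a GPU's major compute capability. The helper `pick_dtype` is hypothetical and not part of vLLM or BitBLAS; it assumes `bfloat16` is used only on compute capability 8.x (Ampere) or newer.

```python
def pick_dtype(compute_capability_major: int) -> str:
    """Return a dtype name for an NVIDIA GPU with the given major compute capability.

    Assumption: bfloat16 is chosen only on compute capability 8.x (Ampere) or newer;
    older GPUs fall back to float16.
    """
    return "bfloat16" if compute_capability_major >= 8 else "float16"


# Example GPUs and their major compute capability: V100 is 7.x, A100 is 8.x, H100 is 9.x.
for name, major in [("V100", 7), ("A100", 8), ("H100", 9)]:
    print(name, pick_dtype(major))
```

With PyTorch installed, the major version can be read from `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple.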

Follow the steps below to use BitBLAS with vLLM.

```bash
# Quote the requirement so the shell does not treat ">" as a redirect.
pip install "bitblas>=0.1.0"
```

vLLM reads the model's config file and supports pre-quantized checkpoints.

You can find pre-quantized models on:

- [Hugging Face (BitBLAS)](https://huggingface.co/models?search=bitblas)
- [Hugging Face (GPTQ)](https://huggingface.co/models?search=gptq)

Usually, these repositories have a `quantize_config.json` file that includes a `quantization_config` section.
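
To check what a checkpoint declares before loading it, a small sketch like the following can summarize the quantization settings. The helper `summarize_quantization` is hypothetical, and the field names `bits`, `group_size`, and `quant_method` are assumptions based on common GPTQ-style configs; an actual repository may use different keys. The sample values are made up for illustration.

```python
import json

# Assumed field names; adjust to match the repository's actual config.
FIELDS = ("bits", "group_size", "quant_method")


def summarize_quantization(config: dict) -> dict:
    """Return the assumed quantization fields from a parsed config.

    Looks under a "quantization_config" section when present and falls back
    to the top level otherwise. Missing fields are reported as None.
    """
    section = config.get("quantization_config", config)
    return {key: section.get(key) for key in FIELDS}


# Illustrative config contents, not taken from a real checkpoint.
sample = json.loads(
    '{"quantization_config": {"bits": 4, "group_size": -1, "quant_method": "bitblas"}}'
)
print(summarize_quantization(sample))
```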

## Read bitblas format checkpoint

```python
from vllm import LLM
import torch

# "hxbgsyxh/llama-13b-4bit-g-1-bitblas" is a pre-quantized checkpoint.
model_id = "hxbgsyxh/llama-13b-4bit-g-1-bitblas"
llm = LLM(
    model=model_id,
    dtype=torch.bfloat16,
    trust_remote_code=True,
    quantization="bitblas"
)
```

## Read gptq format checkpoint

??? code

    ```python
    from vllm import LLM
    import torch

    # "hxbgsyxh/llama-13b-4bit-g-1" is a pre-quantized checkpoint.
    model_id = "hxbgsyxh/llama-13b-4bit-g-1"
    llm = LLM(
        model=model_id,
        dtype=torch.float16,
        trust_remote_code=True,
        quantization="bitblas",
        max_model_len=1024
    )
    ```