supported_hardware.rst 2.57 KB
Newer Older
1
2
3
4
5
6
7
.. _supported_hardware_for_quantization:

Supported Hardware for Quantization Kernels
===========================================

The table below shows the compatibility of various quantization implementations with different hardware platforms in vLLM:

8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
.. list-table::
   :header-rows: 1
   :widths: 20 8 8 8 8 8 8 8 8 8 8

   * - Implementation
     - Volta
     - Turing
     - Ampere
     - Ada
     - Hopper
     - AMD GPU
     - Intel GPU
     - x86 CPU
     - AWS Inferentia
     - Google TPU
   * - AWQ
     - ✗
     - ✅︎
     - ✅︎
     - ✅︎
     - ✅︎
     - ✗
     - ✗
31
     - ✅︎
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
     - ✗
     - ✗
   * - GPTQ
     - ✅︎
     - ✅︎
     - ✅︎
     - ✅︎
     - ✅︎
     - ✗
     - ✗
     - ✗
     - ✗
     - ✗
   * - Marlin (GPTQ/AWQ/FP8)
     - ✗
     - ✗
     - ✅︎
     - ✅︎
     - ✅︎
     - ✗
     - ✗
     - ✗
     - ✗
     - ✗
   * - INT8 (W8A8)
     - ✗
     - ✅︎
     - ✅︎
     - ✅︎
     - ✅︎
     - ✗
     - ✗
64
     - ✅︎
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
     - ✗
     - ✗
   * - FP8 (W8A8)
     - ✗
     - ✗
     - ✗
     - ✅︎
     - ✅︎
     - ✅︎
     - ✗
     - ✗
     - ✗
     - ✗
   * - AQLM
     - ✅︎
     - ✅︎
     - ✅︎
     - ✅︎
     - ✅︎
     - ✗
     - ✗
     - ✗
     - ✗
     - ✗
   * - bitsandbytes
     - ✅︎
     - ✅︎
     - ✅︎
     - ✅︎
     - ✅︎
     - ✗
     - ✗
     - ✗
     - ✗
     - ✗
   * - DeepSpeedFP
     - ✅︎
     - ✅︎
     - ✅︎
     - ✅︎
     - ✅︎
     - ✗
     - ✗
     - ✗
     - ✗
     - ✗
   * - GGUF
     - ✅︎
     - ✅︎
     - ✅︎
     - ✅︎
     - ✅︎
     - ✗
     - ✗
     - ✗
     - ✗
     - ✗
122
123
124
125
126

Notes:
^^^^^^

- Volta refers to SM 7.0, Turing to SM 7.5, Ampere to SM 8.0/8.6, Ada to SM 8.9, and Hopper to SM 9.0.
127
128
- "✅︎" indicates that the quantization method is supported on the specified hardware.
- "✗" indicates that the quantization method is not supported on the specified hardware.
129
130
131

Please note that this compatibility chart may be subject to change as vLLM continues to evolve and expand its support for different hardware platforms and quantization methods.

132
For the most up-to-date information on hardware support and quantization methods, please check the `quantization directory <https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/layers/quantization>`_ or consult with the vLLM development team.