- NVIDIA:
  - Your NVIDIA GPU(s) must be of Compute Capability 7.5 or higher. Turing and later architectures are supported.
  - Your CUDA version must be CUDA 11.8 or later.
- AMD:
  - Your ROCm version must be ROCm 5.6 or later.
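The requirements above can be checked programmatically. The helper below is an illustrative sketch, not part of AutoAWQ; on a real machine the two version tuples can be read from PyTorch as noted in the comments.

```python
def meets_nvidia_requirements(compute_capability, cuda_version):
    """Check (major, minor) compute capability and CUDA version tuples
    against the minimums stated above: CC >= 7.5 and CUDA >= 11.8."""
    return compute_capability >= (7, 5) and cuda_version >= (11, 8)

# On a CUDA machine these tuples can be obtained via PyTorch:
#   import torch
#   cc = torch.cuda.get_device_capability()
#   cuda = tuple(int(x) for x in torch.version.cuda.split("."))
print(meets_nvidia_requirements((8, 6), (12, 1)))  # Ampere GPU + CUDA 12.1 -> True
```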
### Install from PyPI
...
...
To install the newest AutoAWQ from PyPI, you need CUDA 12.1 installed.

```
pip install autoawq
```
If you cannot use CUDA 12.1, you can still use CUDA 11.8 and install the wheel from the [latest release](https://github.com/casper-hansen/AutoAWQ/releases).
### Build from source
For CUDA 11.8, ROCm 5.6, and ROCm 5.7, you can install wheels from the [release page](https://github.com/casper-hansen/AutoAWQ/releases/latest):
All three methods will install the latest and correct kernels for your system from [AutoAWQ_Kernels](https://github.com/casper-hansen/AutoAWQ_kernels/releases).
If your system is not supported (i.e. not on the release page), you can build the kernels yourself by following the instructions in [AutoAWQ_Kernels](https://github.com/casper-hansen/AutoAWQ_kernels/releases) and then install AutoAWQ from source.
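If you do need to build from source, the steps typically look like the following. This is a sketch based on a standard editable pip install; check the AutoAWQ_Kernels README for the authoritative instructions for your toolchain.

```shell
# Build the kernels from source (requires a CUDA or ROCm toolchain installed).
git clone https://github.com/casper-hansen/AutoAWQ_kernels
cd AutoAWQ_kernels
pip install -e .

# Then install AutoAWQ itself from source.
cd ..
git clone https://github.com/casper-hansen/AutoAWQ
cd AutoAWQ
pip install -e .
```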
## Supported models
The detailed support list:
| Models | Sizes |
| -------- | --------------------------- |
| LLaMA-2 | 7B/13B/70B |
| LLaMA | 7B/13B/30B/65B |
| Mistral | 7B |
...
...
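Quantizing one of the supported models takes only a few lines. The sketch below follows the usual AutoAWQ API; the model path, output directory, and `quant_config` values are illustrative placeholders, and a CUDA-capable GPU is assumed.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-v0.1"   # any model from the table above
quant_path = "mistral-7b-awq"              # output directory (placeholder)

# 4-bit, group size 128, GEMM kernels -- a common configuration.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```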
These benchmarks showcase the speed and memory usage of processing context (prefill) and generating tokens (decode).
- 🟢 for GEMV, 🔵 for GEMM, 🔴 for versions to avoid
| Model Name | Size | Version | Batch Size | Prefill Length | Decode Length | Prefill tokens/s | Decode tokens/s | Memory (VRAM) |