"docs/contributing/profiling.md" did not exist on "aba8d6ee006b78149ac4514f460e4038b2d4f607"
Commit f7912cba authored by Michael Yao, committed by GitHub

[Doc] Add top anchor and a note to quantization/bitblas.md (#17042)


Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
parent 6317a517
(bitblas)=
# BitBLAS
vLLM now supports [BitBLAS](https://github.com/microsoft/BitBLAS) for more efficient and flexible model inference. Compared to other quantization frameworks, BitBLAS provides more precision combinations.
:::{note}
Ensure your hardware supports the selected `dtype` (`torch.bfloat16` or `torch.float16`).
Most recent NVIDIA GPUs support `float16`, while `bfloat16` is more common on newer architectures like Ampere or Hopper.
For details see [supported hardware](https://docs.vllm.ai/en/latest/features/quantization/supported_hardware.html).
:::
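The hardware check in the note can be sketched in a few lines of Python. This is a minimal illustration, not part of vLLM or BitBLAS: the helper names `pick_dtype` and `detect_dtype` are hypothetical, and the rule it encodes (native `bfloat16` starts at CUDA compute capability 8.0, i.e. Ampere) is a simplification you should confirm against the supported-hardware page linked above.

```python
def pick_dtype(compute_capability):
    """Return "bfloat16" on Ampere (8.0) or newer, otherwise "float16".

    Hypothetical helper: assumes native bfloat16 begins at compute
    capability 8.0, which is a simplification of real hardware support.
    """
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"


def detect_dtype():
    """Query the local GPU through PyTorch; return None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # get_device_capability() returns a (major, minor) tuple.
    return pick_dtype(torch.cuda.get_device_capability())


print(pick_dtype((7, 5)))  # Turing-class GPU: float16
print(pick_dtype((9, 0)))  # Hopper-class GPU: bfloat16
```

On a machine with PyTorch and a CUDA device, `detect_dtype()` returns one of the two strings; elsewhere it returns `None`, in which case pick the `dtype` by hand.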
Below are the steps to utilize BitBLAS with vLLM.
```console
......
```