[Doc] Add top anchor and a note to quantization/bitblas.md (#17042)

Signed-off-by: windsonsea <haifeng.yao@daocloud.io>

Commit f7912cba, authored by Michael Yao on Apr 23, 2025; committed by GitHub on Apr 23, 2025

Showing with 8 additions and 0 deletions

docs/source/features/quantization/bitblas.md +8 -0
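
The note added by this change asks readers to confirm that their GPU supports the chosen `dtype`. As a rough illustration only, the sketch below picks a dtype name from a CUDA compute capability, assuming that `bfloat16` needs compute capability 8.0 (Ampere) or newer and that `float16` is the fallback. The helper name `pick_dtype_name` is hypothetical and is not part of vLLM or BitBLAS.

```python
def pick_dtype_name(compute_capability):
    """Return a dtype name suited to a GPU's compute capability.

    Hypothetical helper for illustration; not a vLLM or BitBLAS API.
    Assumption: bfloat16 needs compute capability 8.0 (Ampere) or newer,
    and float16 is the fallback for older GPUs.
    """
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"


# With PyTorch installed and a CUDA device present, the capability tuple
# can be read via torch.cuda.get_device_capability(), for example:
#   import torch
#   name = pick_dtype_name(torch.cuda.get_device_capability())
if __name__ == "__main__":
    for cc in [(7, 5), (8, 0), (9, 0)]:
        print(cc, pick_dtype_name(cc))
```

The returned name is only a suggestion; the linked supported-hardware page remains the reference for which quantization methods run on which architectures.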

--- a/docs/source/features/quantization/bitblas.md
+++ b/docs/source/features/quantization/bitblas.md
+(bitblas)=
 # BitBLAS
 vLLM now supports [BitBLAS](https://github.com/microsoft/BitBLAS) for more efficient and flexible model inference. Compared to other quantization frameworks, BitBLAS provides more precision combinations.
+:::{note}
+Ensure your hardware supports the selected `dtype` (`torch.bfloat16` or `torch.float16`).
+Most recent NVIDIA GPUs support `float16`, while `bfloat16` is more common on newer architectures like Ampere or Hopper.
+For details see [supported hardware](https://docs.vllm.ai/en/latest/features/quantization/supported_hardware.html).
+:::
 Below are the steps to utilize BitBLAS with vLLM.
 ```console