Commit f7912cb

[Doc] Add top anchor and a note to quantization/bitblas.md (#17042)
Signed-off-by: windsonsea <[email protected]>
1 parent 6317a51 commit f7912cb

File tree

1 file changed (+8, −0 lines)


docs/source/features/quantization/bitblas.md

Lines changed: 8 additions & 0 deletions
@@ -1,7 +1,15 @@
+(bitblas)=
+
 # BitBLAS
 
 vLLM now supports [BitBLAS](https://github.com/microsoft/BitBLAS) for more efficient and flexible model inference. Compared to other quantization frameworks, BitBLAS provides more precision combinations.
 
+:::{note}
+Ensure your hardware supports the selected `dtype` (`torch.bfloat16` or `torch.float16`).
+Most recent NVIDIA GPUs support `float16`, while `bfloat16` is more common on newer architectures like Ampere or Hopper.
+For details see [supported hardware](https://docs.vllm.ai/en/latest/features/quantization/supported_hardware.html).
+:::
+
 Below are the steps to utilize BitBLAS with vLLM.
 
 ```console
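The note added in this commit advises matching the `dtype` to the GPU. As a rough illustration (not part of the commit), native `bfloat16` support can be inferred from the NVIDIA compute capability: it arrived with Ampere (capability 8.0), while `float16` is available on older generations too. The helper names below are hypothetical; in a real setup you would query the device with `torch.cuda.get_device_capability()` or `torch.cuda.is_bf16_supported()`.

```python
def supports_bf16(capability):
    """Assumption: native bfloat16 support begins with Ampere (compute capability 8.x)."""
    major, _minor = capability
    return major >= 8


def pick_dtype(capability):
    """Return the dtype string one might pass to vLLM for this GPU."""
    return "bfloat16" if supports_bf16(capability) else "float16"


if __name__ == "__main__":
    # A100 / RTX 30xx (Ampere, 8.x) -> bfloat16; T4 (Turing, 7.5) -> float16
    print(pick_dtype((8, 0)))  # bfloat16
    print(pick_dtype((7, 5)))  # float16
```

On a machine with PyTorch and CUDA available, the `capability` tuple would come from `torch.cuda.get_device_capability()` for the device actually running inference.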

0 commit comments
