(bitblas)=
# BitBLAS
vLLM now supports [BitBLAS](https://github.com/microsoft/BitBLAS) for more efficient and flexible model inference. Compared to other quantization frameworks, BitBLAS provides more precision combinations.
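For example, here is a minimal sketch of running a pre-quantized BitBLAS checkpoint through the offline `LLM` API. The model ID below is a placeholder for a real BitBLAS checkpoint, and `quantization="bitblas"` is assumed to be the registered method name:

```python
from vllm import LLM, SamplingParams

# Placeholder: substitute a checkpoint that was pre-quantized for BitBLAS.
model_id = "path/to/bitblas-quantized-model"

# `dtype` must be a precision your GPU supports (see the note below).
llm = LLM(model=model_id, quantization="bitblas", dtype="float16")

outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(temperature=0.0, max_tokens=16),
)
print(outputs[0].outputs[0].text)
```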
:::{note}
Ensure your hardware supports the selected `dtype` (`torch.bfloat16` or `torch.float16`).
Most recent NVIDIA GPUs support `float16`, while `bfloat16` requires newer architectures such as Ampere or Hopper.
For details see [supported hardware](https://docs.vllm.ai/en/latest/features/quantization/supported_hardware.html).
:::
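A simple way to follow this guidance is to probe the device at runtime before constructing the engine. This sketch uses `torch.cuda.is_bf16_supported()`, a standard PyTorch check; the resulting string can be passed directly as the `dtype` argument shown above:

```python
import torch

# Prefer bfloat16 on architectures that support it (Ampere/Hopper and newer);
# otherwise fall back to float16, which most recent NVIDIA GPUs handle.
dtype = "bfloat16" if torch.cuda.is_bf16_supported() else "float16"
```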