Commit 64862d10 authored by Aleksandr Malyshev, committed by GitHub

[ROCM][AMD][TRITON] Halving warps number for fw_prefill to reduce spilling (#12713)


Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
parent b3a0d01e
...
@@ -11,7 +11,7 @@ from vllm.platforms import current_platform
 # Static kernels parameters
 BASE_BLOCK = 128 if current_platform.has_device_capability(80) else 64
-NUM_WARPS = 8
+NUM_WARPS = 4 if current_platform.is_rocm() else 8
 # To check compatibility
 IS_TURING = current_platform.get_device_capability() == (7, 5)
...
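As a rough illustration of what the one-line change does, the sketch below mirrors the new `NUM_WARPS` selection without importing vLLM. The helpers `select_num_warps` and `threads_per_program` are hypothetical names introduced here, not part of the commit, and the wavefront size of 64 is an assumption that holds for typical AMD Instinct GPUs but not for every ROCm device.

```python
# Hypothetical stand-ins for illustration only; the real code queries
# vllm.platforms.current_platform instead of taking a boolean flag.
ROCM_WAVEFRONT_SIZE = 64  # assumption: typical of AMD Instinct (CDNA) GPUs
CUDA_WARP_SIZE = 32


def select_num_warps(is_rocm: bool) -> int:
    """Mirror the changed line: 4 warps on ROCm, 8 elsewhere."""
    return 4 if is_rocm else 8


def threads_per_program(is_rocm: bool, num_warps: int) -> int:
    """Threads launched per kernel program instance for a given warp count."""
    warp_size = ROCM_WAVEFRONT_SIZE if is_rocm else CUDA_WARP_SIZE
    return num_warps * warp_size


if __name__ == "__main__":
    for is_rocm in (False, True):
        warps = select_num_warps(is_rocm)
        threads = threads_per_program(is_rocm, warps)
        name = "ROCm" if is_rocm else "CUDA"
        print(f"{name}: NUM_WARPS={warps}, threads per program={threads}")
```

Under these assumptions, the previous setting of 8 warps would have launched 512 threads per program on ROCm, while the new setting launches 256, the same count that 8 warps gives on CUDA. Whether this is the mechanism behind the reduced spilling is not stated in the commit.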