Commit 77d24c4b authored by Wentao Ye, committed by GitHub

[Bug] Fix fp8 deepgemm batch invariant (#37718)


Signed-off-by: yewentao256 <zhyanwentao@126.com>
parent b3e84601
@@ -305,6 +305,11 @@ def _flashinfer_fp8_blockscale_gemm_impl(
 )
 return output
+from vllm.model_executor.layers.batch_invariant import vllm_is_batch_invariant
+if vllm_is_batch_invariant():
+    return run_deepgemm(input, weight, weight_scale)
 condition = input.shape[0] < 32
 # PyTorch's torch.compile cannot handle input-dependent control flow in standard
......
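The hunk above adds an early return ahead of a dispatch that depends on `input.shape[0]`. The sketch below is one way to see why such a guard matters. It is not vLLM code: `gemm_left_to_right`, `gemm_right_to_left`, and `dispatch` are hypothetical pure-Python stand-ins, and the only detail taken from the diff is the `< 32` threshold. The assumption being illustrated is that two kernels may accumulate in different orders, so the same row can produce different floating-point results depending on which batch it arrives in.

```python
from typing import List

Matrix = List[List[float]]


def gemm_left_to_right(x: Matrix, w: Matrix) -> Matrix:
    """Hypothetical kernel A: accumulates products in increasing k order."""
    n_out = len(w[0])
    return [
        [sum_in_order(row, w, j, range(len(row))) for j in range(n_out)]
        for row in x
    ]


def gemm_right_to_left(x: Matrix, w: Matrix) -> Matrix:
    """Hypothetical kernel B: same math, but accumulates in decreasing k order."""
    n_out = len(w[0])
    return [
        [sum_in_order(row, w, j, reversed(range(len(row)))) for j in range(n_out)]
        for row in x
    ]


def sum_in_order(row: List[float], w: Matrix, j: int, order) -> float:
    # Explicit loop so the floating-point accumulation order is unambiguous.
    acc = 0.0
    for k in order:
        acc += row[k] * w[k][j]
    return acc


def dispatch(x: Matrix, w: Matrix, batch_invariant: bool) -> Matrix:
    # Guard in the spirit of the patch: in batch-invariant mode, always take
    # one fixed kernel, regardless of how many rows are in the batch.
    if batch_invariant:
        return gemm_left_to_right(x, w)
    # Otherwise pick a kernel from the batch size (threshold taken from the diff).
    if len(x) < 32:
        return gemm_right_to_left(x, w)
    return gemm_left_to_right(x, w)


# A row whose sum is sensitive to accumulation order in float64.
row = [1.0, 1e16, -1e16]
w = [[1.0], [1.0], [1.0]]

alone_default = dispatch([row], w, batch_invariant=False)[0]
batched_default = dispatch([row] * 32, w, batch_invariant=False)[0]
alone_invariant = dispatch([row], w, batch_invariant=True)[0]
batched_invariant = dispatch([row] * 32, w, batch_invariant=True)[0]
```

Without the guard, the row's output changes when it is batched with 31 other rows, because the batch size selects a different kernel. With the guard, both calls take the same path and agree.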