"...git@developer.sourcefind.cn:2222/OpenDAS/vllm_cscc.git" did not exist on "4e04eceb58288310932d4abfbb417f1415d05caf"
Unverified commit f68e3ea4, authored by Jinwu and committed by GitHub

[BugFix] Add synchronize in CutlassW4A8LinearKernel to ensure data is ready for use. (#33078)


Co-authored-by: jinwuguo <jinwuguo@tencent.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
parent d5c41db3
@@ -77,6 +77,7 @@ class CutlassW4A8LinearKernel(MPLinearKernel):
         def transform_w_q(x):
             assert isinstance(x, BasevLLMParameter)
             convert_packed_uint4b8_to_signed_int4_inplace(x.data)
+            torch.cuda.synchronize()
             permute_param_layout_(x, input_dim=0, output_dim=1, packed_dim=0)
             x.data = ops.cutlass_encode_and_reorder_int4b(x.data.t().contiguous().t())
             return x
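
The pattern in this hunk can be sketched in isolation. This is a minimal illustration, not the vLLM code: `inplace_then_sync` is a hypothetical helper, and `add_` stands in for the in-place conversion step. `torch.cuda.synchronize()` blocks the host until all kernels queued on the current device have finished. The sketch falls back to CPU when CUDA is unavailable and returns `None` if PyTorch is not installed.

```python
import importlib.util


def inplace_then_sync():
    """Run an in-place tensor op, then wait for the device before reading.

    Hypothetical helper for illustration only; returns None without PyTorch.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.arange(4, device=device)

    # Stand-in for the in-place conversion in the diff: on CUDA this only
    # enqueues a kernel and returns to the host immediately.
    x.add_(1)

    if device == "cuda":
        # Block the host until every queued kernel on the device is done.
        torch.cuda.synchronize()

    return x.cpu().tolist()


if __name__ == "__main__":
    print(inplace_then_sync())
```

A device-wide synchronize is the bluntest option because it stalls the host. Here it runs in a weight transform rather than in a per-request path, so the cost is likely paid once per parameter, although the commit itself does not state this.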