Repository: OpenDAS / vllm_cscc
Commit 703475f6
Authored Jun 14, 2024 by Tyler Michael Smith; committed by GitHub on Jun 14, 2024
[Kernel] Fix CUTLASS 3.x custom broadcast load epilogue (#5516)
Parent: d47af2bc
Changes: 2 changed files with 2 additions and 4 deletions (+2 -4)
csrc/quantization/cutlass_w8a8/broadcast_load_epilogue_c3x.hpp (+1 -1)
vllm/model_executor/layers/quantization/fp8.py (+1 -3)
csrc/quantization/cutlass_w8a8/broadcast_load_epilogue_c3x.hpp @ 703475f6
...
@@ -153,7 +153,7 @@ struct Sm90RowOrScalarBroadcast {
   CUTLASS_DEVICE void
   begin(uint64_t* full_mbarrier_ptr, int load_iteration, bool issue_tma_load) {
-    if (params.ptr_row == nullptr) {
+    if (!params.row_broadcast) {
       return;
     }
...
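The guard change above can be modelled outside the kernel. The following is a minimal Python sketch, not the CUTLASS code itself: `Params`, `skips_row_load_old`, and `skips_row_load_new` are hypothetical names, and it assumes that in the scalar-broadcast case `ptr_row` is a valid pointer to a single value with `row_broadcast` set to false (the diff does not show this directly). Under that assumption, the old guard would not return early for a scalar, while the new one does.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Params:
    # Hypothetical stand-in for the epilogue params; only the two fields
    # named in the diff are modelled here.
    ptr_row: Optional[Sequence[float]]
    row_broadcast: bool


def skips_row_load_old(params: Params) -> bool:
    """Guard before the commit: return early only when there is no pointer."""
    return params.ptr_row is None


def skips_row_load_new(params: Params) -> bool:
    """Guard after the commit: return early whenever row broadcast is off."""
    return not params.row_broadcast


# Assumed scalar case: the pointer is valid but refers to one value only.
scalar = Params(ptr_row=[0.5], row_broadcast=False)
# Row case: the pointer refers to a full row of values.
row = Params(ptr_row=[0.5, 0.25, 0.125], row_broadcast=True)

for name, p in (("scalar", scalar), ("row", row)):
    print(name, "old guard skips:", skips_row_load_old(p),
          "new guard skips:", skips_row_load_new(p))
```

If the assumption holds, the old guard would let the scalar case fall through to the row load path, which is one plausible source of the illegal memory access mentioned in the fp8.py comment.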
vllm/model_executor/layers/quantization/fp8.py @ 703475f6
...
@@ -257,9 +257,7 @@ class Fp8LinearMethod(LinearMethodBase):
         # If dynamic, layer.input_scale is None and x_scale computed from x.
         # If static, layer.input_scale is scalar and x_scale is input_scale.
-        # Temporarily disable CUTLASS kernels due to an illegal memory access
-        #if bias is None and self.cutlass_fp8_supported:
-        if False:
+        if bias is None and self.cutlass_fp8_supported:
             qinput, x_scale = ops.scaled_fp8_quant(x, layer.input_scale)
             # Fused GEMM_DQ
...
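The effect of re-enabling the condition can be shown with a small standalone sketch. The function names below are hypothetical and only mirror the two branch conditions visible in the diff; nothing here calls vLLM or `ops.scaled_fp8_quant`.

```python
def takes_cutlass_path_before(bias, cutlass_fp8_supported: bool) -> bool:
    """Condition before the commit: hard-coded `if False:`, so never taken."""
    return False


def takes_cutlass_path_after(bias, cutlass_fp8_supported: bool) -> bool:
    """Condition after the commit: no bias and CUTLASS FP8 support present."""
    return bias is None and cutlass_fp8_supported


# `object()` stands in for any non-None bias tensor.
cases = [(None, True), (None, False), (object(), True)]
for bias, supported in cases:
    print(bias is None, supported,
          takes_cutlass_path_before(bias, supported),
          takes_cutlass_path_after(bias, supported))
```

Only the first case (no bias, CUTLASS FP8 supported) takes the CUTLASS branch after the commit; before it, no case did.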