OpenDAS / vllm_cscc
Commit e38042d4
Authored Jun 13, 2024 by Tyler Michael Smith; committed by GitHub on Jun 13, 2024

[Kernel] Disable CUTLASS kernels for fp8 (#5505)

Parent: 33e3b372
Showing 1 changed file with 3 additions and 1 deletion

vllm/model_executor/layers/quantization/fp8.py (+3 -1)
...
@@ -257,7 +257,9 @@ class Fp8LinearMethod(LinearMethodBase):
         # If dynamic, layer.input_scale is None and x_scale computed from x.
         # If static, layer.input_scale is scalar and x_scale is input_scale.

-        if bias is None and self.cutlass_fp8_supported:
+        # Temporarily disable CUTLASS kernels due to an illegal memory access
+        #if bias is None and self.cutlass_fp8_supported:
+        if False:
             qinput, x_scale = ops.scaled_fp8_quant(x, layer.input_scale)

             # Fused GEMM_DQ
...
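The effect of this change on dispatch can be sketched in isolation. The snippet below is a minimal illustration and not vLLM code: `choose_fp8_path_before`, `choose_fp8_path_after`, and the returned labels are hypothetical names. It assumes that a non-CUTLASS branch follows the `if`; that branch is elided in this diff, so it appears here only as "fallback".

```python
def choose_fp8_path_before(bias, cutlass_fp8_supported):
    """Condition as it stood before this commit (hypothetical helper)."""
    if bias is None and cutlass_fp8_supported:
        return "cutlass"
    # Assumption: some other code path handles every remaining case.
    return "fallback"


def choose_fp8_path_after(bias, cutlass_fp8_supported):
    """Condition after this commit: the CUTLASS branch is unreachable."""
    if False:  # hard-coded off, mirroring the diff
        return "cutlass"
    # Assumption: same fallback as above, now taken unconditionally.
    return "fallback"


if __name__ == "__main__":
    # Bias-free layer on hardware that reports CUTLASS fp8 support.
    print(choose_fp8_path_before(None, True))
    print(choose_fp8_path_after(None, True))
```

Under these assumptions, a bias-free layer with `cutlass_fp8_supported` set took the CUTLASS branch before the commit and takes the other branch afterwards. Layers with a bias are unaffected, since they never satisfied the original condition.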