OpenDAS
vllm_cscc
Commit c8fd97f2 (Unverified)
Authored Jul 15, 2024 by Tyler Michael Smith
Committed by GitHub, Jul 15, 2024
[Kernel] Use CUTLASS kernels for the FP8 layers with Bias (#6270)
Parent: 94b82e8c
Showing 1 changed file with 3 additions and 2 deletions
vllm/model_executor/layers/quantization/utils/w8a8_utils.py
@@ -112,7 +112,7 @@ def apply_fp8_linear(
     # If dynamic, layer.input_scale is None and x_scale computed from x.
     # If static, layer.input_scale is scalar and x_scale is input_scale.
-    if bias is None and cutlass_fp8_supported:
+    if cutlass_fp8_supported:
         qinput, x_scale = ops.scaled_fp8_quant(input, input_scale)
         # Fused GEMM_DQ
@@ -120,7 +120,8 @@ def apply_fp8_linear(
             weight,
             out_dtype=input.dtype,
             scale_a=x_scale,
-            scale_b=weight_scale)
+            scale_b=weight_scale,
+            bias=bias)
     else:
         qinput, x_scale = ops.scaled_fp8_quant(input,
 ...
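Note on the change: before this commit the CUTLASS branch was taken only when bias was None, so FP8 layers with a bias fell through to the else branch. The diff removes that condition and passes bias into the fused call. The sketch below is a pure-Python reference for the arithmetic this implies, not the vLLM or CUTLASS implementation. The helper names quantize_per_tensor and scaled_mm_reference are hypothetical, the FP8 maximum of 448.0 assumes the float8 E4M3 (e4m3fn) format, and values are only clamped to that range, not rounded to the FP8 grid.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]

# Assumption: largest finite value of the float8 E4M3 (e4m3fn) format.
FP8_E4M3_MAX = 448.0


def quantize_per_tensor(x: Matrix,
                        scale: Optional[float] = None) -> Tuple[Matrix, float]:
    """Hypothetical stand-in for a per-tensor FP8 quantizer.

    Dynamic case (scale is None): the scale is computed from max |x|.
    Static case: the given scalar scale is used unchanged.
    Values are clamped to the FP8 range but NOT rounded to FP8 precision.
    """
    if scale is None:
        amax = max(abs(v) for row in x for v in row)
        scale = max(amax, 1e-12) / FP8_E4M3_MAX
    q = [[max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in row]
         for row in x]
    return q, scale


def scaled_mm_reference(a_q: Matrix, b_q: Matrix, scale_a: float,
                        scale_b: float,
                        bias: Optional[List[float]] = None) -> Matrix:
    """Reference for a fused GEMM + dequantize with optional bias:
    out = (a_q @ b_q) * scale_a * scale_b + bias
    """
    inner, cols = len(b_q), len(b_q[0])
    out = []
    for row in a_q:
        acc = [sum(row[k] * b_q[k][j] for k in range(inner))
               for j in range(cols)]
        out.append([acc[j] * scale_a * scale_b +
                    (bias[j] if bias is not None else 0.0)
                    for j in range(cols)])
    return out


# Usage: dynamic quantization of activations and weights, then one fused call.
x = [[1.0, -2.0], [0.5, 4.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
bias = [10.0, 20.0]
x_q, x_scale = quantize_per_tensor(x)  # dynamic: scale derived from x
w_q, w_scale = quantize_per_tensor(w)
y = scaled_mm_reference(x_q, w_q, x_scale, w_scale, bias=bias)
```

With the bias folded into the same call, the result should match x @ w + bias up to floating-point error, and layers with a bias no longer need a separate code path.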