OpenDAS / vllm_cscc
Commit abdfcd4f (unverified)
Authored by Elvir Crnčević on Sep 18, 2025; committed by GitHub on Sep 18, 2025
silu-v1: Fix EPS not being used during max-reduction (#25069)
Signed-off-by: elvircrn <elvircrn@gmail.com>
Parent: 4f02b77d
Changes: 1 changed file with 1 addition and 2 deletions
csrc/quantization/activation_kernels.cu (+1, -2)
csrc/quantization/activation_kernels.cu

```diff
@@ -365,7 +365,6 @@ __global__ void silu_mul_fp8_quant_deep_gemm_kernel(
   int32_t compute_pipeline_offset_64 = 0;
   for (int32_t t = n_tokens_lower; t < n_tokens_upper; ++t) {
-    __nv_bfloat16 y_max_bf16 = EPS;
     __nv_bfloat162 results_bf162[2];
     cp_async_wait<NUM_STAGES - 2>();
```
```diff
@@ -405,7 +404,7 @@ __global__ void silu_mul_fp8_quant_deep_gemm_kernel(
     auto _y_max2 = __hmax2(__habs2(results_bf162[0]), __habs2(results_bf162[1]));
-    y_max_bf16 = __hmax(_y_max2.x, _y_max2.y);
+    __nv_bfloat16 y_max_bf16 = __hmax(EPS, __hmax(_y_max2.x, _y_max2.y));
     // An entire group is assigned to a single warp, so a simple warp reduce
     // is used.
```
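Before this commit, `y_max_bf16` was initialised to `EPS` and then overwritten by `__hmax(_y_max2.x, _y_max2.y)`, so EPS never entered the max. A minimal host-side sketch of the two versions, using `float` in place of `__nv_bfloat16` and `std::fabs`/`std::max` in place of `__habs2`/`__hmax2`/`__hmax`. The value of `kEps`, the helper names `local_max_before`, `local_max_after` and `scaled_value`, and the downstream scale formula `y_max / 448` (448 being the largest finite FP8 E4M3 value) are all assumptions for illustration; this diff shows neither EPS nor how `y_max_bf16` is consumed.

```cpp
// Host-side sketch in plain C++ (float stands in for __nv_bfloat16) of the
// per-thread max computation before and after this commit.
#include <algorithm>
#include <cmath>

// Assumed EPS value, for illustration only; the kernel's EPS is not shown here.
constexpr float kEps = 1e-10f;

// Hypothetical helper mirroring the old code: y_max starts at EPS but is then
// overwritten, so EPS never takes part in the max.
float local_max_before(float a, float b, float c, float d) {
  float y_max = kEps;
  // (a, b) and (c, d) play the roles of results_bf162[0] and results_bf162[1]:
  // element-wise abs and max, as __habs2 + __hmax2 do on bfloat16 pairs.
  float m_x = std::max(std::fabs(a), std::fabs(c));
  float m_y = std::max(std::fabs(b), std::fabs(d));
  y_max = std::max(m_x, m_y);  // discards the EPS initialisation
  return y_max;
}

// Hypothetical helper mirroring the fixed code: EPS is folded into the max,
// so the result is never below EPS.
float local_max_after(float a, float b, float c, float d) {
  float m_x = std::max(std::fabs(a), std::fabs(c));
  float m_y = std::max(std::fabs(b), std::fabs(d));
  return std::max(kEps, std::max(m_x, m_y));
}

// Assumed downstream use, for illustration only: scale = y_max / FP8_E4M3_MAX,
// then x / scale before the FP8 cast.
constexpr float kFp8E4m3Max = 448.0f;

float scaled_value(float x, float y_max) {
  return x / (y_max / kFp8E4m3Max);
}
```

For an all-zero group, `local_max_before` returns 0, so under the assumed scale formula the scaled value is 0/0 (NaN), while `local_max_after` returns `kEps` and the scaled value stays 0. For any group whose absolute maximum exceeds EPS, both versions agree.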
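The comment closing the second hunk says each group is handled by a single warp with "a simple warp reduce"; the reduction itself lies outside the shown context. A minimal host-side sketch, assuming it is the common XOR-shuffle (butterfly) max-reduction, which this diff does not confirm. The 32 lanes are modelled as an array, and `warp_max_butterfly` and `kWarpSize` are hypothetical names.

```cpp
// Host-side simulation (plain C++) of a warp-wide butterfly max-reduction.
#include <algorithm>
#include <array>

constexpr int kWarpSize = 32;

// Hypothetical helper. On the device, each step would typically be
//   v = fmaxf(v, __shfl_xor_sync(0xffffffff, v, offset));
// executed by all 32 lanes at once; here the lanes are an array.
std::array<float, kWarpSize> warp_max_butterfly(std::array<float, kWarpSize> lanes) {
  for (int offset = kWarpSize / 2; offset > 0; offset /= 2) {
    std::array<float, kWarpSize> next{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
      // Each lane combines its value with the one held by lane ^ offset in
      // the previous step, as __shfl_xor_sync would deliver it.
      next[lane] = std::max(lanes[lane], lanes[lane ^ offset]);
    }
    lanes = next;
  }
  return lanes;  // after log2(32) = 5 steps every lane holds the warp maximum
}
```

Because the XOR pattern is symmetric, every lane ends with the same result. If each lane has already floored its own value at EPS, as the fixed `y_max_bf16` line does, the reduced maximum is at least EPS on every lane.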