sglang · Commit 63d82a77 (unverified)

refine mxfp4 shuffling log (#9194)

Authored Aug 15, 2025 by Xiaoyu Zhang; committed by GitHub on Aug 14, 2025.
Parent: 53dcc750
Showing 1 changed file with 16 additions and 2 deletions.

python/sglang/srt/layers/quantization/mxfp4.py (+16, -2)
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Adapted from https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/mxfp4.py
from __future__ import annotations
...
@@ -209,6 +222,7 @@ class Mxfp4MoEMethod(FusedMoEMethodBase):
         super().__init__()
+        self.prefix = prefix
         self.topk_indices_dtype = None
         self.use_triton_kernels = global_server_args_dict["enable_triton_kernel_moe"]
         self.with_bias = False
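A minimal sketch of the constructor change above: the quantization method object now stores the layer's name prefix at init time so later log messages can identify the exact layer. `Mxfp4MoEMethodSketch` and the stubbed `global_server_args_dict` below are stand-ins for illustration, not the real sglang imports.

```python
# Hypothetical stub standing in for sglang's global server-args mapping.
global_server_args_dict = {"enable_triton_kernel_moe": False}


class Mxfp4MoEMethodSketch:
    """Illustrative stand-in for Mxfp4MoEMethod's __init__ after this commit."""

    def __init__(self, prefix: str):
        self.prefix = prefix  # new in this commit: remember which layer we belong to
        self.topk_indices_dtype = None
        self.use_triton_kernels = global_server_args_dict["enable_triton_kernel_moe"]
        self.with_bias = False


method = Mxfp4MoEMethodSketch("model.layers.0.mlp.experts")
```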
...
@@ -332,7 +346,7 @@ class Mxfp4MoEMethod(FusedMoEMethodBase):
         if self.use_flashinfer:
             log_info_on_rank0(
                 logger,
-                "Shuffling MoE weights for FlashInfer MXFP4 moe kernel, it might take a while...",
+                f"Shuffling MoE weights for FlashInfer MXFP4 moe kernel (layer: {self.prefix}), it might take a while...",
             )
             layer.gemm1_alpha = Parameter(
                 torch.tensor([1.702] * self.num_experts, dtype=torch.float32).cuda(),
...
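The hunk above swaps a static log string for an f-string that embeds the layer prefix. A minimal sketch of a rank-0-only logging helper in the spirit of the `log_info_on_rank0` call is below; the real sglang helper's signature and rank detection are assumptions here, and the explicit `rank` parameter is purely illustrative.

```python
import logging

logger = logging.getLogger("mxfp4_demo")


def log_info_on_rank0_sketch(log: logging.Logger, msg: str, rank: int = 0) -> None:
    # Only rank 0 emits the message, so a multi-GPU run prints the
    # (potentially slow) weight-shuffling notice once, not once per process.
    if rank == 0:
        log.info(msg)


prefix = "model.layers.0.mlp.experts"  # hypothetical layer prefix
msg = (
    f"Shuffling MoE weights for FlashInfer MXFP4 moe kernel "
    f"(layer: {prefix}), it might take a while..."
)
log_info_on_rank0_sketch(logger, msg)          # emitted on rank 0
log_info_on_rank0_sketch(logger, msg, rank=1)  # suppressed on other ranks
```

Including the prefix makes the message actionable: when a multi-layer model stalls during shuffling, the log now pinpoints which layer is being processed.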