gaoqiong / composable_kernel_ROCM
Commit 2e53f972, authored Feb 10, 2025 by coderfeli
skip empty expert
parent e21f36fc
Showing 1 changed file with 4 additions and 1 deletion
include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v3_multi_d_b_preshuffle.hpp
@@ -1127,6 +1127,9 @@ struct GridwiseGemmMultiD_xdl_cshuffle_v3_b_preshuffle
            __builtin_amdgcn_readfirstlane(block_m_id * MPerBlock);
        const index_t expert_stride = __builtin_amdgcn_readfirstlane(problem.N * problem.K);
        const index_t t0 = (p_sorted_token_ids[block_m_id * MPerBlock] & 0xffffff);
        if(t0 >= problem.NumTokens)
            return;
        // N0, K0, Blocksize*KPack
        const index_t n_block_data_idx_on_grid =
            __builtin_amdgcn_readfirstlane(block_n_id * NXdlPerWave);
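
The early return in this hunk can be illustrated with a small host-side sketch. It is not the kernel code: `block_is_empty` and `count_active_blocks` are hypothetical helpers, and the sketch assumes that the low 24 bits of each `p_sorted_token_ids` entry hold a token index and that padding entries (as for an expert that received no tokens) carry an index greater than or equal to `NumTokens`. The diff shows only the mask and the comparison, so that interpretation is inferred from the commit title.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using index_t = std::int32_t;

// Hypothetical host-side mirror of the check added in the diff.
// Assumption: the low 24 bits of an entry are the token index, and an
// index >= num_tokens marks padding, so a block whose first entry is
// padding has no real work to do.
bool block_is_empty(const std::vector<index_t>& sorted_token_ids,
                    index_t block_m_id,
                    index_t m_per_block,
                    index_t num_tokens)
{
    const std::size_t first = static_cast<std::size_t>(block_m_id) * m_per_block;
    assert(first < sorted_token_ids.size()); // the block must start inside the buffer
    const index_t t0 = sorted_token_ids[first] & 0xffffff;
    return t0 >= num_tokens;
}

// Hypothetical helper: counts the M-blocks that would not take the early return.
index_t count_active_blocks(const std::vector<index_t>& sorted_token_ids,
                            index_t m_per_block,
                            index_t num_tokens)
{
    const index_t num_blocks =
        static_cast<index_t>(sorted_token_ids.size()) / m_per_block;
    index_t active = 0;
    for(index_t b = 0; b < num_blocks; ++b)
    {
        if(!block_is_empty(sorted_token_ids, b, m_per_block, num_tokens))
            ++active;
    }
    return active;
}
```

With `m_per_block = 4` and `num_tokens = 3`, a buffer whose second block consists only of the padding value `3` yields two active blocks out of three. Only the first entry of each block is inspected, matching the diff; this relies on the assumption that real tokens are sorted ahead of padding within a block.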