OpenDAS / vllm_cscc
Commit 136d750f, authored Jul 25, 2025 by czhu-cohere; committed by GitHub, Jul 25, 2025
[Kernel] Improve machete memory bound perf (#21556)
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
Parent: b3caeb82
Showing 1 changed file with 6 additions and 2 deletions
csrc/quantization/machete/machete_prepacked_layout.cuh
@@ -187,8 +187,12 @@ struct PrepackedLayoutBTemplate {
   CUTE_HOST_DEVICE static constexpr auto TVbNbKL_to_offset_copy(
       Shape_NKL shape_mkl) {
     auto layout = TVbNbKL_to_offset(shape_mkl);
-    return make_layout(coalesce(get<0>(layout)), get<1>(layout),
-                       get<2>(layout));
+    // for 4-bit elements, having >= 64 values per column
+    // allows TMA to load full 32-byte sectors
+    auto inner_layout =
+        make_layout(make_shape(_256{}, size<0>(layout) / _256{}));
+
+    return make_layout(inner_layout, get<1>(layout), get<2>(layout));
   }
 
   // ((BlockN, BlockK), (BlocksN, BlocksK), L) -> (storage_idx)
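The comment in the added lines rests on a small piece of arithmetic: 64 four-bit values occupy 64 × 4 / 8 = 32 bytes, which is one sector, and the `_256` extent chosen for `inner_layout` spans 128 bytes, a whole number of sectors. The sketch below checks that arithmetic at compile time and shows the offsets a compact column-major `(256, S / 256)` shape produces, which is the stride order `make_layout` assigns when given only a shape. It is a minimal illustration in plain C++, not the machete or CuTe code: `packed_bytes`, `col_major_offset`, `kSectorBytes`, and `kInnerExtent` are hypothetical names, and it assumes the first mode is a dense, stride-1 run whose size is a multiple of 256.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only; these names do not exist in
// machete or CuTe.

// Bytes occupied by `count` densely packed elements of `bits` bits each.
constexpr std::size_t packed_bytes(std::size_t count, std::size_t bits) {
  return count * bits / 8;
}

// Offset of coordinate (i, j) in a compact column-major (rows, cols) shape,
// i.e. strides (1, rows).
constexpr std::size_t col_major_offset(std::size_t i, std::size_t j,
                                       std::size_t rows) {
  return i + j * rows;
}

constexpr std::size_t kSectorBytes = 32;   // sector size named in the comment
constexpr std::size_t kInnerExtent = 256;  // mirrors the _256 extent in the diff

// 64 four-bit values fill exactly one 32-byte sector.
static_assert(packed_bytes(64, 4) == kSectorBytes, "64 x 4 bits = 32 bytes");

// A 256-element column of four-bit values is 128 bytes, i.e. 4 whole sectors.
static_assert(packed_bytes(kInnerExtent, 4) == 128, "256 x 4 bits = 128 bytes");
static_assert(packed_bytes(kInnerExtent, 4) % kSectorBytes == 0,
              "a column is a whole number of sectors");

// The last element of column 0 and the first element of column 1 are
// adjacent, so the reshaped view still walks one contiguous run.
static_assert(col_major_offset(255, 0, kInnerExtent) == 255, "end of column 0");
static_assert(col_major_offset(0, 1, kInnerExtent) == 256, "start of column 1");
```

Under these assumptions the `(256, S / 256)` view addresses the same contiguous offsets as a single flat mode of size `S`; only the shape exposed to the copy changes.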