gaoqiong / composable_kernel_ROCM
Commit f41a43a1, authored Jan 27, 2025 by Jiming Ruan
Fix bug in non fuse_add_store cases
parent df45a6b5
Showing 1 changed file with 16 additions and 14 deletions
include/ck_tile/ops/rmsnorm2d/pipeline/rmsnorm2d_fwd_pipeline_two_pass.hpp
@@ -153,24 +153,26 @@ struct Rmsnorm2dFwdPipelineTwoPass
 {
     auto acc = make_static_distributed_tensor<ComputeDataType>(decltype(load_tile(x_window))::get_tile_distribution());
-    if constexpr(kFusedAdd == Rmsnorm2dFusedAddEnum::PRE_ADD)
-    {
-        auto x      = load_tile(x_window);
-        auto x_resi = load_tile(x_residual_window);
-        sweep_tile(x_resi, [&](auto idx) {
-            // compute x = x_resi + x
-            acc(idx) = type_convert<ComputeDataType>(x_resi(idx)) + acc(idx);
-        });
-        move_tile_window(x_window, {0, -Block_N});
-        move_tile_window(x_residual_window, {0, -Block_N});
-    }
-    else if constexpr(kFusedAdd == Rmsnorm2dFusedAddEnum::PRE_ADD_STORE)
+    if constexpr(kFusedAdd == Rmsnorm2dFusedAddEnum::PRE_ADD_STORE)
     {
         acc = cast_tile<ComputeDataType>(load_tile(y_residual_window));
         move_tile_window(y_residual_window, {0, -Block_N});
     }
+    else
+    {
+        acc = cast_tile<ComputeDataType>(load_tile(x_window));
+        move_tile_window(x_window, {0, -Block_N});
+        if constexpr(kFusedAdd == Rmsnorm2dFusedAddEnum::PRE_ADD)
+        {
+            auto x_resi = load_tile(x_residual_window);
+            sweep_tile(x_resi, [&](auto idx) {
+                // compute x = x_resi + x
+                acc(idx) = type_convert<ComputeDataType>(x_resi(idx)) + acc(idx);
+            });
+            move_tile_window(x_residual_window, {0, -Block_N});
+        }
+    }
     // load gamma (TODO: support no gamma?)
     const auto gamma = load_tile(gamma_window);