gaoqiong / composable_kernel
Commit 4dfcf974
Authored Sep 26, 2022 by Astha Rai

fixed indexing for loop step

Parent: 88d5d8d0
Changes: 1
Showing 1 changed file with 3 additions and 2 deletions

include/ck/tensor_operation/gpu/grid/gridwise_elementwise_2d.hpp (+3 -2)
...
@@ -104,14 +104,15 @@ struct GridwiseElementwise_2D
         const index_t blockSize      = get_block_size();
         const index_t blockPerGrid_m = get_grid_size();
         const index_t blockPerGrid_n = gridDim.y;
+        const index_t block_1d       = get_block_1d_id();
         const auto M = in_grid_2d_desc_tuple[I0].GetLength(I0);
         const auto N = in_grid_2d_desc_tuple[I1].GetLength(I1);
         const index_t loop_step_m = blockPerGrid_m * blockSize * MPerThread;
         const index_t loop_step_n = blockPerGrid_n * blockSize * NPerThread;
         const auto loop_step_index = make_multi_index(loop_step_m, loop_step_n);
-        const auto index_t thread_global_id_2d =
-            thread_buffer_desc_mn.CalculateBottomIndex(make_multi_index(get_block_1d_id));
+        const auto thread_global_id_2d =
+            thread_buffer_desc_mn.CalculateBottomIndex(make_multi_index(block_1d));
         const auto blockId_m = thread_global_id_2d[I0];
         const auto blockId_n = thread_global_id_2d[I1];
         const auto thread_global_offset =
...
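
The hunk above reads the flat block id once, turns it into an (m, n) pair, and derives a per-dimension loop step. The following host-side C++ sketch illustrates that pattern in isolation. It is not the kernel's code: `split_id_row_major` and `count_visits` are hypothetical names, the row-major split is only an assumption about what the descriptor lookup computes, and the loop step is modeled as workers times elements per worker to mirror the shape of `blockPerGrid_m * blockSize * MPerThread`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using index_t = std::int32_t;

// Hypothetical stand-in for the descriptor lookup in the diff: split a flat
// id into (m, n) coordinates, ASSUMING a row-major layout with width_n columns.
inline std::pair<index_t, index_t> split_id_row_major(index_t flat_id, index_t width_n)
{
    assert(width_n > 0 && flat_id >= 0);
    return {flat_id / width_n, flat_id % width_n};
}

// Walk an M x N domain with a 2D strided loop and count how often each
// element is visited. Each worker owns an m_per_worker x n_per_worker tile
// and advances by the full span covered by all workers in that dimension.
inline std::vector<int> count_visits(index_t M, index_t N,
                                     index_t workers_m, index_t workers_n,
                                     index_t m_per_worker, index_t n_per_worker)
{
    std::vector<int> visits(static_cast<std::size_t>(M) * static_cast<std::size_t>(N), 0);

    // Step per dimension: number of workers times elements each one handles.
    const index_t loop_step_m = workers_m * m_per_worker;
    const index_t loop_step_n = workers_n * n_per_worker;

    for(index_t flat_id = 0; flat_id < workers_m * workers_n; ++flat_id)
    {
        // Compute the id once, then reuse it for both coordinates.
        const std::pair<index_t, index_t> id = split_id_row_major(flat_id, workers_n);

        for(index_t m0 = id.first * m_per_worker; m0 < M; m0 += loop_step_m)
            for(index_t n0 = id.second * n_per_worker; n0 < N; n0 += loop_step_n)
                for(index_t dm = 0; dm < m_per_worker && m0 + dm < M; ++dm)
                    for(index_t dn = 0; dn < n_per_worker && n0 + dn < N; ++dn)
                        ++visits[static_cast<std::size_t>(m0 + dm) * N + (n0 + dn)];
    }
    return visits;
}
```

With a step equal to the full span of all workers, every element of the domain should be visited exactly once, including when M and N are not multiples of the step. A step that leaves out one of the factors would make neighbouring workers overlap on later iterations, which is the kind of indexing error a check like this catches.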