gaoqiong / composable_kernel
Commit ecd5f7c9, authored Nov 25, 2022 by Anthony Chang
comment LDS bank conflict considerations
parent b1e544e2
Showing 1 changed file with 19 additions and 0 deletions
library/src/tensor_operation_instance/gpu/gemm/device_gemm_xdl_c_shuffle_f16_f16_f16_km_kn_mn_instance.cpp
...
@@ -11,6 +11,25 @@
#include "ck/library/tensor_operation_instance/add_device_operation_instance.hpp"
/*
For an fp16 M-contiguous matrix of size M_K, each thread reads a 4x2 tile (2 * 64 bits) from global
memory, transposes the 4x2 tile in registers, and writes it into LDS in K0_M_K1 layout. This allows
us to use a 128-bit LDS write instruction. It also avoids write bank conflicts because two
vertically adjacent 4x2 tiles form a contiguous chunk of memory when modeled as K0_M_K1 layout with
K1=2.

      <- K1 ->             <- K1 ->            <- K1 ->
      _________            _________           _________
   |  | 0 | 4 |  transpose | 0 - 1 |  to LDS   | 0 - 1 |
   |  | 1 | 5 |    --->    | 2 - 3 |   ---->   | 2 - 3 |
   |  | 2 | 6 |            | 4 - 5 |           | 4 - 5 |
   M  | 3 | 7 |            | 6 - 7 |           | 6 - 7 |
   |  ---------            ---------           ---------
   |  |  ...  |            |  ...  |           |  ...  |
   v  ---------            ---------           ---------
        VMEM                 VGPR                 LDS
*/
namespace ck {
namespace tensor_operation {
namespace device {
...
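The contiguity claim in the comment can be checked with a small index calculation. The sketch below is not code from composable_kernel: `lds_offset` and `tile_is_contiguous` are hypothetical helpers, M = 256 is an arbitrary example size, and the LDS buffer is assumed to be densely packed in K0_M_K1 order with no padding (the real kernel may lay it out differently).

```cpp
#include <cassert>
#include <cstddef>

// Illustrative helpers only; none of these names exist in composable_kernel.

// Element offset of (k0, m, k1) in a densely packed K0_M_K1 buffer (assumes no padding).
constexpr std::size_t lds_offset(std::size_t k0, std::size_t m, std::size_t k1,
                                 std::size_t M, std::size_t K1)
{
    return (k0 * M + m) * K1 + k1;
}

// Checks that a transposed 4x2 tile starting at row m0 covers 8 consecutive elements,
// i.e. 8 * 16 bits = 128 bits for fp16, matching one 128-bit LDS write.
constexpr bool tile_is_contiguous(std::size_t k0, std::size_t m0, std::size_t M)
{
    const std::size_t K1 = 2, tile_m = 4;
    std::size_t expected = lds_offset(k0, m0, 0, M, K1);
    for(std::size_t m = m0; m < m0 + tile_m; ++m)
        for(std::size_t k1 = 0; k1 < K1; ++k1)
            if(lds_offset(k0, m, k1, M, K1) != expected++)
                return false;
    return true;
}

// M = 256 is an arbitrary example size chosen for this sketch.
static_assert(tile_is_contiguous(0, 0, 256), "a 4x2 tile should map to one contiguous range");
static_assert(lds_offset(0, 4, 0, 256, 2) - lds_offset(0, 0, 0, 256, 2) == 8,
              "the tile directly below should start right after the first tile");
```

Under these assumptions, the tile at rows 0..3 occupies element offsets 0..7 and the tile at rows 4..7 starts at offset 8, so two vertically adjacent tiles sit back to back in LDS. The checks run at compile time (C++14 or later).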