gaoqiong / composable_kernel

Commit f149c921, authored Aug 30, 2023 by Bartlomiej Kocot
Make new measurements for correct bytes calculation
Parent: 0ee5aa3a
Showing 2 changed files with 5 additions and 6 deletions:

- include/ck/tensor_operation/gpu/device/impl/device_image_to_column_impl.hpp (+1 -1)
- profiler/README.md (+4 -5)
include/ck/tensor_operation/gpu/device/impl/device_image_to_column_impl.hpp

```diff
@@ -240,7 +240,7 @@ struct DeviceImageToColumnImpl
         const auto block_2_tile_map =
             BlockToCTileMap_M00_N0_M01Adapt<MPerBlock, KPerBlock, OutputGridDesc>(
-                arg.out_grid_desc_m_k_, I1 /*M01*/);
+                arg.out_grid_desc_m_k_);
         const index_t grid_size = block_2_tile_map.CalculateGridSize(arg.out_grid_desc_m_k_);
         const auto kernel = kernel_image_to_column<InputGridDesc,
                                                    InputDataType,
...
```
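The only source change drops the explicit `I1 /*M01*/` argument, so `BlockToCTileMap_M00_N0_M01Adapt` is now built with its constructor's default M01 (the default value is not shown in this diff) instead of 1. To illustrate what an M01 parameter controls, here is a minimal, hypothetical sketch of an M01-style block-to-tile mapping. `TileMapSketch` and its members are illustrative names, not Composable Kernel's API, and CK's exact index arithmetic may differ. Consecutive block ids walk down a group of `m01` tiles along M before stepping along N, a grouping commonly used to improve cache locality. With `m01 == 1` the mapping degenerates to plain row-major order.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical sketch of an "M01-adapted" block-to-tile mapping.
// Illustrative only: this is NOT Composable Kernel's BlockToCTileMap_M00_N0_M01Adapt.
struct TileMapSketch
{
    int m_tiles; // number of output tiles along M
    int n_tiles; // number of output tiles along the second dimension (K here)
    int m01;     // number of M tiles grouped together; assumed >= 1

    // One workgroup per output tile.
    int grid_size() const { return m_tiles * n_tiles; }

    // Map a linear block id in [0, grid_size()) to (m_tile, n_tile).
    std::pair<int, int> map(int block_id) const
    {
        assert(block_id >= 0 && block_id < grid_size());
        const int blocks_per_group = m01 * n_tiles;
        const int group       = block_id / blocks_per_group; // which group of M tiles
        const int group_start = group * m01;                 // first M tile in that group
        // The last group may hold fewer than m01 tiles ("adapt" to the tail).
        const int group_rows = std::min(m01, m_tiles - group_start);
        const int local      = block_id - group * blocks_per_group; // id inside the group
        // Walk down M inside the group first, then advance along the second dimension.
        return {group_start + local % group_rows, local / group_rows};
    }
};
```

For example, with 5 M tiles, 3 tiles along the second dimension and `m01 = 2`, block 1 maps to tile (1, 0), whereas the row-major mapping (`m01 = 1`) sends it to (0, 1).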
profiler/README.md

````diff
@@ -212,14 +212,13 @@ Note: This kernel use atomic add, this will cause output buffer to be accumulate
 ```
-Result (MI250, FP32, NHWC)
+Result (MI210, FP32, NHWC)
 ```
 input: dim 5, lengths {1, 256, 512, 28, 28}, strides {102760448, 401408, 1, 14336, 512}
 output: dim 2, lengths {173056, 4608}, strides {4608, 1}
 ....
 Best configuration parameters:
-name: DeviceImageToColumn<256, 64, 64, 4>
-avg_time: 3.19792
-tflops: 0
-GB/s: 2042.59
+name: DeviceImageToColumn<128, 32, 64, 4>
+avg_time: 3.12326
+GB/s: 1125.99
 ```
````
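The README figures can be sanity-checked with a back-of-envelope count of bytes moved. The sketch below assumes the five input lengths are G, N, C, Hi, Wi (consistent with the printed NHWC strides), a 3x3 filter with stride 1, dilation 1 and no padding (an assumption that reproduces the 173056 x 4608 output shape), and that every input and output element is moved exactly once. The helper names are hypothetical; this is not the profiler's own byte-counting code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for a back-of-envelope bandwidth check; these are NOT
// the profiler's own byte-counting routines.

// Output length of one spatial dimension of a convolution window.
constexpr std::int64_t conv_out_len(std::int64_t in, std::int64_t filter, std::int64_t stride,
                                    std::int64_t dilation, std::int64_t pad_left,
                                    std::int64_t pad_right)
{
    return (in + pad_left + pad_right - dilation * (filter - 1) - 1) / stride + 1;
}

// Assumes every input element is read once and every output element written once.
constexpr std::int64_t im2col_bytes(std::int64_t in_elems, std::int64_t out_elems,
                                    std::int64_t elem_size)
{
    return (in_elems + out_elems) * elem_size;
}

// GB/s with GB = 1e9 bytes and time in ms: bytes / 1e9 / (ms / 1e3) = bytes / 1e6 / ms.
inline double gb_per_sec(std::int64_t bytes, double avg_time_ms)
{
    assert(avg_time_ms > 0.0);
    return static_cast<double>(bytes) / 1.0e6 / avg_time_ms;
}

// Shape check against the README (assumed 3x3 filter, stride 1, no padding).
static_assert(conv_out_len(28, 3, 1, 1, 0, 0) == 26, "28x28 input -> 26x26 output");
static_assert(1 * 256 * 26 * 26 == 173056, "rows = G * N * Ho * Wo");
static_assert(3 * 3 * 512 == 4608, "cols = Y * X * C");
```

Under these assumptions the FP32 case moves 3,600,809,984 bytes, which at the new avg_time of 3.12326 ms gives roughly 1153 GB/s. That is close to, but not identical to, the reported 1125.99 GB/s, so treat this as a plausibility check rather than the profiler's exact formula.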