Commit f149c921 authored by Bartlomiej Kocot

Make new measurements for correct bytes calculation

parent 0ee5aa3a
````diff
@@ -240,7 +240,7 @@ struct DeviceImageToColumnImpl
 const auto block_2_tile_map =
     BlockToCTileMap_M00_N0_M01Adapt<MPerBlock, KPerBlock, OutputGridDesc>(
-        arg.out_grid_desc_m_k_, I1 /*M01*/);
+        arg.out_grid_desc_m_k_);
 const index_t grid_size = block_2_tile_map.CalculateGridSize(arg.out_grid_desc_m_k_);
 const auto kernel = kernel_image_to_column<InputGridDesc,
                                            InputDataType,
......
````
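The first hunk stops passing `I1 /*M01*/` explicitly, so `BlockToCTileMap_M00_N0_M01Adapt` now falls back to whatever default M01 its constructor declares. That default is not visible in this diff. As a rough intuition only, the sketch below assumes M01 acts like a conventional grouped ("swizzled") tile ordering, as the `M00_N0_M01` name suggests. Consecutive workgroups walk `m01` tile rows column by column instead of finishing a whole tile row first, a common way to improve cache locality. `block_to_tile` and its parameters are hypothetical names, and this is not Composable Kernel's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical sketch (not Composable Kernel code) of a grouped tile ordering.
// m0:  number of tiles along M
// n0:  number of tiles along the second dimension (K for the image-to-column output)
// m01: number of M tile rows per group
// Returns {tile_m, tile_n} for a 1-D workgroup id.
std::pair<std::int64_t, std::int64_t>
block_to_tile(std::int64_t block_id, std::int64_t m0, std::int64_t n0, std::int64_t m01)
{
    assert(m0 > 0 && n0 > 0 && m01 > 0);
    assert(block_id >= 0 && block_id < m0 * n0);

    const std::int64_t blocks_per_group = m01 * n0;     // workgroups in one full group
    const std::int64_t group            = block_id / blocks_per_group;
    const std::int64_t first_m          = group * m01;  // first tile row of this group
    // The last group is shorter than m01 when m0 is not a multiple of m01.
    const std::int64_t group_rows = std::min(m01, m0 - first_m);
    const std::int64_t local      = block_id % blocks_per_group;

    // Walk down the group's rows first, then move on to the next tile column.
    return {first_m + local % group_rows, local / group_rows};
}
```

With `m01 == 1` (the value the old code passed), this mapping reduces to plain row-major order. For example, block 2 of a 3 x 2 tile grid maps to tile (1, 0). With `m01 == 2`, the same block maps to tile (0, 1) instead.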
````diff
@@ -212,14 +212,13 @@ Note: This kernel use atomic add, this will cause output buffer to be accumulate
 ```
-Result (MI250, FP32, NHWC)
+Result (MI210, FP32, NHWC)
 ```
 input: dim 5, lengths {1, 256, 512, 28, 28}, strides {102760448, 401408, 1, 14336, 512}
 output: dim 2, lengths {173056, 4608}, strides {4608, 1}
 ....
 Best configuration parameters:
-name: DeviceImageToColumn<256, 64, 64, 4>
-avg_time: 3.19792
-tflops: 0
-GB/s: 1125.99
+name: DeviceImageToColumn<128, 32, 64, 4>
+avg_time: 3.12326
+GB/s: 2042.59
 ```
````
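The commit message says the new numbers come from a corrected byte count, and the README figures can be used to check what changed. This check rests on two assumptions, neither of which appears in this diff: GB/s is computed as bytes / (avg_time in ms × 10^6), and every element is a 4-byte FP32 value. Under those assumptions, the old 1125.99 GB/s at 3.19792 ms matches reading the input tensor once plus writing the output matrix once. The new 2042.59 GB/s at 3.12326 ms matches one input read plus one output write per output element. The sketch below reproduces only that arithmetic. The helper names are hypothetical, and the result is an inference from the published numbers, not the benchmark's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical back-of-envelope check of the README numbers (not CK code).
// Assumptions: FP32 data (4 bytes/element) and GB/s = bytes / (avg_time_ms * 1e6).
constexpr std::int64_t kBytesPerElem = 4;
// input lengths {1, 256, 512, 28, 28}
constexpr std::int64_t kInputElems = 1LL * 256 * 512 * 28 * 28;  // 102760448
// output lengths {173056, 4608}
constexpr std::int64_t kOutputElems = 173056LL * 4608;           // 797442048

// Count matching the old figure: every input element read once,
// every output element written once.
constexpr std::int64_t bytes_read_input_once()
{
    return (kInputElems + kOutputElems) * kBytesPerElem;
}

// Count matching the new figure: one input read and one output write per
// output element, since image-to-column copies an input value into every
// output position that uses it.
constexpr std::int64_t bytes_read_per_output()
{
    return 2 * kOutputElems * kBytesPerElem;
}

// Effective bandwidth in GB/s (1 GB = 1e9 bytes) for a time given in milliseconds.
double gb_per_s(std::int64_t bytes, double avg_time_ms)
{
    assert(avg_time_ms > 0.0);
    return static_cast<double>(bytes) / (avg_time_ms * 1e6);
}

// True when value is within rel_tol (relative) of expected.
bool approx_equal(double value, double expected, double rel_tol = 1e-4)
{
    return std::fabs(value - expected) <= rel_tol * std::fabs(expected);
}
```

Applying the old byte count to the new time would report only about 1153 GB/s. So, under these assumptions, most of the jump in GB/s comes from the counting change rather than from the shorter avg_time.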