Commit f149c921 authored by Bartlomiej Kocot

Make new measurements for correct bytes calculation

parent 0ee5aa3a
@@ -240,7 +240,7 @@ struct DeviceImageToColumnImpl
const auto block_2_tile_map =
BlockToCTileMap_M00_N0_M01Adapt<MPerBlock, KPerBlock, OutputGridDesc>(
-arg.out_grid_desc_m_k_, I1 /*M01*/);
+arg.out_grid_desc_m_k_);
const index_t grid_size = block_2_tile_map.CalculateGridSize(arg.out_grid_desc_m_k_);
const auto kernel = kernel_image_to_column<InputGridDesc,
InputDataType,
@@ -212,14 +212,13 @@ Note: This kernel use atomic add, this will cause output buffer to be accumulate
```
-Result (MI250, FP32, NHWC)
+Result (MI210, FP32, NHWC)
```
input: dim 5, lengths {1, 256, 512, 28, 28}, strides {102760448, 401408, 1, 14336, 512}
output: dim 2, lengths {173056, 4608}, strides {4608, 1}
....
Best configuration parameters:
-name: DeviceImageToColumn<256, 64, 64, 4>
-avg_time: 3.19792
-tflops: 0
-GB/s: 1125.99
+name: DeviceImageToColumn<128, 32, 64, 4>
+avg_time: 3.12326
+GB/s: 2042.59
```
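
The commit does not show the byte calculation itself, so the following is only a sketch of one formula that reproduces the new `GB/s` figure. It assumes that each of the M x K output elements (173056 x 4608 here) is counted as one FP32 read plus one FP32 write, and that `avg_time` is in milliseconds. The helper names `bytes_moved` and `gb_per_sec` are hypothetical and do not come from the library.

```cpp
#include <cstddef>

// Hypothetical helper (not a library function): bytes moved if every one of
// the m * k output elements is read once from the input and written once.
constexpr std::size_t bytes_moved(std::size_t m,
                                  std::size_t k,
                                  std::size_t in_elem_size,
                                  std::size_t out_elem_size)
{
    return m * k * (in_elem_size + out_elem_size);
}

// Bandwidth in GB/s (1 GB = 1e9 bytes), assuming the time is in milliseconds:
// bytes / 1e9 / (ms * 1e-3) == bytes / 1e6 / ms.
constexpr double gb_per_sec(std::size_t bytes, double avg_time_ms)
{
    return static_cast<double>(bytes) / 1.0e6 / avg_time_ms;
}
```

Under these assumptions, `bytes_moved(173056, 4608, 4, 4)` is 6379536384 bytes, and `gb_per_sec` of that value with `avg_time` 3.12326 comes to about 2042.59, which matches the reported figure.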