This folder contains an example for GEMM using the ck_tile tile-programming implementation. Currently, it supports only the basic features of CK Tile GEMM, but it creates placeholders for future support of different GEMM pipelines and different GEMM modules. In the near future, we will gradually migrate all the GEMM features from old CK to CK Tile.
This folder contains an example for batched GEMM using the ck_tile tile-programming implementation.
## build
```
mkdir build && cd build
# you can replace <arch> with the appropriate architecture (for example gfx90a or gfx942) or leave it blank
sh ../script/cmake-ck-dev.sh ../ <arch>
make tile_example_gemm_basic -j
make tile_example_batched_gemm -j
```
This will result in the executables `build/bin/tile_example_gemm_basic` and `build/bin/tile_example_batched_gemm`.
## example
Arguments for `tile_example_gemm_basic`:
```
args:
          -b    batch size (default:1)
          -m    m dimension (default:1024)
          -n    n dimension (default:2048)
          -k    k dimension (default:64)
   -stride_a    Tensor A stride (default:0)
   -stride_b    Tensor B stride (default:0)
   -stride_c    Tensor C stride (default:0)
          -v    0. No validation, 1. Validation on CPU, 2. Validation on GPU (default:2)
```
Arguments for `tile_example_batched_gemm`:
```
args:
               -m    m dimension (default:256)
               -n    n dimension (default:128)
               -k    k dimension (default:128)
        -stride_a    Tensor A stride (default:128)
        -stride_b    Tensor B stride (default:128)
        -stride_c    Tensor C stride (default:128)
  -batch_stride_a    Batch A stride (default:32768)
  -batch_stride_b    Batch B stride (default:16384)
  -batch_stride_c    Batch C stride (default:32768)
     -batch_count    Batch count (default:16)
               -v    0. No validation, 1. Validation on CPU, 2. Validation on GPU (default:2)
```
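The batched defaults listed above are consistent with densely packed matrices: 256 × 128 = 32768 for A and C, and 128 × 128 = 16384 for B. The following is a minimal CPU-side sketch of how such strides and batch strides could be derived and used to index a batched GEMM. It assumes row-major A (m × k), column-major B (k × n), and row-major C (m × n); these layouts are an assumption, not something this README states. `BatchedGemmDims`, `make_packed_dims`, and `reference_batched_gemm` are hypothetical names used for illustration only and are not part of ck_tile.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not part of ck_tile): sizes and strides of a batched GEMM.
struct BatchedGemmDims
{
    std::size_t m, n, k;
    std::size_t stride_a, stride_b, stride_c;                   // leading dimensions
    std::size_t batch_stride_a, batch_stride_b, batch_stride_c; // elements between batches
    std::size_t batch_count;
};

// Densely packed strides under the ASSUMED layouts:
// A row-major (m x k), B column-major (k x n), C row-major (m x n).
inline BatchedGemmDims make_packed_dims(std::size_t m,
                                        std::size_t n,
                                        std::size_t k,
                                        std::size_t batch_count)
{
    BatchedGemmDims d{};
    d.m = m;
    d.n = n;
    d.k = k;
    d.stride_a       = k;     // one row of A holds k elements
    d.stride_b       = k;     // one column of B holds k elements
    d.stride_c       = n;     // one row of C holds n elements
    d.batch_stride_a = m * k; // one full A matrix
    d.batch_stride_b = k * n; // one full B matrix
    d.batch_stride_c = m * n; // one full C matrix
    d.batch_count    = batch_count;
    return d;
}

// Naive CPU reference: C[batch] = A[batch] * B[batch] for every batch.
inline void reference_batched_gemm(const BatchedGemmDims& d,
                                   const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   std::vector<float>& c)
{
    for(std::size_t batch = 0; batch < d.batch_count; ++batch)
    {
        const float* a_ptr = a.data() + batch * d.batch_stride_a;
        const float* b_ptr = b.data() + batch * d.batch_stride_b;
        float* c_ptr       = c.data() + batch * d.batch_stride_c;
        for(std::size_t i = 0; i < d.m; ++i)
        {
            for(std::size_t j = 0; j < d.n; ++j)
            {
                float acc = 0.0f;
                for(std::size_t p = 0; p < d.k; ++p)
                {
                    // A(i, p) is row-major, B(p, j) is column-major.
                    acc += a_ptr[i * d.stride_a + p] * b_ptr[j * d.stride_b + p];
                }
                c_ptr[i * d.stride_c + j] = acc;
            }
        }
    }
}
```

Under these assumptions, `make_packed_dims(256, 128, 128, 16)` yields the same stride and batch-stride values as the defaults listed for the batched example.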