Commit fb11c945 authored by Chao Liu

change min block per cu to 1

parent 9a17e7fb
-# Instructions for ```gemm_xdl``` Example
+# Instructions
 ## Docker script
 ```bash
@@ -13,7 +13,7 @@ rocm/tensorflow:rocm4.3.1-tf2.6-dev \
 /bin/bash
 ```
-## Build ```gemm_xdl```
+## Build Example
 ```bash
 mkdir build && cd build
 ```
@@ -30,15 +30,15 @@ cmake \
 ```
 ```bash
-make -j gemm_xdl
+make -j example_gemm_xdl_fp16
 ```
-## Run ```gemm_xdl```
+## Run Example
 ```bash
 #arg1: verification (0=no, 1=yes)
 #arg2: initialization (0=no init, 1=integer value, 2=decimal value)
 #arg3: run kernel # of times (>1)
-./example/gemm_xdl 0 1 5
+./bin/example_gemm_xdl_fp16 0 1 5
 ```
 Result (MI100 @ 1087Mhz, 133.5TFlops peak FP16)
......
-# Instructions for ```conv2d_fwd_xdl``` Example
+# Instructions
 ## Docker script
 ```bash
@@ -13,7 +13,7 @@ rocm/tensorflow:rocm4.3.1-tf2.6-dev \
 /bin/bash
 ```
-## Build ```conv2d_fwd_xdl```
+## Build Example
 ```bash
 mkdir build && cd build
 ```
@@ -30,16 +30,16 @@ cmake \
 ```
 ```bash
-make -j conv2d_fwd_xdl
+make -j example_conv2d_fwd_xdl_fp16
 ```
-## Run ```conv2d_fwd_xdl```
+## Run Example
 ```bash
 #arg1: verification (0=no, 1=yes)
 #arg2: initialization (0=no init, 1=integer value, 2=decimal value)
 #arg3: run kernel # of times (>1)
 #arg4 to 18: N, K, C, Y, X, Hi, Wi, Sy, Sx, Dy, Dx, LeftPy, LeftPx, RightPy, RightPx
-./example/conv2d_fwd_xdl 0 1 5
+./bin/example_conv2d_fwd_xdl_fp16 0 1 5
 ```
 Result (MI100 @ 1087Mhz, 133.5TFlops peak FP16)
......
@@ -21,7 +21,7 @@
 #ifdef CK_USE_LAUNCH_BOUNDS
 #define CK_MAX_THREAD_PER_BLOCK 256
-#define CK_MIN_BLOCK_PER_CU 2
+#define CK_MIN_BLOCK_PER_CU 1
 #endif
 // GPU-specific parameters
......
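
For context, macros such as `CK_MAX_THREAD_PER_BLOCK` and `CK_MIN_BLOCK_PER_CU` are typically passed to the `__launch_bounds__` kernel attribute. The diff does not show the kernel declarations, so the following is only a minimal sketch under that assumption. The names `EXAMPLE_KERNEL`, `EXAMPLE_LAUNCH_BOUNDS`, `example_scale`, and `resident_thread_hint` are hypothetical and do not come from the commit. The exact unit of the second `__launch_bounds__` argument depends on the toolchain, so the arithmetic below uses the CUDA-style reading (minimum resident blocks per compute unit) purely as an illustration.

```cpp
// Sketch only: builds with a plain C++11 host compiler as well as hipcc/nvcc.

// Values as they stand after this commit; CK_USE_LAUNCH_BOUNDS is assumed enabled.
#define CK_USE_LAUNCH_BOUNDS 1

#ifdef CK_USE_LAUNCH_BOUNDS
#define CK_MAX_THREAD_PER_BLOCK 256
#define CK_MIN_BLOCK_PER_CU 1
#endif

// Hypothetical wrapper macros (not from the source): they expand to GPU
// attributes only under a HIP/CUDA compiler, and to nothing on the host.
#if defined(__HIPCC__) || defined(__CUDACC__)
#define EXAMPLE_KERNEL __global__
#define EXAMPLE_LAUNCH_BOUNDS \
    __launch_bounds__(CK_MAX_THREAD_PER_BLOCK, CK_MIN_BLOCK_PER_CU)
#else
#define EXAMPLE_KERNEL
#define EXAMPLE_LAUNCH_BOUNDS
#endif

// Hypothetical kernel, shown only to illustrate where the attribute is placed.
EXAMPLE_KERNEL void EXAMPLE_LAUNCH_BOUNDS example_scale(float* data, float factor)
{
    data[0] *= factor;
}

// Under the CUDA-style reading, the compiler is asked to keep register use low
// enough that this many threads can be resident on one compute unit at once.
constexpr int resident_thread_hint(int max_thread_per_block, int min_block_per_cu)
{
    return max_thread_per_block * min_block_per_cu;
}

// Before this commit: 256 threads per block * 2 blocks.
static_assert(resident_thread_hint(256, 2) == 512, "old setting");
// After this commit: 256 threads per block * 1 block.
static_assert(resident_thread_hint(CK_MAX_THREAD_PER_BLOCK, CK_MIN_BLOCK_PER_CU) == 256,
              "new setting");
```

Under that reading, lowering the minimum from 2 to 1 relaxes the occupancy hint, which generally leaves the compiler free to use more registers per thread. Whether this helps a given kernel has to be measured.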