Commit 4da2d9da authored by Jing Zhang

fixed readme

parents 6d6dc6bf 898e40e7
-# Instructions for ```reduce_blockwise``` Example
+# Instructions for ```pool2d_fwd``` Example
 ## Docker script
 ```bash
@@ -13,7 +13,7 @@ rocm/tensorflow:rocm4.3.1-tf2.6-dev \
 /bin/bash
 ```
-## Build ```reduce_blockwise```
+## Build ```pool2d_fwd```
 ```bash
 mkdir build && cd build
 ```
@@ -30,31 +30,26 @@ cmake \
 ```
 ```bash
-make -j reduce_blockwise
+make -j pool2d_fwd
 ```
-## Run ```reduce_blockwise```
+## Run ```pool2d_fwd```
 ```bash
-# -D <xxx> : input 4-d tensor lengths
-# -v <x> : verification (0=no, 1=yes)
-#arg1: initialization (0=no init, 1=integer value, 2=decimal value)
-#arg2: run kernel # of times (>1)
-./bin/reduce_blockwise -D 16,64,32,960 -v 1 1 10
+#arg1: verification (0=no, 1=yes)
+#arg2: initialization (0=no init, 1=integer value, 2=decimal value)
+#arg3: run kernel # of times (>1)
+#arg4 to 15: N, C, Y, X, Hi, Wi, Sy, Sx, LeftPy, LeftPx, RightPy, RightPx
+./example/pool2d_fwd 1 1 10
 ```
 Result
 ```
-launch_and_time_kernel: grid_dim {240, 1, 1}, block_dim {256, 1, 1}
-Warm up
-Start running 3 times...
-Perf: 0.23536 ms, 267.32 GB/s, DeviceReduceBlockWise<256,M_C4_S1,K_C64_S1,InSrcVectorDim_0_InSrcVectorSize_1_OutDstVectorSize_1>
-error: 0
-max_diff: 0, 529, 529
-root@dc-smc-18:/data/composable_kernel/Build3# bin/reduce_blockwise -D 16,64,32,960 -v 1 1 10
-launch_and_time_kernel: grid_dim {240, 1, 1}, block_dim {256, 1, 1}
+in_n_c_hi_wi: dim 4, lengths {128, 192, 71, 71}, strides {967872, 1, 13632, 192}
+out_n_c_ho_wo: dim 4, lengths {128, 192, 36, 36}, strides {248832, 1, 6912, 192}
+launch_and_time_kernel: grid_dim {124416, 1, 1}, block_dim {64, 1, 1}
 Warm up
 Start running 10 times...
-Perf: 0.23392 ms, 268.966 GB/s, DeviceReduceBlockWise<256,M_C4_S1,K_C64_S1,InSrcVectorDim_0_InSrcVectorSize_1_OutDstVectorSize_1>
+Perf: 0.415453 ms, 1.37996 TFlops, 749.726 GB/s
 error: 0
-max_diff: 0, 528, 528
+max_diff: 0, 1, 1
 ```
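The new Run section lists the pooling window, stride, and padding arguments (arg4 to 15) but not their default values. The sketch below assumes the conventional floor formula for pooling output length. It also uses a hypothetical 3x3 window with stride 2 and padding 1 on every side, chosen only because these values are consistent with the 71 to 36 change in the log, not because the README confirms them:

```bash
#!/usr/bin/env bash
# Sketch: output length of one spatial dimension of a pooling window, using the
# conventional floor formula (an assumption, not read from the pool2d_fwd source):
#   out = (in + left_pad + right_pad - window) / stride + 1
pool_out_len() {
  local in_len=$1 window=$2 stride=$3 left_pad=$4 right_pad=$5
  # Bash integer division truncates, which equals floor for non-negative operands.
  echo $(( (in_len + left_pad + right_pad - window) / stride + 1 ))
}

# Hypothetical arguments: Hi = Wi = 71, Y = X = 3, Sy = Sx = 2, all pads = 1.
ho=$(pool_out_len 71 3 2 1 1)
wo=$(pool_out_len 71 3 2 1 1)
echo "Ho=${ho} Wo=${wo}"
```

This prints `Ho=36 Wo=36`, matching the `out_n_c_ho_wo` lengths in the log. Other window and stride combinations can produce the same length, so this shows consistency only, not the example's actual defaults.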
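The `in_n_c_hi_wi` and `out_n_c_ho_wo` dumps print lengths in N, C, H, W order. However, their strides (C stride 1, W stride 192) indicate a fully packed, channel-last (NHWC) memory layout. The sketch below assumes that layout, which is inferred from the log rather than stated in the README, and recomputes both stride lists from the lengths:

```bash
#!/usr/bin/env bash
# Sketch: strides of a fully packed tensor stored channel-last (N, H, W, C in memory),
# printed in the same N, C, H, W order as the in_n_c_hi_wi / out_n_c_ho_wo dumps.
# Assumption: packed NHWC layout, inferred from the logged strides.
nhwc_strides() {
  # $1 (N) is accepted for readability; it does not affect packed strides.
  local c=$2 h=$3 w=$4
  local stride_c=1                 # channels are innermost
  local stride_w=$c                # next W position skips one full channel vector
  local stride_h=$(( w * c ))      # next row skips W * C elements
  local stride_n=$(( h * w * c ))  # next image skips H * W * C elements
  echo "{${stride_n}, ${stride_c}, ${stride_h}, ${stride_w}}"
}

nhwc_strides 128 192 71 71   # input descriptor (Hi = Wi = 71)
nhwc_strides 128 192 36 36   # output descriptor (Ho = Wo = 36)
```

The two calls print `{967872, 1, 13632, 192}` and `{248832, 1, 6912, 192}`, which match the strides in the log.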
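The GB/s and TFlops figures on the pool2d_fwd Perf line can be cross-checked by hand. The README does not state the data type, the window size, or how the example counts operations. This sketch therefore assumes 2-byte (fp16) elements, a 3x3 window, and 2 * N * C * Ho * Wo * Y * X operations. All three are assumptions that happen to reproduce the logged numbers:

```bash
#!/usr/bin/env bash
# Sketch: cross-check the GB/s and TFlops in the pool2d_fwd Perf line.
# Assumptions (not stated in the README): fp16 (2-byte) input and output,
# a 3x3 window, and 2 * N * C * Ho * Wo * Y * X counted operations.
n=128 c=192 hi=71 wi=71 ho=36 wo=36 y=3 x=3
bytes_per_elem=2
time_ms=0.415453   # kernel time reported on the Perf line

in_elems=$(( n * c * hi * wi ))
out_elems=$(( n * c * ho * wo ))
bytes=$(( (in_elems + out_elems) * bytes_per_elem ))   # read input once, write output once
flops=$(( 2 * out_elems * y * x ))

# Bash has no floating-point arithmetic, so awk does the divisions:
#   GB/s   = bytes / (time_ms * 1e6)
#   TFlops = flops / (time_ms * 1e9)
awk -v b="$bytes" -v f="$flops" -v t="$time_ms" \
  'BEGIN { printf "%.3f GB/s, %.5f TFlops\n", b / (t * 1000000), f / (t * 1000000000) }'
```

Under these assumptions the script prints `749.727 GB/s, 1.37996 TFlops`, which agrees with the logged 749.726 GB/s and 1.37996 TFlops to within rounding.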