gaoqiong / composable_kernel
Commit 4da2d9da
authored Mar 14, 2022 by Jing Zhang
fixed readme
parents 6d6dc6bf 898e40e7
Showing 1 changed file with 15 additions and 20 deletions
example/13_pool2d_fwd/README.md (+15 -20)
-# Instructions for ```reduce_blockwise``` Example
+# Instructions for ```pool2d_fwd``` Example
## Docker script
```bash
...
...
@@ -13,7 +13,7 @@ rocm/tensorflow:rocm4.3.1-tf2.6-dev \
/bin/bash
```
-## Build ```reduce_blockwise```
+## Build ```pool2d_fwd```
```bash
mkdir build && cd build
```
...
...
@@ -30,31 +30,26 @@ cmake \
```
```bash
-make -j reduce_blockwise
+make -j pool2d_fwd
```
-## Run ```reduce_blockwise```
+## Run ```pool2d_fwd```
```bash
-# -D <xxx> : input 4-d tensor lengths
-# -v <x> : verification (0=no, 1=yes)
-#arg1: initialization (0=no init, 1=integer value, 2=decimal value)
-#arg2: run kernel # of times (>1)
-./bin/reduce_blockwise -D 16,64,32,960 -v 1 1 10
+#arg1: verification (0=no, 1=yes)
+#arg2: initialization (0=no init, 1=integer value, 2=decimal value)
+#arg3: run kernel # of times (>1)
+#arg4 to 15: N, C, Y, X, Hi, Wi, Sy, Sx, LeftPy, LeftPx, RightPy, RightPx
+./example/pool2d_fwd 1 1 10
```
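The twelve positional arguments `arg4 to 15` describe the pooling problem: batch and channel counts, window size, input size, strides, and left/right padding. As a rough sketch of how they determine the output size, the Python below applies the usual pooling output-length formula. The values Y = X = 3, Sy = Sx = 2, and one element of padding on each side are assumptions, chosen only because they reproduce the 71 to 36 shapes printed in the result log of this diff; the README does not state the example's defaults. `pool_out_len` is a hypothetical helper, not part of composable_kernel.

```python
def pool_out_len(in_len, window, stride, left_pad, right_pad):
    """Output length along one spatial axis for a strided pooling window."""
    return (in_len + left_pad + right_pad - window) // stride + 1

# N, C, Hi, Wi are taken from the log in this diff.
N, C, Hi, Wi = 128, 192, 71, 71
# Assumed values (not stated in the README).
Y, X = 3, 3
Sy, Sx = 2, 2
LeftPy, LeftPx, RightPy, RightPx = 1, 1, 1, 1

Ho = pool_out_len(Hi, Y, Sy, LeftPy, RightPy)
Wo = pool_out_len(Wi, X, Sx, LeftPx, RightPx)
print((N, C, Ho, Wo))  # (128, 192, 36, 36)
```

Other padding choices can give the same output size (for example, a total padding of 3 along an axis also yields 36), so the log alone does not pin the defaults down.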
-Result
+Result
```
-launch_and_time_kernel: grid_dim {240, 1, 1}, block_dim {256, 1, 1}
-Warm up
-Start running 3 times...
-Perf: 0.23536 ms, 267.32 GB/s, DeviceReduceBlockWise<256,M_C4_S1,K_C64_S1,InSrcVectorDim_0_InSrcVectorSize_1_OutDstVectorSize_1>
-error: 0
-max_diff: 0, 529, 529
-root@dc-smc-18:/data/composable_kernel/Build3# bin/reduce_blockwise -D 16,64,32,960 -v 1 1 10
-launch_and_time_kernel: grid_dim {240, 1, 1}, block_dim {256, 1, 1}
+in_n_c_hi_wi: dim 4, lengths {128, 192, 71, 71}, strides {967872, 1, 13632, 192}
+out_n_c_ho_wo: dim 4, lengths {128, 192, 36, 36}, strides {248832, 1, 6912, 192}
+launch_and_time_kernel: grid_dim {124416, 1, 1}, block_dim {64, 1, 1}
Warm up
Start running 10 times...
-Perf: 0.23392 ms, 268.966 GB/s, DeviceReduceBlockWise<256,M_C4_S1,K_C64_S1,InSrcVectorDim_0_InSrcVectorSize_1_OutDstVectorSize_1>
+Perf: 0.415453 ms, 1.37996 TFlops, 749.726 GB/s
error: 0
-max_diff: 0, 528, 528
+max_diff: 0, 1, 1
```
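The numbers in the new `pool2d_fwd` log can be sanity-checked by hand. The sketch below recomputes the printed strides, assuming a packed NHWC buffer whose lengths and strides are reported in N, C, H, W order, and then recomputes the GB/s and TFlops figures, assuming 2-byte elements and 2·Y·X operations per output element with a 3x3 window. These assumptions are inferred from the logged values, not stated in the README, and the helper names are hypothetical.

```python
def nhwc_strides(n, c, h, w):
    """Strides in (N, C, H, W) order for a packed NHWC buffer."""
    return (h * w * c, 1, w * c, c)

def num_elements(lengths):
    n, c, h, w = lengths
    return n * c * h * w

in_lens = (128, 192, 71, 71)    # from the log
out_lens = (128, 192, 36, 36)   # from the log
in_strides = nhwc_strides(*in_lens)
out_strides = nhwc_strides(*out_lens)
print(in_strides)   # (967872, 1, 13632, 192)
print(out_strides)  # (248832, 1, 6912, 192)

time_ms = 0.415453   # from the log
elem_bytes = 2       # assumption: 2-byte (half-precision) elements
window = 3 * 3       # assumption: Y = X = 3

# One read of the input plus one write of the output.
bytes_moved = elem_bytes * (num_elements(in_lens) + num_elements(out_lens))
# Assumed operation count: 2 * Y * X per output element.
flops = 2 * num_elements(out_lens) * window

gb_per_s = bytes_moved / 1e6 / time_ms
tflops = flops / 1e9 / time_ms
print(round(gb_per_s, 2), round(tflops, 4))
```

Under these assumptions the results land within rounding of the logged 749.726 GB/s and 1.37996 TFlops; the small residual comes from the time being printed to six significant digits.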