gaoqiong / composable_kernel
Commit 898e40e7 authored Mar 14, 2022 by Jing Zhang (parent 3b16e0d1)
readme
1 changed file with 1 addition and 32 deletions: example/13_pool2d_fwd/README.md
````diff
-<<<<<<< HEAD:example/13_pool2d_fwd/README.md
-# Instructions for ```pool2d_fwd``` Example
-=======
 # Instructions for ```grouped_gemm_xdl``` Example
->>>>>>> 17f80fcf4bb6e6e17f26ec1550aa194b962c50d7:example/14_grouped_gemm/README.md
 ## Docker script
 ```bash
````
````diff
@@ -17,11 +13,7 @@ rocm/tensorflow:rocm4.3.1-tf2.6-dev \
 /bin/bash
 ```
-<<<<<<< HEAD:example/13_pool2d_fwd/README.md
-## Build ```pool2d_fwd```
-=======
 ## Build ```grouped_gemm_xdl```
->>>>>>> 17f80fcf4bb6e6e17f26ec1550aa194b962c50d7:example/14_grouped_gemm/README.md
 ```bash
 mkdir build && cd build
 ```
````
````diff
@@ -38,41 +30,19 @@ cmake \
 ```
 ```bash
-<<<<<<< HEAD:example/13_pool2d_fwd/README.md
-make -j pool2d_fwd
-```
-## Run ```pool2d_fwd```
-=======
 make -j example_grouped_gemm_xdl_fp16
 ```
 ## Run ```grouped_gemm_xdl```
->>>>>>> 17f80fcf4bb6e6e17f26ec1550aa194b962c50d7:example/14_grouped_gemm/README.md
 ```bash
 #arg1: verification (0=no, 1=yes)
 #arg2: initialization (0=no init, 1=integer value, 2=decimal value)
 #arg3: run kernel # of times (>1)
-<<<<<<< HEAD:example/13_pool2d_fwd/README.md
-#arg4 to 15: N, C, Y, X, Hi, Wi, Sy, Sx, LeftPy, LeftPx, RightPy, RightPx
-./example/pool2d_fwd 1 1 10
-=======
 ./bin/example_grouped_gemm_xdl_fp16 0 1 5
->>>>>>> 17f80fcf4bb6e6e17f26ec1550aa194b962c50d7:example/14_grouped_gemm/README.md
 ```
 Result (MI100 @ 1087Mhz, 133.5TFlops peak FP16)
 ```
-<<<<<<< HEAD:example/13_pool2d_fwd/README.md
-in_n_c_hi_wi: dim 4, lengths {128, 192, 71, 71}, strides {967872, 1, 13632, 192}
-out_n_c_ho_wo: dim 4, lengths {128, 192, 36, 36}, strides {248832, 1, 6912, 192}
-launch_and_time_kernel: grid_dim {124416, 1, 1}, block_dim {64, 1, 1}
-Warm up
-Start running 10 times...
-Perf: 0.415453 ms, 1.37996 TFlops, 749.726 GB/s
-error: 0
-max_diff: 0, 1, 1
-=======
 gemm[0] a_m_k: dim 2, lengths {256, 64}, strides {64, 1} b_k_n: dim 2, lengths {64, 128}, strides {1, 64} c_m_n: dim 2, lengths {256, 128}, strides {128, 1}
 gemm[1] a_m_k: dim 2, lengths {512, 128}, strides {128, 1} b_k_n: dim 2, lengths {128, 256}, strides {1, 128} c_m_n: dim 2, lengths {512, 256}, strides {256, 1}
 gemm[2] a_m_k: dim 2, lengths {768, 192}, strides {192, 1} b_k_n: dim 2, lengths {192, 384}, strides {1, 192} c_m_n: dim 2, lengths {768, 384}, strides {384, 1}
````
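The `pool2d_fwd` side of the conflict documents the window and padding as arguments 4 to 15, and its result log prints the resulting tensor shapes, strides and throughput. The POSIX shell sketch below relates these, assuming the usual pooling convention Ho = (Hi + LeftPy + RightPy - Y) / Sy + 1 (floor division). The window parameters (3x3, stride 2, padding 1 on each side) are not stated anywhere in the diff; they are an assumption chosen because they reproduce the logged 71x71 to 36x36 shapes. `pool_out_len` is a hypothetical helper name. The throughput part further assumes FP16 (2-byte) data, one read of the input plus one write of the output, and 2 * N*C*Ho*Wo*Y*X counted operations; these counting conventions are inferred from the logged numbers, not read from the example's source.

```bash
# POSIX sh sketch (also runs under bash).
# Hypothetical helper: length of one output spatial dimension of 2-D pooling, using the
# usual convention out = (in + left_pad + right_pad - window) / stride + 1 (floor division).
# Arguments: in window stride left_pad right_pad (non-negative integers).
pool_out_len() {
  echo $(( ($1 + $4 + $5 - $2) / $3 + 1 ))
}

# N, C, Hi, Wi come from the log; the 3x3 / stride 2 / pad 1 window is an assumption.
N=128; C=192; Hi=71; Wi=71
Y=3; X=3; Sy=2; Sx=2; LeftPy=1; LeftPx=1; RightPy=1; RightPx=1
Ho=$(pool_out_len "$Hi" "$Y" "$Sy" "$LeftPy" "$RightPy")
Wo=$(pool_out_len "$Wi" "$X" "$Sx" "$LeftPx" "$RightPx")
echo "out lengths: {$N, $C, $Ho, $Wo}"

# Dims are printed in (N, C, H, W) order; NHWC storage makes C innermost, then W, H, N.
echo "in strides:  {$((Hi * Wi * C)), 1, $((Wi * C)), $C}"
echo "out strides: {$((Ho * Wo * C)), 1, $((Wo * C)), $C}"

# Assumed counting: 2 bytes per element, input read once, output written once,
# and 2 operations per window element per output value.
time_ms=0.415453   # the "Perf" time printed in the log
bytes=$(( (N * C * Hi * Wi + N * C * Ho * Wo) * 2 ))
flops=$(( 2 * N * C * Ho * Wo * Y * X ))

# Shell arithmetic is integer-only, so awk does the floating-point division.
# GB/s = bytes / (ms * 1e6); TFlops = flops / (ms * 1e9).
awk -v b="$bytes" -v f="$flops" -v t="$time_ms" \
  'BEGIN { printf "%.3f TFlops, %.3f GB/s\n", f / (t * 1e9), b / (t * 1e6) }'
```

Run as-is, it prints 36x36 output lengths, the same strides as the log, and 1.380 TFlops and 749.727 GB/s, which match the logged 1.37996 TFlops and 749.726 GB/s to within the rounding of the printed time. That agreement supports, but does not prove, the assumed window and counting conventions.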
````diff
@@ -85,5 +55,4 @@ launch_and_time_kernel: grid_dim {30, 1, 1}, block_dim {256, 1, 1}
 Warm up
 Start running 5 times...
 Perf: 0.037887 ms, 11.0706 TFlops, 90.8132 GB/s, DeviceGroupedGemmXdl<256, 256, 128, 4, 8, 32, 32, 4, 2>
->>>>>>> 17f80fcf4bb6e6e17f26ec1550aa194b962c50d7:example/14_grouped_gemm/README.md
 ```
````
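The `grouped_gemm_xdl` result log lists each group's shapes and strides. `a_m_k` with strides {K, 1} and `c_m_n` with strides {N, 1} are row-major, while `b_k_n` with strides {1, K} keeps consecutive k values adjacent, i.e. B is stored column-major. The sketch below tabulates per-group work for the three groups visible in this diff. It assumes the conventional 2 * M*N*K FLOP count per GEMM and that A and B are read once and C written once; the 2-byte element size follows from the `_fp16` in the example's binary name, but the example's own counting code is not shown here.

```bash
# POSIX sh sketch (also runs under bash): per-group GEMM work for the visible gemm[i] lines.
# Each entry is "M N K", read from a_m_k = {M, K} and b_k_n = {K, N}.
i=0
total_flops=0
for shape in "256 128 64" "512 256 128" "768 384 192"; do
  set -- $shape                 # split "M N K" into $1 $2 $3
  M=$1; N=$2; K=$3
  flops=$(( 2 * M * N * K ))                  # assumed 2*M*N*K convention
  bytes=$(( (M * K + K * N + M * N) * 2 ))    # assumed: A, B read once, C written once, FP16
  total_flops=$(( total_flops + flops ))
  echo "gemm[$i]: M=$M N=$N K=$K flops=$flops bytes=$bytes"
  i=$(( i + 1 ))
done
echo "visible groups: $i, flops: $total_flops"
```

The log in this diff is elided after `gemm[2]`, so these totals cover only the visible groups and are not expected to reproduce the Perf line.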