Commit bd27ed6c, authored by Chao Liu, committed by GitHub

Merge pull request #2 from asroy/master

Update readme
parents 594f1dbe 85a14293
...@@ -18,9 +18,6 @@ https://www.boost.org/doc/libs/1_66_0/more/getting_started/unix-variants.html#ea ...@@ -18,9 +18,6 @@ https://www.boost.org/doc/libs/1_66_0/more/getting_started/unix-variants.html#ea
# Build # Build
Change target ID in source code, example below is gfx908
https://github.com/asroy/modular_convolution/blob/aafb5eb18781f1ac9e06a17c3e53d968dd53dcc0/composable_kernel/include/utility/config.amd.hpp.in#L16-L23
Add path of Boost Add path of Boost
``` ```
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
...@@ -28,12 +25,14 @@ Add path of Boost ...@@ -28,12 +25,14 @@ Add path of Boost
``` ```
mkdir build && cd build mkdir build && cd build
```
# need to manually set target ID, example below is gfx908 cmake cmd. Need to Specify target ID, example below is gfx908
```
cmake \ cmake \
-D CMAKE_BUILD_TYPE=Release \ -D CMAKE_BUILD_TYPE=Release \
-D DEVICE_BACKEND=AMD \ -D CMAKE_CXX_FLAGS="-DCK_AMD_GPU_GFX908 -O3 --amdgpu-target=gfx908 -mllvm --amdgpu-spill-vgpr-to-agpr=0 -gline-tables-only -save-temps=$PWD" \
-D CMAKE_CXX_FLAGS="-O3 --amdgpu-target=gfx908 -mllvm --amdgpu-spill-vgpr-to-agpr=0 -gline-tables-only -save-temps=$CWD" \ -D HIP_ONLINE_COMPILER_FLAGS="-DCK_AMD_GPU_GFX908" \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \ -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-D CMAKE_PREFIX_PATH=/opt/rocm \ -D CMAKE_PREFIX_PATH=/opt/rocm \
-D CMAKE_VERBOSE_MAKEFILE:BOOL=ON \ -D CMAKE_VERBOSE_MAKEFILE:BOOL=ON \
...@@ -41,37 +40,35 @@ cmake ...@@ -41,37 +40,35 @@ cmake
``` ```
Build drivers: \ Build drivers: \
``conv_driver_v2`` is (offline compilation) driver for forward convolution, \ ``conv_fwd_driver_offline`` is (offline compilation) driver for forward convolution, \
``conv_bwd_data_driver_v2`` is (offline compilation) driver for backward-data convolution \ ``conv_bwd_driver_offline`` is (offline compilation) driver for backward-data convolution \
``conv_driver_v2_olc`` is (online compilation) driver for forward convolution ``conv_fwd_driver_online`` is (online compilation) driver for forward convolution
``` ```
make -j conv_driver_v2 make -j conv_fwd_driver_offline
make -j conv_bwd_data_driver_v2 make -j conv_bwd_driver_offline
make -j conv_driver_v2_olc make -j conv_fwd_driver_online
``` ```
# Run # Run
* layout: 0 = NCHW; 1 = NHWC * layout: 0 = NCHW; 1 = NHWC
* algo: * algo: algorithm
* Forward convolution: https://github.com/asroy/modular_convolution/blob/aafb5eb18781f1ac9e06a17c3e53d968dd53dcc0/driver/conv_driver_v2.cpp#L38
* Backward data convolution: https://github.com/asroy/modular_convolution/blob/aafb5eb18781f1ac9e06a17c3e53d968dd53dcc0/driver/conv_bwd_data_driver_v2.cpp#L22
* verify: 0 = no verification; 1 = do verification * verify: 0 = no verification; 1 = do verification
* init: 0 ~ 3. initialization method * init: 0 ~ 5. initialization method
* log: 0 = no log; 1 = do log * log: 0 = no log; 1 = do log
* repeat: number of times the kernel is launched * repeat: number of times the kernel is launched
``` ```
########################### layout algo verify init log repeat N__ K___ C___ Y X Hi_ Wi__ Strides Dilations LeftPads RightPads ######################################################## layout algo verify init log repeat N__ K___ C___ Y X Hi_ Wi__ Strides Dilations LeftPads RightPads
./conv_driver_v2 0 6 0 3 0 1 128 256 192 3 3 71 71 2 2 1 1 1 1 1 1 ./host/driver_offline/conv_fwd_driver_offline 0 4 0 0 0 1 128 256 192 3 3 71 71 2 2 1 1 1 1 1 1
./conv_driver_v2 0 6 0 3 0 1 256 1024 256 3 3 14 14 1 1 1 1 1 1 1 1 ./host/driver_offline/conv_fwd_driver_offline 0 4 0 0 0 1 256 1024 256 3 3 14 14 1 1 1 1 1 1 1 1
./conv_driver_v2 1 9 0 3 0 1 128 256 192 3 3 71 71 2 2 1 1 1 1 1 1 ./host/driver_offline/conv_fwd_driver_offline 1 5 0 0 0 1 128 256 192 3 3 71 71 2 2 1 1 1 1 1 1
./conv_driver_v2 1 9 0 3 0 1 256 1024 256 3 3 14 14 1 1 1 1 1 1 1 1 ./host/driver_offline/conv_fwd_driver_offline 1 5 0 0 0 1 256 1024 256 3 3 14 14 1 1 1 1 1 1 1 1
./conv_bwd_data_driver_v2 1 1 0 3 0 1 256 256 1024 3 3 14 14 1 1 1 1 1 1 1 1 ./host/driver_offline/conv_bwd_driver_offline 1 5 0 0 0 1 256 256 1024 3 3 14 14 1 1 1 1 1 1 1 1
``` ```
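The positional arguments in the commands above are easy to misread, so here is a minimal sketch that maps them to named fields and derives the output spatial size. It is not part of the repository: `ConvArgs`, `parse_conv_args`, and `output_size` are hypothetical names, the field order is taken from the comment header line above the commands, and the output-size formula is the standard convolution one, assumed (not confirmed from the driver source) to match what the driver computes. It also assumes the first value of each pair applies to the height dimension and the second to the width.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ConvArgs:
    """Hypothetical container; field order follows the README header comment."""
    layout: int  # 0 = NCHW, 1 = NHWC
    algo: int
    verify: int
    init: int
    log: int
    repeat: int
    n: int
    k: int
    c: int
    y: int
    x: int
    hi: int
    wi: int
    strides: Tuple[int, int]
    dilations: Tuple[int, int]
    left_pads: Tuple[int, int]
    right_pads: Tuple[int, int]


def parse_conv_args(argv: Sequence[str]) -> ConvArgs:
    """Parse the 21 integers that follow the driver name on the command line."""
    vals = [int(v) for v in argv]
    if len(vals) != 21:
        raise ValueError(f"expected 21 integers, got {len(vals)}")
    scalars, pairs = vals[:13], vals[13:]
    return ConvArgs(
        *scalars,
        strides=(pairs[0], pairs[1]),
        dilations=(pairs[2], pairs[3]),
        left_pads=(pairs[4], pairs[5]),
        right_pads=(pairs[6], pairs[7]),
    )


def output_size(a: ConvArgs) -> Tuple[int, int]:
    """Standard convolution output size (assumed to match the driver)."""
    def one_dim(size, filt, stride, dilation, left, right):
        effective_filter = dilation * (filt - 1) + 1
        return (size + left + right - effective_filter) // stride + 1

    ho = one_dim(a.hi, a.y, a.strides[0], a.dilations[0],
                 a.left_pads[0], a.right_pads[0])
    wo = one_dim(a.wi, a.x, a.strides[1], a.dilations[1],
                 a.left_pads[1], a.right_pads[1])
    return ho, wo
```

For example, `parse_conv_args("0 4 0 0 0 1 128 256 192 3 3 71 71 2 2 1 1 1 1 1 1".split())` gives `n=128`, `k=256`, `c=192`, and under the stated assumptions `output_size` returns a 36 x 36 output for the 71 x 71 input with stride 2.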
# Result # Result
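The `TFlop/s` figures in this section can be cross-checked by hand. The sketch below assumes the usual convention of counting two floating-point operations (one multiply, one add) per multiply-accumulate, so a convolution costs `2 * N * K * C * Y * X * Ho * Wo` operations. The function names are hypothetical and the formula is an assumption about how the driver reports throughput, although it does agree with the reported numbers to within rounding.

```python
def conv_out(size: int, filt: int, stride: int, dilation: int,
             left_pad: int, right_pad: int) -> int:
    """Standard convolution output length along one spatial dimension."""
    effective_filter = dilation * (filt - 1) + 1
    return (size + left_pad + right_pad - effective_filter) // stride + 1


def conv_flops(n: int, k: int, c: int, y: int, x: int, ho: int, wo: int) -> int:
    """Operation count, assuming 2 flops per multiply-accumulate."""
    return 2 * n * k * c * y * x * ho * wo


def tflops(flops: int, time_ms: float) -> float:
    """Convert an operation count and a time in milliseconds to TFlop/s."""
    return flops / (time_ms * 1e-3) / 1e12


if __name__ == "__main__":
    # First forward NCHW case: N=128 K=256 C=192, 3x3 filter,
    # 71x71 input, stride 2, dilation 1, padding 1 on each side.
    ho = conv_out(71, 3, 2, 1, 1, 1)
    wo = conv_out(71, 3, 2, 1, 1, 1)
    flops = conv_flops(128, 256, 192, 3, 3, ho, wo)
    print(f"{tflops(flops, 1.4155):.3f} TFlop/s")
```

With the reported average time of 1.4155 ms, this comes out at roughly 103.69 TFlop/s, in line with the 103.686 TFlop/s quoted for that case; the small difference is consistent with the time being rounded in the log.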
Forward convolution, FP16, NCHW Forward convolution, FP16, NCHW
``` ```
./conv_driver_v2 0 6 0 3 0 1 128 256 192 3 3 71 71 2 2 1 1 1 1 1 1 ./host/driver_offline/conv_fwd_driver_offline 0 4 0 0 0 1 128 256 192 3 3 71 71 2 2 1 1 1 1 1 1
layout: 0 layout: 0
in: dim 4, lengths {128, 192, 71, 71}, strides {967872, 5041, 71, 1} in: dim 4, lengths {128, 192, 71, 71}, strides {967872, 5041, 71, 1}
...@@ -93,7 +90,7 @@ Average time : 1.4155 ms, 103.686 TFlop/s ...@@ -93,7 +90,7 @@ Average time : 1.4155 ms, 103.686 TFlop/s
Forward convolution, FP16, NCHW Forward convolution, FP16, NCHW
``` ```
./conv_driver_v2 0 6 0 3 0 1 256 1024 256 3 3 14 14 1 1 1 1 1 1 1 1 ./host/driver_offline/conv_fwd_driver_offline 0 4 0 0 0 1 256 1024 256 3 3 14 14 1 1 1 1 1 1 1 1
layout: 0 layout: 0
in: dim 4, lengths {256, 256, 14, 14}, strides {50176, 196, 14, 1} in: dim 4, lengths {256, 256, 14, 14}, strides {50176, 196, 14, 1}
...@@ -115,7 +112,7 @@ Average time : 2.21357 ms, 106.959 TFlop/s ...@@ -115,7 +112,7 @@ Average time : 2.21357 ms, 106.959 TFlop/s
Forward convolution, FP16, NHWC Forward convolution, FP16, NHWC
``` ```
./conv_driver_v2 1 9 0 3 0 1 128 256 192 3 3 71 71 2 2 1 1 1 1 1 1 ./host/driver_offline/conv_fwd_driver_offline 1 5 0 0 0 1 128 256 192 3 3 71 71 2 2 1 1 1 1 1 1
layout: 1 layout: 1
in: dim 4, lengths {128, 71, 71, 192}, strides {967872, 13632, 192, 1} in: dim 4, lengths {128, 71, 71, 192}, strides {967872, 13632, 192, 1}
...@@ -137,7 +134,7 @@ Average time : 1.12014 ms, 131.025 TFlop/s ...@@ -137,7 +134,7 @@ Average time : 1.12014 ms, 131.025 TFlop/s
Forward convolution, FP16, NHWC Forward convolution, FP16, NHWC
``` ```
./conv_driver_v2 1 9 0 3 0 1 256 1024 256 3 3 14 14 1 1 1 1 1 1 1 1 ./host/driver_offline/conv_fwd_driver_offline 1 5 0 0 0 1 256 1024 256 3 3 14 14 1 1 1 1 1 1 1 1
layout: 1 layout: 1
in: dim 4, lengths {256, 14, 14, 256}, strides {50176, 3584, 256, 1} in: dim 4, lengths {256, 14, 14, 256}, strides {50176, 3584, 256, 1}
...@@ -159,7 +156,7 @@ Average time : 1.86877 ms, 126.693 TFlop/s ...@@ -159,7 +156,7 @@ Average time : 1.86877 ms, 126.693 TFlop/s
Backward data convolution, FP16, NHWC Backward data convolution, FP16, NHWC
``` ```
./conv_bwd_data_driver_v2 1 1 0 3 0 1 256 256 1024 3 3 14 14 1 1 1 1 1 1 1 1 ./host/driver_offline/conv_bwd_driver_offline 1 1 0 3 0 1 256 256 1024 3 3 14 14 1 1 1 1 1 1 1 1
layout: 1 layout: 1
in: dim 4, lengths {256, 14, 14, 1024}, strides {200704, 14336, 1024, 1} in: dim 4, lengths {256, 14, 14, 1024}, strides {200704, 14336, 1024, 1}