Unverified Commit 473ba5bc authored by Chao Liu's avatar Chao Liu Committed by GitHub
Browse files

update document: Readme, contributors, citation, (#463)

* update cmake script

* update readme

* Update README.md

* add citation

* add images

* Update README.md

* update

* Update README.md

* Update CONTRIBUTORS.md

* Update README.md

* Update CITATION.cff

* Update README.md

* Update CITATION.cff
parent 7fc3ed76
cff-version: 1.2.0
title: Composable Kernel
message: If you use this software, please cite using the following metadata.
type: software
authors:
- given-names: Chao
family-names: Liu
email: chao.liu2@amd.com
affiliation: AMD
- given-names: Jing
family-names: Zhang
email: jing.zhang3@amd.com
affiliation: AMD
- given-names: Letao
family-names: Qin
email: letao.qin@amd.com
affiliation: AMD
- given-names: Qianfeng
family-names: Zhang
email: qianfeng.zhang@amd.com
affiliation: AMD
- given-names: Liang
family-names: Huang
email: carlus.huang@amd.com
affiliation: AMD
- given-names: Shaojie
family-names: Wang
email: shaojie.wang@amd.com
affiliation: AMD
- given-names: Anthony
family-names: Chang
email: antc@amd.com
affiliation: AMD
- given-names: Chunyu
family-names: Lai
email: chunyu.lai@amd.com
affiliation: AMD
- given-names: Illia
family-names: Silin
email: illia.silin@amd.com
affiliation: AMD
- given-names: Adam
family-names: Osewski
email: adam.osewski@amd.com
affiliation: AMD
- given-names: Poyen
family-names: Chen
email: poyen.chen@amd.com
affiliation: AMD
- given-names: Rosty
family-names: Geyyer
email: rosty.geyyer@amd.com
affiliation: AMD
- given-names: Hanwen
family-names: Chen
- given-names: Tejash
family-names: Shah
- given-names: Xiaoyan
family-names: Zhou
- given-names: Jianfeng
family-names: Yan
repository-code: 'https://github.com/ROCmSoftwarePlatform/composable_kernel'
abstract: Composable Kernel (CK) library aims to provide a programming model for writing performance critical kernels for Machine Learning workloads across multiple architectures including GPUs, CPUs, etc, through general purpose kernel progarmming languages, like HIP C++.
keywords:
- 'CK, Composable Kernel, Tensor Coordinate Transformation'
license: MIT
license-url: https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/7fc3ed761aa35709d87c8fbbe41dd368648b3541/LICENSE
# Developers
[Chao Liu](https://github.com/asroy), [Jing Zhang](https://github.com/zjing14), 2018-2022
[Letao Qin](https://github.com/ltqin), [Qianfeng Zhang](https://github.com/qianfengz), [Liang Huang](https://github.com/carlushuang), [Shaojie Wang](https://github.com/shaojiewang), 2019-2022
[Anthony Chang](https://github.com/rosenrodt), [Chunyu Lai](https://github.com/rocking5566), [Illia Silin](https://github.com/illsilin), [Adam Osewski](https://github.com/aosewski), [Poyen Chen](https://github.com/poyenc), [Rosty Geyyer](https://github.com/geyyer), 2022
Hanwen Chang, 2019-2021,
Tejash Shah, 2019-2020
Xiaoyan Zhou, 2020
[Jianfeng Yan](https://github.com/j4yan), 2021-2022
# Product Manager
[Jun Liu](https://github.com/junliume)
# Contributors
[Dan Yao](https://github.com/danyao12), [Guangzhao Lu](https://github.com/guangzlu), [Raman Jana](https://github.com/ramjana), [Jehandad Khan](https://github.com/JehandadKhan)
# Acknowledgement
CK team works closely with Meta [AITemplate](???to.be.added???) team ([Bing Xu](https://github.com/antinucleon), Ying Zhang, etc). Most of the lucrative graph optimization opportunities in ML models were identified by AITemplate team, and we also co-designed many high performance fused kernels for AMD GPUs. Without this collaboration, CK would not reach its current potential.
## Docker script # Composable Kernel
## Methodology
Composable Kernel (CK) library aims to provide a programming model for writing performance critical kernels for Machine Learning workloads across multiple architectures including GPUs, CPUs, etc, through general purpose kernel languages, like HIP C++.
CK utilizes two concepts to achieve performance portabilatity and code maintainbility:
* A tile-based programming model
* Algorithm complexity reduction for complex ML operators, using innovative technique we call "Tensor Coordinate Transformation".
![ALT](/doc/image/ck_component.png "CK Components")
## Code Structure
Current CK library are structured into 4 layers:
* "Templated Tile Operators"
* "Templated Kernel and Invoker" layer
* "Instantiated Kernel and Invoker" layer
* "Client API" layer
![ALT](/doc/image/ck_layer.png "CK Layers")
## Contributors
The list of developers and contributors is here: [Contributors](/CONTRIBUTORS.md)
## Citation
If you use CK, please use following citations:
* CK paper will be freely available on arXiv soon: [Realizing Tensor Operators Using Coordinate Transformations and Tile Based Programming](???)
* [CITATION.cff](/CITATION.cff)
## License
CK is released under the MIT license. [License File](/LICENSE)
# Build CK
## Build docker image
```bash
DOCKER_BUILDKIT=1 docker build -t ck:latest -f Dockerfile .
```
## Launch docker
```bash ```bash
docker run \ docker run \
-it \ -it \
...@@ -6,47 +45,38 @@ docker run \ ...@@ -6,47 +45,38 @@ docker run \
--group-add sudo \ --group-add sudo \
-w /root/workspace \ -w /root/workspace \
-v ${PATH_TO_LOCAL_WORKSPACE}:/root/workspace \ -v ${PATH_TO_LOCAL_WORKSPACE}:/root/workspace \
rocm/tensorflow:rocm5.1-tf2.6-dev \ ck:latest \
/bin/bash /bin/bash
``` ```
# Install newer version of rocm-cmake ## Build CK
https://github.com/RadeonOpenCompute/rocm-cmake
## Build
```bash ```bash
mkdir build && cd build mkdir build && cd build
```
```bash # Need to specify target ID, example below is for gfx908 and gfx90a
# Need to specify target ID, example below is gfx908 and gfx90a cmake \
cmake \ -D CMAKE_PREFIX_PATH=/opt/rocm \
-D BUILD_DEV=OFF \ -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-D CMAKE_BUILD_TYPE=Release \ -D CMAKE_CXX_FLAGS="-O3" \
-D CMAKE_CXX_FLAGS=" --offload-arch=gfx908 --offload-arch=gfx90a -O3" \ -D CMAKE_BUILD_TYPE=Release \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \ -D GPU_TARGETS=gfx908;gfx90a \
-D CMAKE_PREFIX_PATH=/opt/rocm \
-D CMAKE_INSTALL_PREFIX=${PATH_TO_CK_INSTALL_DIRECTORY} \
.. ..
``` ```
### Build and Run Examples ### Build examples and tests
```bash
make -j examples
```
Instructions for running each individual examples are under ```example/```
## Tests
```bash ```bash
make -j examples tests make -j examples tests
make test make test
``` ```
Instructions for running each individual examples are under [example](/example)
## Build ckProfiler ## Build ckProfiler
```bash ```bash
make -j ckProfiler make -j ckProfiler
``` ```
Instructions for running ckProfiler are under ```profiler/``` Instructions for running ckProfiler are under [profiler](/profiler)
## Install CK ## Install CK
```bash ```bash
...@@ -54,7 +84,7 @@ make install ...@@ -54,7 +84,7 @@ make install
``` ```
## Using CK as pre-built kernel library ## Using CK as pre-built kernel library
Instructions for using CK as a pre-built kernel library are under ```client_example/``` Instructions for using CK as a pre-built kernel library are under [client_example](/client_example)
## Caveat ## Caveat
### Kernel Timing and Verification ### Kernel Timing and Verification
......
#!/bin/bash
rm -f CMakeCache.txt
rm -f *.cmake
rm -rf CMakeFiles
MY_PROJECT_SOURCE=$1
cmake \
-D CMAKE_PREFIX_PATH=/opt/rocm \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-D CMAKE_CXX_FLAGS="-O3 -ftemplate-backtrace-limit=0 -gline-tables-only -save-temps=$PWD" \
-D CMAKE_BUILD_TYPE=Release \
-D BUILD_DEV=ON \
-D GPU_TARGETS=gfx908;gfx90a \
-D CMAKE_VERBOSE_MAKEFILE:BOOL=ON \
-D USE_BITINT_EXTENSION_INT4=OFF \
${MY_PROJECT_SOURCE}
#-D AMDGPU_TARGETS=gfx90a;gfx908
#!/bin/bash
rm -f CMakeCache.txt
rm -f *.cmake
rm -rf CMakeFiles
MY_PROJECT_SOURCE=$1
cmake \
-D CMAKE_PREFIX_PATH=/opt/rocm \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-D CMAKE_CXX_FLAGS="-O3" \
-D CMAKE_BUILD_TYPE=Release \
-D BUILD_DEV=OFF \
-D GPU_TARGETS=gfx908;gfx90a \
-D CMAKE_VERBOSE_MAKEFILE:BOOL=ON \
-D USE_BITINT_EXTENSION_INT4=OFF \
${MY_PROJECT_SOURCE}
#-D AMDGPU_TARGETS=gfx90a;gfx908
#!/bin/bash
rm -f CMakeCache.txt
rm -f *.cmake
rm -rf CMakeFiles
MY_PROJECT_SOURCE=../
MY_PROJECT_INSTALL=../install.dir
cmake \
-D CMAKE_INSTALL_PREFIX=${MY_PROJECT_INSTALL} \
-D BUILD_DEV=OFF \
-D CMAKE_BUILD_TYPE=Release \
-D CMAKE_CXX_FLAGS=" -O3 -ftemplate-backtrace-limit=0 -gline-tables-only -save-temps=$PWD" \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-D CMAKE_PREFIX_PATH=/opt/rocm \
-D CMAKE_VERBOSE_MAKEFILE:BOOL=ON \
${MY_PROJECT_SOURCE}
#-D CMAKE_CXX_FLAGS=" --offload-arch=gfx908 --offload-arch=gfx90a -O3 -ftemplate-backtrace-limit=0 -mllvm --amdgpu-spill-vgpr-to-agpr=0 -gline-tables-only -save-temps=$PWD" \
#-D CMAKE_CXX_FLAGS=" --offload-arch=gfx908 --offload-arch=gfx90a -O3 -ftemplate-backtrace-limit=0 -gline-tables-only -save-temps=$PWD" \
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment