Commits · gelu · gaoqiong / composable_kernel

18 Jun, 2022 5 commits
- clean element wise op · fe32b124
  Chao Liu authored Jun 18, 2022
  
- clean · d1335c43
  Chao Liu authored Jun 18, 2022
  
- use type_convert · 53fd4f42
  Chao Liu authored Jun 18, 2022
  
- add comment · d7206533
  Chao Liu authored Jun 18, 2022
  
- Merge remote-tracking branch 'origin/develop' into gelu · 6805df0e
  Chao Liu authored Jun 18, 2022
  
17 Jun, 2022 5 commits

Don't look up the /sys/module/amdgpu/version file. (#287) · e4584d91

Illia Silin authored Jun 17, 2022



* use pre-built docker instead of building a new one

* try docker.image.pull

* change syntax in docker.image()

* add 30 min timeout

* increase timeout to 3 hours

* move performance tests to first stage for testing

* set image variable to the new container name

* update image name

* check available images

* check available images in both places

* try different image name

* use image ID to refer to image

* run performance on gfx90a

* fix the gpu_arch labeling, add parameter

* move env vars out of stages

* add stand-alone performance script, MI200 tests, CU numbers

* dos2unix for run_perf_tests.sh

* try the new git credentials

* use env var for git credentials

* don't look up /sys/module/amdgpu/version
Co-authored-by: Chao Liu <chao.liu2@amd.com>

Regulate reduction accumulator operations and Element-wise operations (#274) · 1f543bfa

Qianfeng authored Jun 18, 2022

* Remove template from Reduction operation classes and add template to their operator() and GetIdentityValue() interfaces

* Change to unary elementwise operators and the reduce_unary_operator (class for mapping) and dependent variations in all host layers

* Remove the data type template parameter from reduce_binary_operator (class for mapping) and dependent variations in host layers

* Add InMemoryDataOperatonSupportedOnDataType to check the matching between data type and InMemoryDataOperation

* Use struct-scope operator template instantiation for binary and unary element-wise operations

* Change a few more elementwise operations to use template for operator()

* Tiny correction in Normalize operator

* Add static_assert to check the data type applicability for some reduction accumulator and element-wise operations

* Correction in some examples with regard to using ReduceAccDataType

* Use static_assert for UnaryDivide

* Update to merged codes to use Element-wise operations and Reduction Accumulator operations correctly

* Tiny fix with regard to SetWorkSpacePointer()
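
The first items in this list describe moving the type parameter off the operation class and onto its member functions. A minimal C++ sketch of that pattern is given here. The names `Add`, `Max`, `reduce_array`, and `check_reduce_sketch` are hypothetical stand-ins, not the repository's actual classes; only `operator()` and `GetIdentityValue()` come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>

// Hypothetical accumulator: the struct is not a template;
// the element type is chosen per call on the member templates.
struct Add
{
    template <typename T>
    static constexpr T GetIdentityValue()
    {
        return static_cast<T>(0);
    }

    template <typename T>
    void operator()(T& acc, T x) const
    {
        // Compile-time check that the data type suits this operation.
        static_assert(std::is_arithmetic<T>::value, "Add expects an arithmetic type");
        acc = acc + x;
    }
};

struct Max
{
    template <typename T>
    static constexpr T GetIdentityValue()
    {
        return std::numeric_limits<T>::lowest();
    }

    template <typename T>
    void operator()(T& acc, T x) const
    {
        static_assert(std::is_arithmetic<T>::value, "Max expects an arithmetic type");
        if(acc < x)
            acc = x;
    }
};

// Hypothetical host-side helper: one operation type serves every element type.
template <typename Op, typename T, std::size_t N>
T reduce_array(const T (&xs)[N])
{
    T acc = Op::template GetIdentityValue<T>();
    for(std::size_t i = 0; i < N; ++i)
        Op{}(acc, xs[i]);
    return acc;
}

inline void check_reduce_sketch()
{
    const float xs[] = {1.0f, -2.0f, 3.5f};
    assert(reduce_array<Add>(xs) == 2.5f);
    assert(reduce_array<Max>(xs) == 3.5f);
}
```

With this layout, code that maps an enum or tag to an operation only has to name `Add` or `Max`, without also carrying the data type as a template argument.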

use universal workspace pointer in bwd-weight (#286) · 63cdd923
Shaojie WANG authored Jun 18, 2022

add p_workspace to baseargument (#275) · c7a96ed5
ltqin authored Jun 17, 2022

Gemm + bias + relu + add + layernorm (#272) · 6eb55499

rocking5566 authored Jun 17, 2022

* Copy "gemm reduce" to "gemm bias add reduce"

* Implement gemm bias add reduction

* Fix compiler error due to merge from develop

* Add tensor operation for gemm + bias + add + reduce

* Add gemm_bias_add_reduce to ckProfiler

* Add c1 functor

* Refine type

* Use reduceAccDataType instead of explicitly float

* Change to use check_err()

* Do relu in float32 instead of bhalf_t, because bhalf_t is unsigned

* Refactor relu: use type_trait instead of overloading

* Rename DxsReduceAccElementwiseOperation to DxsReduceAccElementwiseOperation

* Fix denominator

* Refine naming

* Fix denominator in host

* Remove useless include header

* Use AccDataType

* Fix static_cast order

* Refine type

* [What] Remove tuple type in the base class
[Why] The external API depends on the base class. If the base class is tied to a type, we will need many classes for different types
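
The item above about doing relu in float32 because bhalf_t is unsigned can be illustrated with a small sketch. It assumes bf16 values are stored as raw bits in an unsigned 16-bit integer; `bhalf_bits`, the conversion helpers, and both relu functions are hypothetical names for illustration, and the float-to-bf16 conversion simply truncates rather than rounds.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: bf16 is held as raw bits in an unsigned 16-bit integer.
using bhalf_bits = std::uint16_t;

// bf16 is the upper 16 bits of an IEEE-754 float32.
inline float bf16_to_float(bhalf_bits x)
{
    std::uint32_t u = static_cast<std::uint32_t>(x) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Truncating conversion (no rounding), kept simple for illustration.
inline bhalf_bits float_to_bf16(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return static_cast<bhalf_bits>(u >> 16);
}

// Comparing the unsigned storage type directly: the sign bit only makes
// the integer larger, so negative inputs pass through unchanged.
inline bhalf_bits relu_on_bits(bhalf_bits x) { return x > 0 ? x : 0; }

// Convert to float32, apply relu there, then convert back.
inline bhalf_bits relu_via_float(bhalf_bits x)
{
    const float f = bf16_to_float(x);
    return float_to_bf16(f > 0.0f ? f : 0.0f);
}

inline void check_relu_sketch()
{
    const bhalf_bits minus_one = float_to_bf16(-1.0f); // bit pattern 0xBF80
    assert(relu_on_bits(minus_one) == minus_one);      // negative value leaks through
    assert(relu_via_float(minus_one) == 0);            // clamped to zero
}
```

Positive inputs behave the same in both versions; only the handling of negative values differs.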

16 Jun, 2022 3 commits

example for convnd bwd weight bf16 splitk (#265) · 561ec12f

Shaojie WANG authored Jun 17, 2022

* add GetWorkSpaceSize to base arg and make an example on convnd_bwd_weight

* add bwd weight for bf16: init

* remove redundant compute

* use datatype and split k to check whether a workspace is used

* remove unused computation for work space size

* add some code for bfp16

* add device/grid unary op

* add unary type convert to bwd-weight example

* support bf16 splitk kernel for convnd bwd weight

* 1. remove comments. 2. add checkvalidity. 3. add gridsize computation

* add workspace size check

* fix format

* change function name

Merge remote-tracking branch 'origin/develop' into gelu · 1fdbe3fe
Chao Liu authored Jun 16, 2022

Use new github credentials (#278) · fb9b6b1e

Illia Silin authored Jun 15, 2022

* use pre-built docker instead of building a new one

* try docker.image.pull

* change syntax in docker.image()

* add 30 min timeout

* increase timeout to 3 hours

* move performance tests to first stage for testing

* set image variable to the new container name

* update image name

* check available images

* check available images in both places

* try different image name

* use image ID to refer to image

* run performance on gfx90a

* fix the gpu_arch labeling, add parameter

* move env vars out of stages

* add stand-alone performance script, MI200 tests, CU numbers

* dos2unix for run_perf_tests.sh

* try the new git credentials

* use env var for git credentials

15 Jun, 2022 7 commits
- clean · 35a67b90
  Chao Liu authored Jun 15, 2022
  
- clean · 82837d1b
  Chao Liu authored Jun 15, 2022
  
- clean · 5d87cb7a
  Chao Liu authored Jun 15, 2022
  
- clean · e5f731cb
  Chao Liu authored Jun 15, 2022
  
- add ckProfiler · b58b98ff
  Chao Liu authored Jun 15, 2022
  
- update example · 3d005816
  Chao Liu authored Jun 15, 2022
  
- rename · 9551101e
  Chao Liu authored Jun 15, 2022
  
14 Jun, 2022 5 commits
- update example · c4f12088
  Chao Liu authored Jun 14, 2022
  
- Merge remote-tracking branch 'origin/develop' into gelu · 67fcb0bd
  Chao Liu authored Jun 14, 2022
  
- update example · 578ffb6b
  Chao Liu authored Jun 14, 2022
  
- fix · 5816a647
  Chao Liu authored Jun 14, 2022
  
- fix · ad11d2a4
  Chao Liu authored Jun 14, 2022
  
13 Jun, 2022 7 commits
- refactor · 2488d0bf
  Chao Liu authored Jun 13, 2022
  
- refactor · 97ec23bf
  Chao Liu authored Jun 13, 2022
  
- refactor · 7fd5e9f5
  Chao Liu authored Jun 13, 2022
  
- refactor · e09f6e02
  Chao Liu authored Jun 13, 2022
  
- refactor · 57271814
  Chao Liu authored Jun 13, 2022
  
- refactor · f9b92b1e
  Chao Liu authored Jun 13, 2022
  
- refactoring; add readme · ff4f8ba8
  Chao Liu authored Jun 13, 2022
  
11 Jun, 2022 2 commits
- rename, clean · 25e35b59
  Chao Liu authored Jun 11, 2022
  
- add gemm bias add fastgelu · 8a60a329
  Chao Liu authored Jun 11, 2022
  
10 Jun, 2022 1 commit

Add performance tests on MI200 in CI, reporting number of CUs, add stand-alone perf test. (#277) · 1ced00a5

Illia Silin authored Jun 10, 2022

* use pre-built docker instead of building a new one

* try docker.image.pull

* change syntax in docker.image()

* add 30 min timeout

* increase timeout to 3 hours

* move performance tests to first stage for testing

* set image variable to the new container name

* update image name

* check available images

* check available images in both places

* try different image name

* use image ID to refer to image

* run performance on gfx90a

* fix the gpu_arch labeling, add parameter

* move env vars out of stages

* add stand-alone performance script, MI200 tests, CU numbers

09 Jun, 2022 2 commits
- debugging · c7d59414
  Chao Liu authored Jun 09, 2022
  
- adding gemm multiple d · ea3feee5
  Chao Liu authored Jun 09, 2022
  
06 Jun, 2022 1 commit
- clean · 512666ff
  Chao Liu authored Jun 06, 2022
  
02 Jun, 2022 2 commits

Adding Resnet50 test to Performance tests (#268) · 1677cf70

Illia Silin authored Jun 02, 2022

* add resnet50 test to performance tests

* add blanks before gpu_arch in log files

* add resnet50 test with N=4 and process its results

* add ROCM and HIP versions to test tables

* uncomment the sql queries

* fix script syntax in jenkinsfile

use old ctile to avoid conv2d fwd bias relu add compute error (#271) · 1c5d06f2
Shaojie WANG authored Jun 03, 2022
