Commits · feature/restruct-ckprofiler · gaoqiong / composable_kernel

01 Dec, 2022 1 commit
- Merge branch 'develop' into feature/restruct-ckprofiler · 00af2988
  Po-Yen, Chen authored Dec 01, 2022
  
  00af2988
30 Nov, 2022 3 commits

gemm, conv perchannel quantization (#503) · ad541ad6

rocking5566 authored Dec 01, 2022

* Use gemm_multiple_D instead

* Add gemm bias relu quantization example

* Add pure gemm quantization example

* Add quantization of perchannel conv + bias + relu example

* Refine the code

* Rename multiplier to requant_scale

* Rename the folder

* Remove redundant comment

* Rename the file. Prepare to add perchannel

* Add conv perchannel instance

* Move to quantization folder

* Add conv perchannel client example

* Apply Rangify constructor of HostTensorDescriptor & Tensor<>

* Fix merge error

ad541ad6

BatchNorm backward instance/external API/profiler/tests (#519) · 63af525c

Qianfeng authored Dec 01, 2022

* Refine the device batchnorm-backward base API templates and data type assignments

* Remove duplicated kernel file

* Add batchnorm backward instances and external API

* Add batchnorm-backward profiler and tests

* Add client example which uses batchnorm backward external API

* Merge test/batchnorm_fwd and test/batchnorm_bwd into one directory

* Loosen the threshold for batchnorm-backward check_err()

63af525c

Merge branch 'develop' into feature/restruct-ckprofiler · 9a2607d6
Po-Yen, Chen authored Nov 30, 2022

9a2607d6

29 Nov, 2022 3 commits

Fix split-k gemm test (#231) · 236bd148

Anthony Chang authored Nov 30, 2022



* properly return error flag; reveals bug in split-k gemm

* fix bug in split k

* update split-k test case
Co-authored-by: Chao Liu <chao.liu2@amd.com>

236bd148

fix GetTypeString · 0e9c88ce
fsx950223 authored Nov 16, 2022

0e9c88ce

BatchNorm backward implementation (#461) · 44789d99

Qianfeng authored Nov 29, 2022

* Implemented batchnorm-backward Blockwise and Multiblock kernels

* Add batchnorm-backward device op

* Add batchnorm-backward host-reference op

* Add batchnorm-backward example

* Parameters renaming in batchnorm backward kernels and device op

* Change in the example to loosen the threshold for ScaleDiff checking

* Add comments to explain the implementation of batchnorm-backward

* Parameters renaming again in batchnorm backward kernels

* Improve the expression calculation for performance

* Add batchnorm backward to README

* Add comments to explain inv-variance in batchnorm forward and backward

* Renaming the batchnorm forward training and inferring examples

* Add/update the comments for batchnorm-backward kernels

* Renaming again

* Add block_sync_lds between two consecutive blockwise reductions

* Move common expression 1/N out of the static_for loops

* Add dy_elementwise_op

* Renaming in backward example again

* Add checking for reduceDims in reference_batchnorm_backward

* Update comments and code format

* Rename in the comments

* Remove common expression out of the loop in reference_batchnorm_backward_nhwc_c

* Add block_sync_lds() between blockwise reduction again

* Fix comments again

* Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test

44789d99

28 Nov, 2022 1 commit
- Remove int8 from batchnorm-forward instances since it is not needed for... · 5bf0475a
  Qianfeng authored Nov 29, 2022
```
Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test (#516)
```
  5bf0475a
25 Nov, 2022 4 commits

Merge branch 'batchnorm_fwd_fix' into feature/restruct-ckprofiler · 4be21df8
Po-Yen, Chen authored Nov 26, 2022

4be21df8
Merge branch 'develop' into feature/restruct-ckprofiler · 02ff2522
Po-Yen, Chen authored Nov 25, 2022

02ff2522
Remove int8 from batchnorm-forward instances since it is not needed for... · 3ebb4208
Qianfeng Zhang authored Nov 25, 2022
```
Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test
```
3ebb4208

BatchNorm forward instance/external api/profiler/tests/client example (#511) · 4e6a5575

Qianfeng authored Nov 25, 2022



* Update to device_batchnorm_forward base class to include all template parameters for problem description

* Add batchnorm forward instances and external api

* Add batchnorm forward profiler module which uses the external api

* Add some comments in batchnorm_forward example to explain the dimensions in lengths[]

* Replace the reference_batchnorm_forward_nhwc_c by generic reference_batchnorm_forward

* Improvement to the batchnorm infer base API

* Add batchnorm forward client example which shows using the batchnorm forward external API

* Add test for batchnorm forward

* Tuning the batchnorm profiler initialized values and error threshold

* Add support for bhalf_t in instances/external api/tests

* Add support for int8_t in instances/external api/tests

* Add support for double in instances/external api/tests

* Let ScaleDataType and BiasDataType be the same as XDataType and YDataType when creating instances

* Checking before running best instance in batchnorm_fwd_nhwc client example

* Add checking for YElementwiseOp in batchnorm_forward external API

* Add more types in batchnorm forward profiler

* Add more test lengths
Co-authored-by: rocking5566 <ChunYu.Lai@amd.com>

4e6a5575

21 Nov, 2022 2 commits
- Merge branch 'feature/restruct-ckprofiler' of... · acc47d12
  Po-Yen, Chen authored Nov 21, 2022
```
Merge branch 'feature/restruct-ckprofiler' of github.com:ROCmSoftwarePlatform/composable_kernel into feature/restruct-ckprofiler
```
  acc47d12
- Fix wrong include directives · 01a1037c
  Po-Yen, Chen authored Nov 21, 2022
  
  01a1037c
20 Nov, 2022 2 commits

Merge branch 'develop' into feature/restruct-ckprofiler · b0a39fec
Po Yen Chen authored Nov 20, 2022

b0a39fec

Client examples AddFastGelu and FastGelu + instances. (#509) · 43a889b7

Adam Osewski authored Nov 20, 2022



* FastGelu support for more data types.

* AddFastGelu & FastGelu instances.

* Client example.

* clang-format

* Remove unused stride variable.

* Add new line at EOF.
Co-authored-by: Adam Osewski <aosewski@amd.com>

43a889b7

18 Nov, 2022 6 commits
- Add missing include directive <iostream> · d7f0f462
  Po-Yen, Chen authored Nov 19, 2022
  
  d7f0f462
- Make friend function hidden · 27859437
  Po-Yen, Chen authored Nov 19, 2022
  
  27859437
- Use macro to eliminate redundant code · 27876602
  Po-Yen, Chen authored Nov 18, 2022
  
  27876602
- Prohibit users from calling dtor · f744c531
  Po-Yen, Chen authored Nov 18, 2022
  
  f744c531
- Use std::move() to avoid object copying · 2691d10c
  Po-Yen, Chen authored Nov 18, 2022
  
  2691d10c
- Use macro to delay expansion · 517ff41d
  Po-Yen, Chen authored Nov 18, 2022
  
  517ff41d
17 Nov, 2022 7 commits
- Use longer name to avoid name collision · 1e7557a6
  Po-Yen, Chen authored Nov 18, 2022
  
  1e7557a6
- Merge branch 'develop' into feature/restruct-ckprofiler · 7fabdef2
  Po Yen Chen authored Nov 18, 2022
  
  7fabdef2
- Add description for profiler operations · d5e056c7
  Po-Yen, Chen authored Nov 18, 2022
  
  d5e056c7
- Work around develop validation failure (#513) · 892a8d76
  Anthony Chang authored Nov 18, 2022
```
* workaround bf16 atten fwd issue on gfx908

* typo
```
  892a8d76
- Modularize ckProfiler operations · 8116d2b3
  Po-Yen, Chen authored Nov 17, 2022
  
  8116d2b3
- Rename profiler.cpp to main.cpp · df021e9d
  Po-Yen, Chen authored Nov 17, 2022
  
  df021e9d
- Re-structure ckProfiler source files · 76ff19e2
  Po-Yen, Chen authored Nov 17, 2022
  
  76ff19e2
15 Nov, 2022 4 commits

Add BF16 tests for batched_gemm_softmax_gemm_permute (#504) · 4c4c7328

guangzlu authored Nov 16, 2022



* fixed bug in softmax reference & add bf16 examples for batched_gemm_scale_softmax_gemm

* added bf16 tests for batched_gemm_softmax_gemm_permute

* changed format of device_batched_gemm_softmax_gemm_permute_xdl_cshuffle_bf16_bf16_bf16_bf16_gmk_gnk_gno_gmo_instance.cpp

* changed format device_batched_gemm_softmax_gemm_permute_xdl_cshuffle_bf16_bf16_bf16_bf16_gmk_gnk_gno_gmo_instance.cpp

* aligned annotations

* modified CMakeLists for examples

* add common example code of fp16/bf16 version for batched_gemm_scale_softmax_gemm_xdl

* use macro to control the instances

* added macro control into instances

* clang-format some files

* changed error tolerance for bf16

* changed index for 10_elementwise_normalization

* fixed xdlops code bug in amd_xdlops.hpp
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

4c4c7328

Add Conv Backward Data on Navi21 for ResNet50 (#499) · db0eb1ea

ltqin authored Nov 16, 2022



* start add example

* add device dl

* change launch kernel

* change init data method

* change example config

* add config valid check

* add instance for dl bwd

* add instance to ckProfiler

* reserver to profiler and cmakelist

* add instance to ckProfiler2

* change instance f32 config

* fix example return value
Co-authored-by: letaoqin <letaoqin@amd.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

db0eb1ea

Avoid reporting unused member function error (#507) · 7038723a
Po Yen Chen authored Nov 15, 2022

7038723a

Introduce ck::accumulate_n() (#439) · 730204ee

Po Yen Chen authored Nov 15, 2022

We can use this template to eliminate duplicated iterator computation
logic. By providing the return type to ck::accumulate_n(), we can avoid
type conversion operations.

730204ee
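
The idea in the commit message above can be sketched as follows. This is an illustration only, not the code from the repository: the namespace `sketch`, the template parameter order, and the helper `element_count()` are assumptions made for the example, and the actual signature of ck::accumulate_n() may differ. The sketch wraps std::accumulate() so the caller passes a start iterator and a count instead of computing the end iterator by hand, and names the result type explicitly so no cast is needed on the initial value or the result.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

namespace sketch {

// Hypothetical stand-in for ck::accumulate_n(); the real signature may differ.
// T comes first so callers can name the accumulator/result type explicitly.
template <typename T, typename ForwardIterator, typename Size, typename BinaryOperation>
T accumulate_n(ForwardIterator first, Size count, T init, BinaryOperation op)
{
    // The end iterator is computed once, here, instead of at every call site.
    return std::accumulate(first, std::next(first, count), init, op);
}

// Illustrative helper (not from the repository): product of the first `count` lengths.
inline std::size_t element_count(const std::vector<int>& lengths, std::size_t count)
{
    assert(count <= lengths.size());
    // The accumulator is std::size_t even though the elements are int.
    return accumulate_n<std::size_t>(lengths.begin(), count, 1, std::multiplies<>{});
}

} // namespace sketch
```

Without such a helper, each call site would spell out `std::accumulate(begin, std::next(begin, n), std::size_t{1}, ...)` or cast the result afterwards.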

14 Nov, 2022 1 commit

Rangify STL algorithms (#438) · dc663fae

Po Yen Chen authored Nov 15, 2022

* Rangify STL algorithms

This commit adopts rangified std::copy(), std::fill() & std::transform()

* Re-write more std::copy() calls

* Re-write std::copy() calls in profiler

dc663fae
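
A minimal sketch of what "rangified" algorithms could look like, assuming thin wrappers that take a whole range and forward to the iterator-pair STL algorithms. The namespace `sketch` and the helper `doubled()` are hypothetical and are not taken from the repository; the wrappers actually added by the commit may have different names and signatures.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

namespace sketch {

// Range-taking wrapper around std::copy(); the caller no longer spells begin()/end().
template <typename Range, typename OutputIterator>
OutputIterator copy(const Range& range, OutputIterator out)
{
    return std::copy(std::begin(range), std::end(range), out);
}

// Range-taking wrapper around std::fill().
template <typename Range, typename T>
void fill(Range& range, const T& value)
{
    std::fill(std::begin(range), std::end(range), value);
}

// Range-taking wrapper around the unary form of std::transform().
template <typename Range, typename OutputIterator, typename UnaryOperation>
OutputIterator transform(const Range& range, OutputIterator out, UnaryOperation op)
{
    return std::transform(std::begin(range), std::end(range), out, op);
}

// Illustrative helper (not from the repository): doubles every element.
inline std::vector<int> doubled(const std::vector<int>& in)
{
    std::vector<int> out(in.size());
    sketch::transform(in, out.begin(), [](int v) { return 2 * v; });
    return out;
}

} // namespace sketch
```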

11 Nov, 2022 3 commits

Rangify check_err() (#444) · b79bbbc2

Po Yen Chen authored Nov 12, 2022

* Rangify check_err()

By rangifying check_err(), we can compare values not only between
std::vector<>s but also between any ranges that have the same value
type.

* Re-format example code

b79bbbc2
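
The point of the commit message above can be illustrated with a simplified sketch. It is not the repository's check_err(): the name `check_err_sketch`, the single absolute tolerance `atol`, and the absence of diagnostics are assumptions made to keep the example short. It accepts any two ranges whose value types match, so a std::vector can be compared against a std::array or a std::list.

```cpp
#include <array>
#include <cmath>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

namespace sketch {

// Hypothetical, simplified range-based comparison; the real check_err() differs.
template <typename Range, typename RefRange>
bool check_err_sketch(const Range& out, const RefRange& ref, double atol = 1e-5)
{
    using OutValue = std::decay_t<decltype(*std::begin(out))>;
    using RefValue = std::decay_t<decltype(*std::begin(ref))>;
    static_assert(std::is_same_v<OutValue, RefValue>, "ranges must share a value type");

    auto o = std::begin(out);
    auto r = std::begin(ref);
    for(; o != std::end(out) && r != std::end(ref); ++o, ++r)
    {
        // Element-wise absolute error check.
        if(std::abs(static_cast<double>(*o) - static_cast<double>(*r)) > atol)
            return false;
    }
    // Both ranges must be exhausted together, otherwise their sizes differ.
    return o == std::end(out) && r == std::end(ref);
}

} // namespace sketch
```

Only std::begin() and std::end() are required of the arguments, which is why containers other than std::vector work here.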

Fix build errors on CI server (#506) · 4382b414
Po Yen Chen authored Nov 12, 2022
```
* Add missing ignore expression

* Add missing include directive
```
4382b414

Rangify constructor of HostTensorDescriptor & Tensor<> (#445) · 4a2a56c2

Po Yen Chen authored Nov 12, 2022

* Rangify STL algorithms

This commit adopts rangified std::copy(), std::fill() & std::transform()

* Rangify check_err()

By rangifying check_err(), we can compare values not only between
std::vector<>s but also between any ranges that have the same value
type.

* Allow constructing Tensor<> like a HostTensorDescriptor

* Simplify Tensor<> object construction logic

* Remove more unnecessary 'HostTensorDescriptor' objects

* Re-format example code

* Re-write more HostTensorDescriptor ctor call

4a2a56c2

10 Nov, 2022 3 commits

Add packages for examples and profiler (#502) · 37f2e918
Lauren Wrubleski authored Nov 10, 2022
```
* Add packages for example and profiler

* correct TEST_NAME -> EXAMPLE_NAME
```
37f2e918
Rangify FillUniformDistributionIntegerValue<> (#443) · 6f0564f0
Po Yen Chen authored Nov 11, 2022
```
Allow passing forward range to its call operator
```
6f0564f0

add client example for elementwise_normalization (#501) · 70456328

guangzlu authored Nov 11, 2022

* add client example for elementwise_normalization

* clang format elementwise_layernorm2d.cpp

* changed some naming to make it more understandable

* changed naming of input into ab_input

* fixed bug for threadwise_x_store

* add elementwise operation to reference

70456328