Commits · 9a8ee8a39a0aa6059c55faba05f6abb904fff6dd · yangql / composable_kernel-1

22 Mar, 2022 1 commit

Reduction for int8 and bfloat16 (#125) · 9a8ee8a3

Qianfeng authored Mar 23, 2022



* Use thread cluster descriptor and explicit M_K 2d descriptor to simplify Blockwise Reduction

* Change by replacing ReduceDims by NumReduceDims as Device Reduce interface template parameter

* Rename the folder name for the pool2d and reduce examples

* Update to reduction test scripts

* Add Readme for pool2d_fwd and reduce_blockwise examples

* Add support for int8_t reduction (ADD/AVG, MIN/MAX/AMAX)

* Tiny fix in reduce profiler and tiny update in reduce testing scripts

* Tiny fix in testing script profile_reduce_no_index.sh

* Tiny fix in testing script profile_reduce_no_index.sh

* Add support for bfp16 reduction (using bhalf_t = ushort)

* Tiny fix in amd_buffer_addressing.hpp

* Tiny change in script/profile_reduce_with_index.sh

* Use AccDataType for Beta value and use element_wise::PassThrough

* Use type_convert for type converting in host layer reduction

* Renaming and refining in Reduction profiler/device layer/examples

* Renaming and refining in Reduction profiler/device layer/examples

* Renaming all NumReduceDims to NumReduceDim

* Fix the leaked type_convert in ThreadwiseTensorSliceTransfer_v2

* Update to testing scripts to add bf16 support

* added more static_assert

* Remove buggy tunable configurations defined in device_reduce_instance_xxx.hpp

* Add static_assert to give compile-time warning for incorrect thread slice-size/vector-size configurations

* minor change

* Refine and fix (in GetWorkspaceSizeInBytes of MultiBlockPartialReduce) to make int8 completely pass

* Tiny renaming in gridwise_2d_reduction_multiblock_partial_reduce.hpp

* Tiny fix in script/profile_reduce_no_index.sh

* Refine in DeviceReduce layer with regard to using NumInvariantDim/NumReduceDim or InvariantDims/ReduceDims

* Generic renaming in host reduction and DeviceReduce layer

* Add support for 4-d all dimension reduction in the profiler and add_device_reduce_xxx instances

* Use multi-threading and simplify the host Reduction implementation

* Add ctest for reduction

* Update to clarify the use of the data init method in produce_reduce/example_reduce/test_reduce/

* Update to the reduce CTest executables to enable default testing behavior when no command argument

* Renaming
Co-authored-by: Jianfeng yan <jfyan008@gmail.com>


21 Mar, 2022 2 commits
- refactored deviceBatchedGemm; removed GridwiseBatchedGemm; added fp32 and int8 to profiler (#120) · cb87b049
  Jianfeng Yan authored Mar 21, 2022
```
changed long_index_t to index_t when computing memory offset

uncomment other ops in profiler

added test for batched_gemm
```
- Fix conv2d bwd data bug when filter is 1x1 and stride = 2 (#132) · b51808d7
  ltqin authored Mar 21, 2022
```
* fix bwd data filter1stride2 bug

* change short to ck::bhalf_t

* reset input to zero
Co-authored-by: ltqin <letaoqin@amd.com>
```
11 Mar, 2022 1 commit

Use Space Filling Curve in Threadwise Copy (#118) · 9e33fe70

Jianfeng Yan authored Mar 11, 2022



* fixed a corner case in GetCoordinateResetStep

* clean

* rename num_accesses to num_access
Co-authored-by: Chao Liu <chao.liu2@amd.com>


10 Mar, 2022 1 commit

Pr82 followup (#115) · 827301d9

Qianfeng authored Mar 11, 2022

* Use thread cluster descriptor and explicit M_K 2d descriptor to simplify Blockwise Reduction

* Change by replacing ReduceDims by NumReduceDims as Device Reduce interface template parameter

* Rename the folder name for the pool2d and reduce examples

* Update to reduction test scripts

* Add Readme for pool2d_fwd and reduce_blockwise examples

* Tiny fix in reduce profiler and tiny update in reduce testing scripts

* Tiny fix in testing script profile_reduce_no_index.sh

* Tiny change in script/profile_reduce_with_index.sh

* Renaming and refining in Reduction profiler/device layer/examples

* Renaming and refining in Reduction profiler/device layer/examples

* Renaming all NumReduceDims to NumReduceDim


09 Mar, 2022 1 commit

Reorganize files, Part 1 (#119) · 5d37d7bf

Chao Liu authored Mar 08, 2022

* delete obsolete files

* move files

* build

* update cmake

* update cmake

* fix build

* reorg examples

* update cmake for example and test


13 Jun, 2019 1 commit
- reorganized files · 1566b317
  Chao Liu authored Jun 13, 2019
  
12 Jun, 2019 3 commits
- change build · c82b833d
  Chao Liu authored Jun 12, 2019
  
- fixed build issue · f2b92ba9
  Chao Liu authored Jun 12, 2019
  
- reorganize files · 81497a93
  Chao Liu authored Jun 11, 2019
  