Commits · 2f88070afc0841e2506cae37fe03c9fe00aa7155 · gaoqiong / composable_kernel

28 Apr, 2023 1 commit

Syncing up from internal repo to enable MI300. (#690) · 4feebedd

Illia Silin authored Apr 28, 2023



* enable gfx940

* switch between intrinsic mfma routines on mi100/200 and mi300

* fix mfma_int8 on MI300

* disable 2 int8 examples on MI300

* Update cmake-ck-dev.sh

* restore gitignore file

* modify Jenkinsfile to the internal repo

---------
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

4feebedd

27 Apr, 2023 2 commits
- cancel change of format script · 716860e3
  aska-0096 authored Apr 27, 2023
  
  716860e3
- Clang format, Add gfx1101, gfx1102 support of FMHA example · d676da85
  aska-0096 authored Apr 27, 2023
  
  d676da85
19 Apr, 2023 1 commit

Merge origin dev (#2) · cad3212d

Haocong WANG authored Apr 19, 2023



* [Navi3x] Fix Gridwise_multiple_d operation (#649)

* Add CMake Option "USE_OPT_NAVI3X"

* fix bug

* standardize docs (#655)

* Separate bibtex requirement from rocm-docs-core (#656)

* separate bibtex requirement from rocm-docs-core

* point requirements to source rocm-docs-core repo

* Add CMake Option "USE_OPT_NAVI3X" (#647)

* Add CMake Option "USE_OPT_NAVI3X"

* remove navi3x opt compile option from cmake script

* Conv + quantization + tanh  (#645)

* Rename file. Prepare to support another activation

* Add comment for quantization

* Extract out_elementop

* Add tanh example

* Add conv + bias + tanh quantization instance

* Add missing parameter

* Refine cmake

* Add external api and client example

* Extract variable in example

* Fix the comment

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>

* Add a denorm test fix (#603)

* Add type_convert implementations for bf16

* Add the fix for conv_fwd

* Add the fix for conv_bwd_data

* Add the fix for conv_bwd_weight

* Format

* Format

* Another format

* Add a macro to use workaround on MI200 only

* Format

---------
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

* simplify karg in device/grid of split-k op (#644)

* simplify karg in device/grid split-k op

* fix mk_kn_mn instances

* add more instances

* use name from tensor layout

* fix 3rd dword of buffer source descriptor (#659)

* add fp64 instances (#658)
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>

* Issue #666: Revert "simplify karg in device/grid of split-k op (#644)" (#665)

This reverts commit bb5530af

.

* Groupnorm + swish external api (#668)

* Rename to proper naming

* Add example of groupnorm + swish

* Extract duplicate code in example

* Add groupnorm + swish instances

* Ractor instance generation, split into multiple cpp file

* Add external api and client example

* Refine profiler message

* Use ck math version of exp

* Refine problem size in example

* Add host version of exp

* add a marco to turn on/off denorm fix (off by default) (#673)

* add a marco to turn off denorm fix by default

* expose the marco

---------
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>

* fixed quant example (#672)
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>

* Add dependabot config and pin rocm-docs-core (#663)

* [gtest] suppress unsafe buffer warn (#670)

ref: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1912



* Add memory index guard in wmma device ops (#667)

* Add more macros to turn on/off denorm fix (#678)
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>

* Fix a typo (#676)

* Add (#677)

* Allow using ROCm release candidate compilers. (#679)

* enable use of rocm5.5 release candidate 4

* upgrade to ROCM5.5 RC5

* try fix the PUB_KEY error, remove the cmake-data package

* upgrade to latest cmake version

* use private dockerhub repo for rocm5.5 rc5

* add missing bracket

* add vector load check

* solve conflicts

---------
Co-authored-by: Sam Wu <sjwu@ualberta.ca>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
Co-authored-by: rocking5566 <ChunYu.Lai@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
Co-authored-by: Jun Liu <Liu.Jun@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

cad3212d

29 Mar, 2023 1 commit
- Add CMake Option "USE_OPT_NAVI3X" (#647) · 4e097ad2
  Haocong WANG authored Mar 30, 2023
```
* Add CMake Option "USE_OPT_NAVI3X"

* remove navi3x opt compile option from cmake script
```
  4e097ad2
15 Mar, 2023 1 commit
- Update cmake-ck-dev.sh script (#641) · fa998675
  Rostyslav Geyyer authored Mar 15, 2023
```
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
```
  fa998675
06 Mar, 2023 1 commit
- format · 6e28a8ac
  aska-0096 authored Mar 06, 2023
  
  6e28a8ac
24 Feb, 2023 1 commit
- disable tensor contraction f64 on MI100 (#602) · 209baee2
  zjing14 authored Feb 23, 2023
  
  209baee2
16 Feb, 2023 2 commits
- cancel unnecessary change · f45099cd
  aska-0096 authored Feb 16, 2023
  
  f45099cd
- format · 8dbb73b1
  aska-0096 authored Feb 16, 2023
  
  8dbb73b1
15 Feb, 2023 1 commit

Add contraction_fp64 example (#570) · 24c9ee1d

zjing14 authored Feb 15, 2023



* add contraction_bilinear

* add contraction_scale_xdl_fp64

* reduce tile size to avoid register spill

---------
Co-authored-by: root <root@ctr-ubbsmc16.amd.com>

24c9ee1d

06 Feb, 2023 1 commit

Fix CI issues. (#572) · f73574ff

Illia Silin authored Feb 06, 2023

* switch to recent staging compiler as default for CI

* fix the baseline query

* roll back sqlalchemy to version 1.4.46

f73574ff

02 Nov, 2022 1 commit

Conv perlayer int8 quantization (#471) · 226bc02b

rocking5566 authored Nov 03, 2022

* Add conv2d requant example

* Fix bash error

* Rename example

* 1. Rename gemm quantization
2. shares the requantization lambda function with conv

* Refine declare type

* Add conv bias relu quantization exmaple

* clang format

* Fix compile error due to merge develop

* Fix CI error

* Extract quantization post operation into another file

* Support quantization for non piecewise linear function

* Add instance for conv quantization

* Add convolution quantization factory

* Add convolution quantization client example

* Add more instances with different template parameters

* clang format

* Sync the naming with the develop

226bc02b

26 Oct, 2022 1 commit
- fix the script parsing the QA results (#495) · 0ee3aea1
  Illia Silin authored Oct 26, 2022
  
  0ee3aea1
03 Oct, 2022 1 commit

update document: Readme, contributors, citation, (#463) · 473ba5bc

Chao Liu authored Oct 03, 2022

* update cmake script

* update readme

* Update README.md

* add citation

* add images

* Update README.md

* update

* Update README.md

* Update CONTRIBUTORS.md

* Update README.md

* Update CITATION.cff

* Update README.md

* Update CITATION.cff

473ba5bc

21 Sep, 2022 1 commit

Build the CK targets only once. (#433) · 85b0920d

Illia Silin authored Sep 21, 2022

* build CK only once, use deb package in all subsequent stages

* update jenkins file

* change prefix for build_CK stage

* update writing deb metadata to control file

* update ubuntu source for docker, script syntax for deb package metadata

* try different way to create deb metadata

* clean up DEBIAN before creating one

* fix the CI folder names, fix splitK qa

* use correct docker in all stages, separate tests for splitK verification and performance

* clean old comments, change dir before packaging

* use different package syntax

* change packaging syntax

* package with cmake

* remove unnecessary build prefix

* get rid of unnecessary paths

* change paths during unpacking

* change script syntax while unpacking

* get rid of unneccesary steps

* get rid of comments in the scripts

* use double quotes for scripts

* add ccache during build, try dpkg -x

* pull and install each package separately

* use full package names

* try to use stashing for packages

* change stash/unstash syntax

* move unstash out of shell, run tests on any gpu node

* unpack each package separately

* try re-using existing workspace

* merge the build and test stages, only stash ckProfiler

* merge the build and test stages, only stash zipped ckProfiler

* fix syntax

* add GPU check before build and test, rename docker to usual name

85b0920d

13 Sep, 2022 1 commit

Upgrade the OS and ROCM versions. (#411) · b22ebd44

Illia Silin authored Sep 13, 2022

* upgrade the OS and ROCM versions in CK docker

* add cxx flags to link code with rocm5.2 and ck-9110 compiler

* rename the docker image

* run ONNX gemms using init=1

b22ebd44

07 Sep, 2022 1 commit

Add stderr to QA logfiles, process splitK and ONNX gemm kernels (#402) · ce74cea4

Illia Silin authored Sep 07, 2022

* add processing for the onng_gemm and splitK_gemm

* add profile_onnx_gemm.sh

* add stderr to logfiles, add splitK and onnx gemm parsing

* enable splitK gemm wresults posting to db

ce74cea4

26 Aug, 2022 1 commit
- Fixed splitk gemm fp32 (#384) · 9881625b
  zjing14 authored Aug 26, 2022
```
* add scripts

* fixed splitK_gemm_fp32

* clean

* clean
```
  9881625b
25 Aug, 2022 2 commits

GEMM batched/splitK/cgemm/grouped int4 examples (#383) · 3ab20fd7

Adam Osewski authored Aug 26, 2022



* Grouped GEmm int4.

* Formatting + fix K dimension for int8.

* Batched Gemm int4 example.

* CGEMM int4 example.

* Include inc filese in clang-format.

* SplitK int4 example

* Refactoring of performance measurement.

* Fix #ifdef statements.
Co-authored-by: Adam Osewski <aosewski@amd.com>

3ab20fd7

add scripts (#382) · f246fd2c
zjing14 authored Aug 25, 2022

f246fd2c

08 Aug, 2022 1 commit

Fix QA, allow switching compiler versions, fix google test compilation error. (#348) · aba7fefc

Illia Silin authored Aug 08, 2022

* allow selecting compiler version

* fix typo

* add Wno-deprecated flag for google tests

* change git repo, fix qa log files names

* change the git clone syntax

* use Omkar's git credentials

* try to use jenkins as git user

* try using illsilin username for gerrit repo with ssh key

* try new gerrit authorization

* change ssh key syntax

* try another way of passing ssh key to docker

* add mount ssh in dockerfile

* create .ssh folder

* move ssh-keyscan to later

* get rid of npm call

* build first docker image on master

* check the contents of the .ssh folder

* try replacing omkars creds with gerrit creds

* use open repo, clean up changes

* get rid of ssh default argument

aba7fefc

02 Aug, 2022 1 commit

Run CI on MI100 nodes only, run daily QA on MI200 nodes. (#339) · 984b3722

Illia Silin authored Aug 02, 2022



* turn on full qa only on gfx90a, use int initialization

* change script syntax

* update script parsing clinfo, throw exception if 0 devices

* fix syntax

* try using toBoolean for the QA conditions

* run regular CI on MI100 only, use MI200 only for daily QA

* evaluate when conditions before agent

* launch QA on develop branch and update profile_reduce script

* update test script

* update script

* remove false dependency from dockerfile

* try removing rbuild completely
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>

984b3722

21 Jul, 2022 1 commit

Add full QA with verification option, few other changes. (#331) · d8415a96

Illia Silin authored Jul 21, 2022

* add verify flag and update scripts

* replace old check_error function with the new check_err

* fix syntax

* remove blank spaces

* remove empty line

* add check_err for tensors

* fix syntax

* replace tensors with vectors in check_err calls

* fix syntax

* remove blank spaces

* fix syntax

* add new line at end of file

* disable conv2d_bwd_weight test, add gpu check

* set check_gpu using export

* check GPU using runShell

* add definition of runShell

* fix script syntax

* reduce the number of threads, add full qa option

* run processing scripts in bash

* fix the branch and host names in performance scripts, add chronos

* replace parameterizedCron with cron

* archive the perf log files

* try to fix git call

* pass branch and host names as arguments into scripts

* fix script arguments

* fix script arguments

* process results on master

* fix pipeline

* add definition of gpu_arch

* run processing scripts in docker

* fix the brackets

* add agent master for the processing stage

* get rid of show_node_info call on master

* try using mici label instead of master, disable MI100 tests for now

* fix syntax

* simplify container for results processing

* remove node(master) from the process_results stage

* put all stages in original order

* change the agent label from master to mici for gfx908

d8415a96

13 Jul, 2022 1 commit

Add switch between compilers, make 9110 compiler default, add full QA scripts. (#322) · 39acaea3

Illia Silin authored Jul 13, 2022

* adding scripts for full perf test suite

* uncomment the sql queries

* fix typo and chmod a+x for scripts

* dos2unix for all new scripts

* disable verification in full performance test

* fix reduction scripts, add gfrouped_gemm hotfix

* fix the grouped_gemm hotfix and only run reduction for fp16

* change compiler flag syntax

* fix syntax

* add predefinition of dockerArgs

* avoid redefinitions of dockerArgs

* add blank space at the end of dockerArgs

* try to build with release compiler

* adding spaces inside if condition

* limit the number of threads for building 9110 compiler

* change the way HIP_CLANG_PATH is set

* remove the export command

* change the conditional ENV syntax

* set HIP_CLANG_PATH at docker run time

* update scripts for full qa

* enable the sql write query

* fix typo

* remove a comment from a script

39acaea3

01 Jul, 2022 1 commit

Improve external interface for GEMM and GEMM+add+add+fastgelu (#311) · 0dcb3496

Chao Liu authored Jun 30, 2022

* interface for GEMM and GEMM+add+add+fastgelu

* rename namespace

* instance factory

* fix build

* fix build; add GEMM client example

* clean

0dcb3496

23 Jun, 2022 1 commit

Testing all fwd convolution specializations. (#259) · a2edd7d8

Adam Osewski authored Jun 23, 2022



* UniforFill with integer values.

* Log tested instance type string.

* Add UT for all convolution specializations.

* debugging conv

* Fix dangling reference bug.

* Small refinements.

* Fix call to error checking function.

* Small refinements to tests.

* Configure error tolerance
* Change problem size.
* Remove OddC case from types that do not support it.

* Add helper traits for AccumulatorDataType.

* Print first 5 errs in check_err for integral types.

* Rename FillUniform to FillUniformDistribution

* Refactor

* Do not use typed tests.
* Instead use plain fixture class with templatized member functions.
* Initialize tensors with integer values.

* Refine test instances.

* Properly set accumulator data type.
* Add another "big" instance.

* Refactor convolution tests.

* Revert "debugging conv"

This reverts commit b109516455631ff8fd6dce99cf7c14bf8e323ebb.

* Add pragma once + format + small refinement.

* Fix some unwanted changes.

* Clang-format

* Fix profile_convnd to use renamed tensor initializer.

* Add instances for ConvFWDND kernel case 2D

* Helpers to get ConvNDFwd 2D instances.

* Refactoring.

* Remove "small block" instance as it was generating compiler errors.
* Remove default template parameters values.

* Refine and fix test.

* Fix problem with default template parameter types.
* Adjust error thresholds for floating point values test.
* Use integer values initialization for instances test.
* Add tests for ConvNDFwd 2D case.

* Remove AccumulatorDataType type trait.

* Update unit-tests.

* Remove operator<< overload.

* Unlock conv1d/3d nd fwd instances.

* Enable skipping calculating reference using flag.

* Fix number of channels for first ResNet50 layer.

* Clang-format.
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

a2edd7d8

21 Jun, 2022 1 commit
- update readme and script (#290) · ccbd8d90
  Chao Liu authored Jun 20, 2022
  
  ccbd8d90
16 Jun, 2022 1 commit

Use new github credentials (#278) · fb9b6b1e

Illia Silin authored Jun 15, 2022

* use pre-built docker instead of building a new one

* try docker.image.pull

* change syntax in docker.image()

* add 30 min timeout

* increase timeout to 3 hours

* move performance tests to first stage for testing

* set image variable to the new container name

* update image name

* check available images

* check available images in both places

* try different image name

* use image ID to refer to image

* run performance on gfx90a

* fix the gpu_arch labeling, add parameter

* move env vars out of stages

* add stand-alone performance script, MI200 tests, CU numbers

* dos2unix for run_perf_tests.sh

* try the new git credentials

* use env var for git credentials

fb9b6b1e

10 Jun, 2022 1 commit

Add performance tests on MI200 in CI, reporting number of CUs, add stand-alone perf test. (#277) · 1ced00a5

Illia Silin authored Jun 10, 2022

* use pre-built docker instead of building a new one

* try docker.image.pull

* change syntax in docker.image()

* add 30 min timeout

* increase timeout to 3 hours

* move performance tests to first stage for testing

* set image variable to the new container name

* update image name

* check available images

* check available images in both places

* try different image name

* use image ID to refer to image

* run performance on gfx90a

* fix the gpu_arch labeling, add parameter

* move env vars out of stages

* add stand-alone performance script, MI200 tests, CU numbers

1ced00a5

02 Jun, 2022 1 commit

Adding Resnet50 test to Performance tests (#268) · 1677cf70

Illia Silin authored Jun 02, 2022

* add resnet50 test to performance tests

* add blanks before gpu_arch in log files

* add resnet50 test with N=4 and process its results

* add ROCM and HIP versions to test tables

* uncomment the sql queries

* fix script syntax in jenkinsfile

1677cf70

24 May, 2022 2 commits

Overhaul to Reducton and its dependants (#237) · 63eee2d9

Qianfeng authored May 25, 2022

* Tiny fix in dynamic_buffer.hpp to support vectorized AtomicAdd for double type

* Update to host layer and host reduction

* Merge and remove reduction kernels

* Merge and remove reduction device interfaces and update pooling device interface

* Merge and remove useless reduction device instances

* Update to reduction profiler and reduction ctests

* Update to reduction and pooling examples and add one reduction example

* Change to reduction examples to let them testable by ctest

* Add explicit pass checking for reduction and pooling examples

* Explicit assignment of tensor shapes in example reduce_blockwise_two_call

* Use atomic_add to repace atomicAdd and add atomic_add for double type

* Add reduce ctest support for double data type

* Replace to_int_vector() by using c++ std::vector::assign()

* Keep DeviceReduceThreadWise separated from DeviceReduceBlockWise

* Merge DeviceReduceBlockWise and DeviceReduceMultiBlockAtomicAdd into DeviceReduceMultiBlock

* Add GetAtomicOperationZeroValue() support for AtomicMax

* Tiny change to reduce example README.md

* Fix some tiny issues due to branch merging

* Revoke previous change in dynamic_buffer.hpp and add atomic_add for double2_t

* Add reduce multiblock_atomic_add instances for fp64 to verify vectorized atomic_add on fp64

* Renaming

* Clean the header includings in device_reduce instances header files

63eee2d9

Add performance tests as a stage of CI. (#247) · 1085794d

Illia Silin authored May 24, 2022

* modify ckProfiler_gemm output

* fix syntax

* change ckProfiler output and return 0

* fix syntax

* output datatype

* fix syntax

* output datatype in another way

* fix syntax

* fix syntax

* test return values of ckProfiler

* add layout info and tests, make sure ckprofiler returns 0

* fix syntax

* change layout output

* fix syntax

* fix syntax again

* update script to process perf results

* rearrange jenkins stages

* fix typo

* add python packages to Docker file

* adding setuptools-rust package

* modify parsing for new test parameters

* test db credentials on jenkins

* fix syntax

* update python script to handle incomplete lines

* ungrade python to 3.8 and write the gemm_params table

* add sqlalchemy package to docker

* move perf data processing to master node

* move the master node inside a steps region

* add new stage for result processing

* move results processing to separate stage

* reduce number of tests to speedup debugging

* pass config to processPerfResults stage

* run script on master in a docker container

* replace show_node_info

* try loading docker on master node again

* use ansible node instead of master

* get rid of pymysql package

* try ssh connection using paramiko

* put back pymysql

* put the perf data processing back on the gpu node

* put back artifact definition

* archive the perf_log before parsing

* clean up jenkinsfile, fix parsing

* fix typo

* enable all perf tests

* put all stages in original order, finalize script

* fix gpu_arch version

* update parsing script

* remove obsolete file causing merge conflict

1085794d

08 May, 2022 1 commit

Add Benchmark test into CI (#226) · a3c910ac

Illia Silin authored May 08, 2022



* add performance test to jenkins pipeline

* fix typo

* fix the syntax in conv_fwd_util.cpp

* fix the error message syntax spacing

* fix the error message syntax spacing again

* run profile_gemm and archive results

* fix typo

* try to figure out the paths

* try to figure out the paths one more time

* skip the copying step

* build ckProfiler release only once

* change directory using dir

* fix dir syntax

* change the gemm parameters

* do not pipe script output to file

* try running ckProfiler directly

* fix typo

* use set +e

* run profile_gemm.sh || true

* run multiple gemms and parse results

* fix typo in jenkinsfile

* fix syntax

* add new gemm sizes, update scripts

* put all jenkins steps in original order
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>

a3c910ac

22 Apr, 2022 1 commit
- Clang-format only modified files. (#181) · 31d869ad
  Adam Osewski authored Apr 22, 2022
  
  31d869ad
15 Apr, 2022 1 commit

Compile CK for all targets (#188) · 4221505d

Illia Silin authored Apr 15, 2022



* compile ck for all targets

* update the target criteria

* change the target condition

* fixed some typos

* fixed missed file

* revert changes in README

* revert device_conv3d_fwd_xdl_...

* update device_conv3d_fwd_xdl_...

* update device_batched_gemm_reduce...

* test the unused arguments fix

* test the warning suppression

* try suppress warnings in device_batched_gemm_reduce_xdl...

* fix the last warnings

* replace UNUSED with std::ignore

* fix a typo

* replaced std::ignore with ignore

* add igonre header to common_header

* refactor atomicAdd
Co-authored-by: Chao Liu <chao.liu2@amd.com>

4221505d

31 Mar, 2022 1 commit

Compile for gfx908 and gfx90a (#130) · cd167e49

Chao Liu authored Mar 31, 2022

* adding compilation for multiple targets

* fix build

* clean

* update Jekinsfile

* update readme

* update Jenkins

* use ck::half_t instead of ushort for bf16

* rename enum classes

* clean

* rename

* clean

cd167e49

23 Mar, 2022 1 commit

Unified conv3D API + support for all data types. (#133) · f91579aa

Adam Osewski authored Mar 23, 2022



* Convolution ND

* Code unification across dimensions for generating tensor descriptors.
* Example
* Instances

* Move convnd f32 instance file to comply with repo structure.

* Conv 1D tensor layouts.

* Formatting and use ReferenceConv

* Reference ConvFwd supporting 1D and 2D convolution.

* Debug printing TensorLayout name.

* Conv fwd 1D instance f32

* Refactor conv ND example.

Needed to support various conv dimensio.

Needed to support various conv dimensions

* Rename conv nd example director to prevent conflicts.

* Refactor some common utility to single file.

Plus some tests.

* Refactor GetHostTensorDescriptor + UT.

* Add 1D test case.

* Test reference convolution 1d/2d

* Remove some leftovers.

* Fix convolution example error for 1D

* Refactor test check errors utility function.

* Test Conv2D Fwd XDL

* More UT for 1D case.

* Parameterize input & weight initializers.

* Rename example to prevent conflicts.

* Split convnd instance into separate files for 1d/2d

* Address review comments.

* Fix data type for flops/gbytes calculations.

* Assign example number 11.

* 3D cases for convolution utility functions.

* 3D reference convolution.

* Add support for 3D convolution.

* Check for inputs bigger than  2GB.

* Formatting

* Support for bf16/f16/f32/i8 - conv instances + UT.

* Use check_err from test_util.hpp.

* Split convnd test into separate files for each dim.

* Fix data generation and use proper instances.

* Formatting

* Skip tensor initialization if not necessary.

* Fix CMakefiles.

* Remove redundant conv2d_fwd test.

* Lower problem size for conv3D UT.

* 3D case for convnd example.

* Remove leftovers after merge.

* Add Conv Specialization string to GetTypeString

* Skip instance causing numerical errors.

* Small fixes.

* Remove redundant includes.

* Fix namespace name error.

* Script for automatic testing and logging convolution fwd UTs

* Comment out numactl cmd.

* Refine weights initalization and relax rtol for fp16

* Fix weights initialization for int8.

* Add type_convert when store output in ref conv 1D.

* Get back old conv2d_fwd_xdl operation.

* Silence conv debug print.

* format

* clean

* clean

* Fix merge.

* Fix namespace for check_err
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

f91579aa

22 Mar, 2022 1 commit

Reduction for int8 and bfloat16 (#125) · 9a8ee8a3

Qianfeng authored Mar 23, 2022



* Use thread cluster descriptor and explicit M_K 2d descriptor to simply Blockwise Reduction

* Change by replacing ReduceDims by NumReduceDims as Device Reduce interface template parameter

* Rename the folder name for the pool2d and reduce examples

* Update to reduction test scripts

* Add Readme for pool2d_fwd and reduce_blockwise examples

* Add support for int8_t reduction (ADD/AVG, MIN/MAX/AMAX)

* Tiny fix in reduce profiler and tiny update in reduce testing scripts

* Tiny fix in testing script profile_reduce_no_index.sh

* Tiny fix in testing script profile_reduce_no_index.sh

* Add support for bfp16 reduction (using bhalf_t = ushort)

* Tiny fix in amd_buffer_addressing.hpp

* Tiny change in script/profile_reduce_with_index.sh

* Use AccDataType for Beta value and use element_wise::PassThrough

* Use type_convert for type converting in host layer reduction

* Renaming and refining in Reduction profiler/device layer/examples

* Renaming and refining in Reduction profiler/device layer/examples

* Renaming all NumReduceDims to NumReduceDim

* Fix the leaked type_convert in ThreadwiseTensorSliceTransfer_v2

* Update to testing scripts to add bf16 support

* added more static_assert

* Remove buggy tunable configurations defined in device_reduce_instance_xxx.hpp

* Add static_assert to give compile-time warning for incorrect thread slice-size/vector-size configurations

* minor change

* Refine and fix (in GetWorkspaceSizeInBytes of MultiBlockPartialReduce) to make int8 completely pass

* Tiny renaming in gridwise_2d_reduction_multiblock_partial_reduce.hpp

* Tiny fix in script/profile_reduce_no_index.sh

* Refine in DeviceReduce layer with regard to using NumInvariantDim/NumReduceDim or InvariantDims/ReduceDims

* Generic renaming in host reduction and DeviceReduce layer

* Add support for 4-d all dimension reduction in the profiler and add_device_reduce_xxx instances

* Use multi-thread and simplification for host Reduction implementation

* Add ctest for reduction

* Update to clarify the using of data init method in produce_reduce/example_reduce/test_reduce/

* Update to the reduce CTest executables to enable default testing behavior when no command argument

* Renaming
Co-authored-by: Jianfeng yan <jfyan008@gmail.com>

9a8ee8a3

11 Mar, 2022 1 commit
- revise count_vgpr script to capture all possible syntaxes (#124) · c78d1be1
  Anthony Chang authored Mar 12, 2022
  
  c78d1be1