1. 28 Apr, 2023 1 commit
  2. 27 Apr, 2023 2 commits
  3. 19 Apr, 2023 1 commit
    • Merge origin dev (#2) · cad3212d
      Haocong WANG authored
      
      
      * [Navi3x] Fix Gridwise_multiple_d operation (#649)
      
      * Add CMake Option "USE_OPT_NAVI3X"
      
      * fix bug
      
      * standardize docs (#655)
      
      * Separate bibtex requirement from rocm-docs-core (#656)
      
      * separate bibtex requirement from rocm-docs-core
      
      * point requirements to source rocm-docs-core repo
      
      * Add CMake Option "USE_OPT_NAVI3X" (#647)
      
      * Add CMake Option "USE_OPT_NAVI3X"
      
      * remove navi3x opt compile option from cmake script
      
      * Conv + quantization + tanh  (#645)
      
      * Rename file. Prepare to support another activation
      
      * Add comment for quantization
      
      * Extract out_elementop
      
      * Add tanh example
      
      * Add conv + bias + tanh quantization instance
      
      * Add missing parameter
      
      * Refine cmake
      
      * Add external api and client example
      
      * Extract variable in example
      
      * Fix the comment
      
      ---------
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      
      * Add a denorm test fix (#603)
      
      * Add type_convert implementations for bf16
      
      * Add the fix for conv_fwd
      
      * Add the fix for conv_bwd_data
      
      * Add the fix for conv_bwd_weight
      
      * Format
      
      * Format
      
      * Another format
      
      * Add a macro to use workaround on MI200 only
      
      * Format
      
      ---------
      Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      
      * simplify karg in device/grid of split-k op (#644)
      
      * simplify karg in device/grid split-k op
      
      * fix mk_kn_mn instances
      
      * add more instances
      
      * use name from tensor layout
      
      * fix 3rd dword of buffer source descriptor (#659)
      
      * add fp64 instances (#658)
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
      
      * Issue #666: Revert "simplify karg in device/grid of split-k op (#644)" (#665)
      
      This reverts commit bb5530af.
      
      * Groupnorm + swish external api (#668)
      
      * Rename to proper naming
      
      * Add example of groupnorm + swish
      
      * Extract duplicate code in example
      
      * Add groupnorm + swish instances
      
      * Refactor instance generation, split into multiple cpp files
      
      * Add external api and client example
      
      * Refine profiler message
      
      * Use ck math version of exp
      
      * Refine problem size in example
      
      * Add host version of exp
      
      * add a macro to turn on/off denorm fix (off by default) (#673)
      
      * add a macro to turn off denorm fix by default
      
      * expose the macro
      
      ---------
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
      
      * fixed quant example (#672)
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
      
      * Add dependabot config and pin rocm-docs-core (#663)
      
      * [gtest] suppress unsafe buffer warn (#670)
      
      ref: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1912
      
      
      
      * Add memory index guard in wmma device ops (#667)
      
      * Add more macros to turn on/off denorm fix (#678)
      Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
      
      * Fix a typo (#676)
      
      * Add (#677)
      
      * Allow using ROCm release candidate compilers. (#679)
      
      * enable use of rocm5.5 release candidate 4
      
      * upgrade to ROCM5.5 RC5
      
      * try fix the PUB_KEY error, remove the cmake-data package
      
      * upgrade to latest cmake version
      
      * use private dockerhub repo for rocm5.5 rc5
      
      * add missing bracket
      
      * add vector load check
      
      * solve conflicts
      
      ---------
      Co-authored-by: Sam Wu <sjwu@ualberta.ca>
      Co-authored-by: Sam Wu <sam.wu2@amd.com>
      Co-authored-by: rocking5566 <ChunYu.Lai@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
      Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: carlushuang <carlus.huang@amd.com>
      Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
      Co-authored-by: Jun Liu <Liu.Jun@amd.com>
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
  4. 29 Mar, 2023 1 commit
  5. 15 Mar, 2023 1 commit
  6. 06 Mar, 2023 1 commit
  7. 24 Feb, 2023 1 commit
  8. 16 Feb, 2023 2 commits
  9. 15 Feb, 2023 1 commit
  10. 06 Feb, 2023 1 commit
    • Fix CI issues. (#572) · f73574ff
      Illia Silin authored
      * switch to recent staging compiler as default for CI
      
      * fix the baseline query
      
      * roll back sqlalchemy to version 1.4.46
  11. 02 Nov, 2022 1 commit
    • Conv perlayer int8 quantization (#471) · 226bc02b
      rocking5566 authored
      * Add conv2d requant example
      
      * Fix bash error
      
      * Rename example
      
      * 1. Rename gemm quantization
      2. shares the requantization lambda function with conv
      
      * Refine declare type
      
      * Add conv bias relu quantization example
      
      * clang format
      
      * Fix compile error due to merge develop
      
      * Fix CI error
      
      * Extract quantization post operation into another file
      
      * Support quantization for non piecewise linear function
      
      * Add instance for conv quantization
      
      * Add convolution quantization factory
      
      * Add convolution quantization client example
      
      * Add more instances with different template parameters
      
      * clang format
      
      * Sync the naming with the develop
  12. 26 Oct, 2022 1 commit
  13. 03 Oct, 2022 1 commit
    • update document: Readme, contributors, citation, (#463) · 473ba5bc
      Chao Liu authored
      * update cmake script
      
      * update readme
      
      * Update README.md
      
      * add citation
      
      * add images
      
      * Update README.md
      
      * update
      
      * Update README.md
      
      * Update CONTRIBUTORS.md
      
      * Update README.md
      
      * Update CITATION.cff
      
      * Update README.md
      
      * Update CITATION.cff
  14. 21 Sep, 2022 1 commit
    • Build the CK targets only once. (#433) · 85b0920d
      Illia Silin authored
      * build CK only once, use deb package in all subsequent stages
      
      * update jenkins file
      
      * change prefix for build_CK stage
      
      * update writing deb metadata to control file
      
      * update ubuntu source for docker, script syntax for deb package metadata
      
      * try different way to create deb metadata
      
      * clean up DEBIAN before creating one
      
      * fix the CI folder names, fix splitK qa
      
      * use correct docker in all stages, separate tests for splitK verification and performance
      
      * clean old comments, change dir before packaging
      
      * use different package syntax
      
      * change packaging syntax
      
      * package with cmake
      
      * remove unnecessary build prefix
      
      * get rid of unnecessary paths
      
      * change paths during unpacking
      
      * change script syntax while unpacking
      
      * get rid of unnecessary steps
      
      * get rid of comments in the scripts
      
      * use double quotes for scripts
      
      * add ccache during build, try dpkg -x
      
      * pull and install each package separately
      
      * use full package names
      
      * try to use stashing for packages
      
      * change stash/unstash syntax
      
      * move unstash out of shell, run tests on any gpu node
      
      * unpack each package separately
      
      * try re-using existing workspace
      
      * merge the build and test stages, only stash ckProfiler
      
      * merge the build and test stages, only stash zipped ckProfiler
      
      * fix syntax
      
      * add GPU check before build and test, rename docker to usual name
  15. 13 Sep, 2022 1 commit
    • Upgrade the OS and ROCM versions. (#411) · b22ebd44
      Illia Silin authored
      * upgrade the OS and ROCM versions in CK docker
      
      * add cxx flags to link code with rocm5.2 and ck-9110 compiler
      
      * rename the docker image
      
      * run ONNX gemms using init=1
  16. 07 Sep, 2022 1 commit
  17. 26 Aug, 2022 1 commit
  18. 25 Aug, 2022 2 commits
  19. 08 Aug, 2022 1 commit
    • Fix QA, allow switching compiler versions, fix google test compilation error. (#348) · aba7fefc
      Illia Silin authored
      * allow selecting compiler version
      
      * fix typo
      
      * add Wno-deprecated flag for google tests
      
      * change git repo, fix qa log files names
      
      * change the git clone syntax
      
      * use Omkar's git credentials
      
      * try to use jenkins as git user
      
      * try using illsilin username for gerrit repo with ssh key
      
      * try new gerrit authorization
      
      * change ssh key syntax
      
      * try another way of passing ssh key to docker
      
      * add mount ssh in dockerfile
      
      * create .ssh folder
      
      * move ssh-keyscan to later
      
      * get rid of npm call
      
      * build first docker image on master
      
      * check the contents of the .ssh folder
      
      * try replacing omkars creds with gerrit creds
      
      * use open repo, clean up changes
      
      * get rid of ssh default argument
  20. 02 Aug, 2022 1 commit
    • Run CI on MI100 nodes only, run daily QA on MI200 nodes. (#339) · 984b3722
      Illia Silin authored
      
      
      * turn on full qa only on gfx90a, use int initialization
      
      * change script syntax
      
      * update script parsing clinfo, throw exception if 0 devices
      
      * fix syntax
      
      * try using toBoolean for the QA conditions
      
      * run regular CI on MI100 only, use MI200 only for daily QA
      
      * evaluate when conditions before agent
      
      * launch QA on develop branch and update profile_reduce script
      
      * update test script
      
      * update script
      
      * remove false dependency from dockerfile
      
      * try removing rbuild completely
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
      Co-authored-by: Chao Liu <lc.roy86@gmail.com>
  21. 21 Jul, 2022 1 commit
    • Add full QA with verification option, few other changes. (#331) · d8415a96
      Illia Silin authored
      * add verify flag and update scripts
      
      * replace old check_error function with the new check_err
      
      * fix syntax
      
      * remove blank spaces
      
      * remove empty line
      
      * add check_err for tensors
      
      * fix syntax
      
      * replace tensors with vectors in check_err calls
      
      * fix syntax
      
      * remove blank spaces
      
      * fix syntax
      
      * add new line at end of file
      
      * disable conv2d_bwd_weight test, add gpu check
      
      * set check_gpu using export
      
      * check GPU using runShell
      
      * add definition of runShell
      
      * fix script syntax
      
      * reduce the number of threads, add full qa option
      
      * run processing scripts in bash
      
      * fix the branch and host names in performance scripts, add chronos
      
      * replace parameterizedCron with cron
      
      * archive the perf log files
      
      * try to fix git call
      
      * pass branch and host names as arguments into scripts
      
      * fix script arguments
      
      * fix script arguments
      
      * process results on master
      
      * fix pipeline
      
      * add definition of gpu_arch
      
      * run processing scripts in docker
      
      * fix the brackets
      
      * add agent master for the processing stage
      
      * get rid of show_node_info call on master
      
      * try using mici label instead of master, disable MI100 tests for now
      
      * fix syntax
      
      * simplify container for results processing
      
      * remove node(master) from the process_results stage
      
      * put all stages in original order
      
      * change the agent label from master to mici for gfx908
  22. 13 Jul, 2022 1 commit
    • Add switch between compilers, make 9110 compiler default, add full QA scripts. (#322) · 39acaea3
      Illia Silin authored
      * adding scripts for full perf test suite
      
      * uncomment the sql queries
      
      * fix typo and chmod a+x for scripts
      
      * dos2unix for all new scripts
      
      * disable verification in full performance test
      
      * fix reduction scripts, add grouped_gemm hotfix
      
      * fix the grouped_gemm hotfix and only run reduction for fp16
      
      * change compiler flag syntax
      
      * fix syntax
      
      * add predefinition of dockerArgs
      
      * avoid redefinitions of dockerArgs
      
      * add blank space at the end of dockerArgs
      
      * try to build with release compiler
      
      * adding spaces inside if condition
      
      * limit the number of threads for building 9110 compiler
      
      * change the way HIP_CLANG_PATH is set
      
      * remove the export command
      
      * change the conditional ENV syntax
      
      * set HIP_CLANG_PATH at docker run time
      
      * update scripts for full qa
      
      * enable the sql write query
      
      * fix typo
      
      * remove a comment from a script
  23. 01 Jul, 2022 1 commit
  24. 23 Jun, 2022 1 commit
    • Testing all fwd convolution specializations. (#259) · a2edd7d8
      Adam Osewski authored
      
      
      * UniformFill with integer values.
      
      * Log tested instance type string.
      
      * Add UT for all convolution specializations.
      
      * debugging conv
      
      * Fix dangling reference bug.
      
      * Small refinements.
      
      * Fix call to error checking function.
      
      * Small refinements to tests.
      
      * Configure error tolerance
      * Change problem size.
      * Remove OddC case from types that do not support it.
      
      * Add helper traits for AccumulatorDataType.
      
      * Print first 5 errs in check_err for integral types.
      
      * Rename FillUniform to FillUniformDistribution
      
      * Refactor
      
      * Do not use typed tests.
      * Instead use plain fixture class with templatized member functions.
      * Initialize tensors with integer values.
      
      * Refine test instances.
      
      * Properly set accumulator data type.
      * Add another "big" instance.
      
      * Refactor convolution tests.
      
      * Revert "debugging conv"
      
      This reverts commit b109516455631ff8fd6dce99cf7c14bf8e323ebb.
      
      * Add pragma once + format + small refinement.
      
      * Fix some unwanted changes.
      
      * Clang-format
      
      * Fix profile_convnd to use renamed tensor initializer.
      
      * Add instances for ConvFWDND kernel case 2D
      
      * Helpers to get ConvNDFwd 2D instances.
      
      * Refactoring.
      
      * Remove "small block" instance as it was generating compiler errors.
      * Remove default template parameters values.
      
      * Refine and fix test.
      
      * Fix problem with default template parameter types.
      * Adjust error thresholds for floating point values test.
      * Use integer values initialization for instances test.
      * Add tests for ConvNDFwd 2D case.
      
      * Remove AccumulatorDataType type trait.
      
      * Update unit-tests.
      
      * Remove operator<< overload.
      
      * Unlock conv1d/3d nd fwd instances.
      
      * Enable skipping calculating reference using flag.
      
      * Fix number of channels for first ResNet50 layer.
      
      * Clang-format.
      Co-authored-by: Adam Osewski <aosewski@amd.com>
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  25. 21 Jun, 2022 1 commit
  26. 16 Jun, 2022 1 commit
    • Use new github credentials (#278) · fb9b6b1e
      Illia Silin authored
      * use pre-built docker instead of building a new one
      
      * try docker.image.pull
      
      * change syntax in docker.image()
      
      * add 30 min timeout
      
      * increase timeout to 3 hours
      
      * move performance tests to first stage for testing
      
      * set image variable to the new container name
      
      * update image name
      
      * check available images
      
      * check available images in both places
      
      * try different image name
      
      * use image ID to refer to image
      
      * run performance on gfx90a
      
      * fix the gpu_arch labeling, add parameter
      
      * move env vars out of stages
      
      * add stand-alone performance script, MI200 tests, CU numbers
      
      * dos2unix for run_perf_tests.sh
      
      * try the new git credentials
      
      * use env var for git credentials
  27. 10 Jun, 2022 1 commit
    • Add performance tests on MI200 in CI, reporting number of CUs, add stand-alone perf test. (#277) · 1ced00a5
      Illia Silin authored
      * use pre-built docker instead of building a new one
      
      * try docker.image.pull
      
      * change syntax in docker.image()
      
      * add 30 min timeout
      
      * increase timeout to 3 hours
      
      * move performance tests to first stage for testing
      
      * set image variable to the new container name
      
      * update image name
      
      * check available images
      
      * check available images in both places
      
      * try different image name
      
      * use image ID to refer to image
      
      * run performance on gfx90a
      
      * fix the gpu_arch labeling, add parameter
      
      * move env vars out of stages
      
      * add stand-alone performance script, MI200 tests, CU numbers
  28. 02 Jun, 2022 1 commit
    • Adding Resnet50 test to Performance tests (#268) · 1677cf70
      Illia Silin authored
      * add resnet50 test to performance tests
      
      * add blanks before gpu_arch in log files
      
      * add resnet50 test with N=4 and process its results
      
      * add ROCM and HIP versions to test tables
      
      * uncomment the sql queries
      
      * fix script syntax in jenkinsfile
  29. 24 May, 2022 2 commits
    • Overhaul to Reduction and its dependants (#237) · 63eee2d9
      Qianfeng authored
      * Tiny fix in dynamic_buffer.hpp to support vectorized AtomicAdd for double type
      
      * Update to host layer and host reduction
      
      * Merge and remove reduction kernels
      
      * Merge and remove reduction device interfaces and update pooling device interface
      
      * Merge and remove useless reduction device instances
      
      * Update to reduction profiler and reduction ctests
      
      * Update to reduction and pooling examples and add one reduction example
      
      * Change reduction examples so they can be tested by ctest
      
      * Add explicit pass checking for reduction and pooling examples
      
      * Explicit assignment of tensor shapes in example reduce_blockwise_two_call
      
      * Use atomic_add to replace atomicAdd and add atomic_add for double type
      
      * Add reduce ctest support for double data type
      
      * Replace to_int_vector() by using c++ std::vector::assign()
      
      * Keep DeviceReduceThreadWise separated from DeviceReduceBlockWise
      
      * Merge DeviceReduceBlockWise and DeviceReduceMultiBlockAtomicAdd into DeviceReduceMultiBlock
      
      * Add GetAtomicOperationZeroValue() support for AtomicMax
      
      * Tiny change to reduce example README.md
      
      * Fix some tiny issues due to branch merging
      
      * Revoke previous change in dynamic_buffer.hpp and add atomic_add for double2_t
      
      * Add reduce multiblock_atomic_add instances for fp64 to verify vectorized atomic_add on fp64
      
      * Renaming
      
      * Clean up the header includes in device_reduce instance header files
    • Add performance tests as a stage of CI. (#247) · 1085794d
      Illia Silin authored
      * modify ckProfiler_gemm output
      
      * fix syntax
      
      * change ckProfiler output and return 0
      
      * fix syntax
      
      * output datatype
      
      * fix syntax
      
      * output datatype in another way
      
      * fix syntax
      
      * fix syntax
      
      * test return values of ckProfiler
      
      * add layout info and tests, make sure ckprofiler returns 0
      
      * fix syntax
      
      * change layout output
      
      * fix syntax
      
      * fix syntax again
      
      * update script to process perf results
      
      * rearrange jenkins stages
      
      * fix typo
      
      * add python packages to Docker file
      
      * adding setuptools-rust package
      
      * modify parsing for new test parameters
      
      * test db credentials on jenkins
      
      * fix syntax
      
      * update python script to handle incomplete lines
      
      * upgrade python to 3.8 and write the gemm_params table
      
      * add sqlalchemy package to docker
      
      * move perf data processing to master node
      
      * move the master node inside a steps region
      
      * add new stage for result processing
      
      * move results processing to separate stage
      
      * reduce number of tests to speedup debugging
      
      * pass config to processPerfResults stage
      
      * run script on master in a docker container
      
      * replace show_node_info
      
      * try loading docker on master node again
      
      * use ansible node instead of master
      
      * get rid of pymysql package
      
      * try ssh connection using paramiko
      
      * put back pymysql
      
      * put the perf data processing back on the gpu node
      
      * put back artifact definition
      
      * archive the perf_log before parsing
      
      * clean up jenkinsfile, fix parsing
      
      * fix typo
      
      * enable all perf tests
      
      * put all stages in original order, finalize script
      
      * fix gpu_arch version
      
      * update parsing script
      
      * remove obsolete file causing merge conflict
  30. 08 May, 2022 1 commit
    • Add Benchmark test into CI (#226) · a3c910ac
      Illia Silin authored
      
      
      * add performance test to jenkins pipeline
      
      * fix typo
      
      * fix the syntax in conv_fwd_util.cpp
      
      * fix the error message syntax spacing
      
      * fix the error message syntax spacing again
      
      * run profile_gemm and archive results
      
      * fix typo
      
      * try to figure out the paths
      
      * try to figure out the paths one more time
      
      * skip the copying step
      
      * build ckProfiler release only once
      
      * change directory using dir
      
      * fix dir syntax
      
      * change the gemm parameters
      
      * do not pipe script output to file
      
      * try running ckProfiler directly
      
      * fix typo
      
      * use set +e
      
      * run profile_gemm.sh || true
      
      * run multiple gemms and parse results
      
      * fix typo in jenkinsfile
      
      * fix syntax
      
      * add new gemm sizes, update scripts
      
      * put all jenkins steps in original order
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
      Co-authored-by: Chao Liu <lc.roy86@gmail.com>
  31. 22 Apr, 2022 1 commit
  32. 15 Apr, 2022 1 commit
    • Compile CK for all targets (#188) · 4221505d
      Illia Silin authored
      
      
      * compile ck for all targets
      
      * update the target criteria
      
      * change the target condition
      
      * fixed some typos
      
      * fixed missed file
      
      * revert changes in README
      
      * revert device_conv3d_fwd_xdl_...
      
      * update device_conv3d_fwd_xdl_...
      
      * update device_batched_gemm_reduce...
      
      * test the unused arguments fix
      
      * test the warning suppression
      
      * try suppress warnings in device_batched_gemm_reduce_xdl...
      
      * fix the last warnings
      
      * replace UNUSED with std::ignore
      
      * fix a typo
      
      * replaced std::ignore with ignore
      
      * add ignore header to common_header
      
      * refactor atomicAdd
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  33. 31 Mar, 2022 1 commit
    • Compile for gfx908 and gfx90a (#130) · cd167e49
      Chao Liu authored
      * adding compilation for multiple targets
      
      * fix build
      
      * clean
      
      * update Jenkinsfile
      
      * update readme
      
      * update Jenkins
      
      * use ck::half_t instead of ushort for bf16
      
      * rename enum classes
      
      * clean
      
      * rename
      
      * clean
  34. 23 Mar, 2022 1 commit
    • Unified conv3D API + support for all data types. (#133) · f91579aa
      Adam Osewski authored
      
      
      * Convolution ND
      
      * Code unification across dimensions for generating tensor descriptors.
      * Example
      * Instances
      
      * Move convnd f32 instance file to comply with repo structure.
      
      * Conv 1D tensor layouts.
      
      * Formatting and use ReferenceConv
      
      * Reference ConvFwd supporting 1D and 2D convolution.
      
      * Debug printing TensorLayout name.
      
      * Conv fwd 1D instance f32
      
      * Refactor conv ND example.
      
      Needed to support various conv dimensions
      
      * Rename conv nd example directory to prevent conflicts.
      
      * Refactor some common utility to single file.
      
      Plus some tests.
      
      * Refactor GetHostTensorDescriptor + UT.
      
      * Add 1D test case.
      
      * Test reference convolution 1d/2d
      
      * Remove some leftovers.
      
      * Fix convolution example error for 1D
      
      * Refactor test check errors utility function.
      
      * Test Conv2D Fwd XDL
      
      * More UT for 1D case.
      
      * Parameterize input & weight initializers.
      
      * Rename example to prevent conflicts.
      
      * Split convnd instance into separate files for 1d/2d
      
      * Address review comments.
      
      * Fix data type for flops/gbytes calculations.
      
      * Assign example number 11.
      
      * 3D cases for convolution utility functions.
      
      * 3D reference convolution.
      
      * Add support for 3D convolution.
      
      * Check for inputs bigger than  2GB.
      
      * Formatting
      
      * Support for bf16/f16/f32/i8 - conv instances + UT.
      
      * Use check_err from test_util.hpp.
      
      * Split convnd test into separate files for each dim.
      
      * Fix data generation and use proper instances.
      
      * Formatting
      
      * Skip tensor initialization if not necessary.
      
      * Fix CMakefiles.
      
      * Remove redundant conv2d_fwd test.
      
      * Lower problem size for conv3D UT.
      
      * 3D case for convnd example.
      
      * Remove leftovers after merge.
      
      * Add Conv Specialization string to GetTypeString
      
      * Skip instance causing numerical errors.
      
      * Small fixes.
      
      * Remove redundant includes.
      
      * Fix namespace name error.
      
      * Script for automatic testing and logging convolution fwd UTs
      
      * Comment out numactl cmd.
      
      * Refine weights initialization and relax rtol for fp16
      
      * Fix weights initialization for int8.
      
      * Add type_convert when store output in ref conv 1D.
      
      * Get back old conv2d_fwd_xdl operation.
      
      * Silence conv debug print.
      
      * format
      
      * clean
      
      * clean
      
      * Fix merge.
      
      * Fix namespace for check_err
      Co-authored-by: Adam Osewski <aosewski@amd.com>
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  35. 22 Mar, 2022 1 commit
    • Reduction for int8 and bfloat16 (#125) · 9a8ee8a3
      Qianfeng authored
      
      
      * Use thread cluster descriptor and explicit M_K 2d descriptor to simplify Blockwise Reduction
      
      * Change by replacing ReduceDims by NumReduceDims as Device Reduce interface template parameter
      
      * Rename the folder name for the pool2d and reduce examples
      
      * Update to reduction test scripts
      
      * Add Readme for pool2d_fwd and reduce_blockwise examples
      
      * Add support for int8_t reduction (ADD/AVG, MIN/MAX/AMAX)
      
      * Tiny fix in reduce profiler and tiny update in reduce testing scripts
      
      * Tiny fix in testing script profile_reduce_no_index.sh
      
      * Tiny fix in testing script profile_reduce_no_index.sh
      
      * Add support for bfp16 reduction (using bhalf_t = ushort)
      
      * Tiny fix in amd_buffer_addressing.hpp
      
      * Tiny change in script/profile_reduce_with_index.sh
      
      * Use AccDataType for Beta value and use element_wise::PassThrough
      
      * Use type_convert for type converting in host layer reduction
      
      * Renaming and refining in Reduction profiler/device layer/examples
      
      * Renaming and refining in Reduction profiler/device layer/examples
      
      * Renaming all NumReduceDims to NumReduceDim
      
      * Fix the leaked type_convert in ThreadwiseTensorSliceTransfer_v2
      
      * Update to testing scripts to add bf16 support
      
      * added more static_assert
      
      * Remove buggy tunable configurations defined in device_reduce_instance_xxx.hpp
      
      * Add static_assert to give compile-time warning for incorrect thread slice-size/vector-size configurations
      
      * minor change
      
      * Refine and fix (in GetWorkspaceSizeInBytes of MultiBlockPartialReduce) to make int8 completely pass
      
      * Tiny renaming in gridwise_2d_reduction_multiblock_partial_reduce.hpp
      
      * Tiny fix in script/profile_reduce_no_index.sh
      
      * Refine in DeviceReduce layer with regard to using NumInvariantDim/NumReduceDim or InvariantDims/ReduceDims
      
      * Generic renaming in host reduction and DeviceReduce layer
      
      * Add support for 4-d all dimension reduction in the profiler and add_device_reduce_xxx instances
      
      * Use multi-thread and simplification for host Reduction implementation
      
      * Add ctest for reduction
      
      * Update to clarify the use of the data init method in produce_reduce/example_reduce/test_reduce/
      
      * Update to the reduce CTest executables to enable default testing behavior when no command argument
      
      * Renaming
      Co-authored-by: Jianfeng yan <jfyan008@gmail.com>
  36. 11 Mar, 2022 1 commit