1. 26 Oct, 2022 3 commits
  2. 25 Oct, 2022 4 commits
    • Update to the Reduction API and instances (#476) · dda3a0a1
      Qianfeng authored
      * Simplify the macros for declaring and defining the add_device_reduce_instance_xxxx() instances
      
      * Change the types of lengths and strides from std::vector to std::array for the reduction device interfaces
      
      * Remove DeviceSoftmaxImpl's dependency on DeviceReduceMultiblock
      
      * Split the cpp and hpp files for reduction instances to enable more parallel compiling
      
      * Remove the use of macros for declaring reduction instances and instance references
      
      * Update to add_device_reduce_instance_xxxx templated functions
      
      * Use ReduceOperation+InElementwiseOp+AccElementwiseOp to replace the ReduceOpId in defining add_reduce_instance_xxxx() templates
      
      * Change return format
    • Revert "Fused elementwise layernorm (#468)" (#491) · 6ea9257e
      guangzlu authored
      This reverts commit efbcc6ed.
    • clang format · ad29b25b
      rocking authored
    • Fused elementwise layernorm (#468) · efbcc6ed
      guangzlu authored
      * add fused addition layernorm
      
      * add fused addition layernorm
      
      * changed CMakelist
      
      * removed annotations
      
      * modified descriptor of C
      
      * fixed bug in gridwise add layernorm
      
      * format the files
      
      * modified name from add&layernorm into elementwise&layernorm
      
      * created fused elementwise layernorm branch
      
      * change input into tuple type
      
      * add sweep once to reduce load & read of C from global memory
      
      * modified Argument api
      
      * modified way to malloc c in global memory
      
      * changed gamma and beta to m_k_desc
      
      * fixed bug in the sweep-once case and moved CDataType when defining the device-level struct
      
      * add src dim for gamma and beta
      
      * implement optimization for coalesced
      
      * delete an annotation line
      
      * fixed some bugs to meet the requirements of CK
      
      * add bandwidth computing in example, and fixed the time unit
      
      * move device_elementwise_layernorm_impl.hpp into device/impl
      
      * fixed bug in device_elementwise_layernorm_impl.hpp
      
      * changed name from layernorm into normalization
      
      * clang-format the changed files
      
      * changed the names
      
      * moved intermediate results into LDS, which is faster in non-sweep-once cases
      
      * changed naming of C into X to make the definition clearer
      
      * changed naming in example
      
      * add tests for elementwise normalization
      
      * move example_elementwise_layernorm_blockwise into folder 44_elementwise_normalization
      
      * move test_elementwise_layernorm_fp16 into new folder
      
      * move elementwise_normalization_instances into a new folder
      
      * add more tests in test_elementwise_layernorm_fp16.cpp
      
      * added some corner cases in test
      
      * fixed method to compute lds size for matrix X
      
      * changed name of 44_elementwise_normalization into 45_elementwise_normalization
      
      * modified some comments
      
      * modified some other confusing comments
      
      * reduce redundant tests in test_elementwise_layernorm_fp16.cpp
  3. 24 Oct, 2022 2 commits
  4. 21 Oct, 2022 2 commits
  5. 19 Oct, 2022 1 commit
  6. 18 Oct, 2022 2 commits
  7. 17 Oct, 2022 2 commits
    • adding tensor_permutation example folder (#389) · cee440fe
      arai713 authored
      * adding tensor_permutation example folder
      
      * fixed formatting
      
      * adding tensor_permutation example folder
      
      * fixed formatting
      
      * changed deviceelementwise parameters for outscalar
      
      * removed .swo file
      
      * updated folder/file name
      
      * changed function call in verification for better consistency with hostelementwise parameters
      
      * formatted again
      
      * fixed shape in verification function call
      
      * changed verification function call, added definition for nhwc
      
      * added elementwise permute example
      
      * updated CMakeLists file in folder
      
      * Delete CmakeLists.txt
      
      * Delete tensor_permute.cpp
      
      * first version of 2d gridwise_elementwise kernel
      
      * temporary fix for stride problem
      
      * formatting
      
      * format
      
      * changed directory name
      
      * Delete gridwise_elementwise_2d.hpp
      
      * Delete CMakeLists.txt
      
      * Delete extra file
      
      * delete extra file
      
      * got rid of extraneous code
      
      * added 2d device elementwise file
      
      * deleted accidentally added file
      
      * update
      
      * stride values generalized with equations
      
      * updated stride for output matrix
      
      * Update CMakeLists.txt
      
      * removed extraneous commented code
      
      * removed shape_nchw vector, replaced with GetLength for each dimension
      
      * changed vector load in kernel call
      
      * removed extra space in CMake
    • rocking's avatar
      9c577e08
  8. 14 Oct, 2022 3 commits
  9. 13 Oct, 2022 5 commits
  10. 12 Oct, 2022 6 commits
  11. 11 Oct, 2022 2 commits
    • Example contraction splitk (#430) · d8b41e1c
      ltqin authored
      * start split k
      
      * add base device class
      
      * add example after merge develop
      
      * add gridwise gemm
      
      * add b matrix split k
      
      * split=1
      
      * change name for kb
      
      * not bias result right
      
      * bias only add once
      
      * fix register spill
      
      * regular code
      
      * add fp32 example
      
      * fix for 64bit index
      
      * fix CheckValidity of gridwise
    • Fix build issue and schedule daily tests with latest staging compiler version. (#470) · 39abb470
      Illia Silin authored
      * run branch once a day, with release and staging compilers
      
      * add GetDockerImage in Clang stage
      
      * apply the new triggers to the develop branch
  12. 07 Oct, 2022 2 commits
  13. 03 Oct, 2022 3 commits
    • Update readme (#465) · 9d8f834a
      Chao Liu authored
      * update cmake script
      
      * update readme
      
      * Update README.md
      
      * add citation
      
      * add images
      
      * Update README.md
      
      * update
      
      * Update README.md
      
      * Update CONTRIBUTORS.md
      
      * Update README.md
      
      * Update CITATION.cff
      
      * Update README.md
      
      * Update CITATION.cff
      
      * update doc
      
      * Update CONTRIBUTORS.md
      
      * Update LICENSE
      
      * update
    • Update doc (#464) · 6de749e2
      Chao Liu authored
      * update cmake script
      
      * update readme
      
      * Update README.md
      
      * add citation
      
      * add images
      
      * Update README.md
      
      * update
      
      * Update README.md
      
      * Update CONTRIBUTORS.md
      
      * Update README.md
      
      * Update CITATION.cff
      
      * Update README.md
      
      * Update CITATION.cff
      
      * update doc
      
      * Update CONTRIBUTORS.md
      
      * Update LICENSE
    • update document: Readme, contributors, citation, (#463) · 473ba5bc
      Chao Liu authored
      * update cmake script
      
      * update readme
      
      * Update README.md
      
      * add citation
      
      * add images
      
      * Update README.md
      
      * update
      
      * Update README.md
      
      * Update CONTRIBUTORS.md
      
      * Update README.md
      
      * Update CITATION.cff
      
      * Update README.md
      
      * Update CITATION.cff
  14. 01 Oct, 2022 1 commit
    • Allow setting ROCM version, activate ccache, etc. (#462) · 7fc3ed76
      Illia Silin authored
      * enable ccache and decouple it from MIOpen ccache use
      
      * fix the ccache check script
      
      * use another method to get server name
      
      * fix syntax
      
      * add quotes around the server name variable
      
      * use check_host as function
      
      * change syntax
      
      * fix syntax
      
      * test if server name is parsed correctly
      
      * try different syntax
      
      * check the env var value
      
      * test new check node function
      
      * add ROCMVERSION parameter and fix script syntax
      
      * fix script syntax
      
      * add missing instances of rocm version
      
      * install ccache in the docker image
      
      * do not check GPU in clang format stage, clean up old code
      
      * update defaults and clean up
  15. 27 Sep, 2022 1 commit
    • Fix build issues, set new compiler default, etc. (#451) · b8825547
      Illia Silin authored
      * add an option to select specific compiler commit
      
      * change the logic of forcing building a docker
      
      * add check for compiler commit in dockerfile
      
      * compiler check syntax fix
      
      * change compiler selection logic
      
      * fix the new compiler build issue
      
      * set new compiler as default, update dev-requirements
      
      * fix jenkins syntax
      
      * fix docker syntax
      
      * get rid of hipcc.pl editing in jenkinsfile
      
      * fix the hipcc.pl in both places
      
      * try to fix the 10738 compiler linking bug
      
      * fix syntax
      
      * use dockerhub to store images
      
      * use newer amd-stg-open commit as default
  16. 23 Sep, 2022 1 commit