1. 19 Jun, 2022 1 commit
    • GEMM with Multiple Source, GEMM+Bias+Add+FastGeLU example and ckProfiler (#241) · 56adf7e9
      Chao Liu authored
      * add gelu and fast_gelu
      
      * added GeLU and fast GeLU
      
      * clean up
      
      * add gemm+fastgelu example
      
      * add gemm+gelu instances
      
      * update profiler
      
      * clean up
      
      * clean up
      
      * adding gemm+bias+activation
      
      * clean
      
      * adding bias
      
      * clean
      
      * adding gemm multiple d
      
      * debugging
      
      * add gemm bias add fastgelu
      
      * rename, clean
      
      * refactoring; add readme
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * fix
      
      * fix
      
      * update example
      
      * update example
      
      * rename
      
      * update example
      
      * add ckProfiler
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * add comment
      
      * use type_convert
      
      * clean
      
      * clean element wise op
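The two activations added in #241 are commonly defined as GeLU(x) = x·Φ(x), with Φ the standard normal CDF, and fast GeLU as the tanh approximation of it. The host-side sketch below is for comparing the two. The function names are hypothetical. Whether CK's FastGelu uses exactly this tanh form, or an algebraically equivalent exp form, is an assumption that this log does not settle.

```cpp
#include <cassert>
#include <cmath>

// Exact GeLU: x * Phi(x), with the normal CDF Phi written through erf.
inline float gelu(float x)
{
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}

// Fast GeLU (assumed form): the common tanh approximation of GeLU.
inline float fast_gelu(float x)
{
    const float k = std::sqrt(2.0f / 3.14159265358979f); // sqrt(2/pi)
    const float u = k * (x + 0.044715f * x * x * x);
    return 0.5f * x * (1.0f + std::tanh(u));
}
```

Over [-4, 4] the two functions stay within 1e-3 of each other, and both pass x through almost unchanged for large positive x.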
  2. 17 Jun, 2022 4 commits
    • Regulate reduction accumulator operations and Element-wise operations (#274) · 1f543bfa
      Qianfeng authored
      * Remove template from Reduction operation classes and add template to their operator() and GetIdentityValue() interfaces
      
      * Change to unary elementwise operators and the reduce_unary_operator (class for mapping) and dependent variations in all host layers
      
      * Remove the data type template parameter from reduce_binary_operator (class for mapping) and dependent variations in host layers
      
      * Add InMemoryDataOperatonSupportedOnDataType to check the matching between data type and InMemoryDataOperation
      
      * Use struct-scope operator template instantiation for binary and unary element-wise operations
      
      * Change a few more elementwise operations to use template for operator()
      
      * Tiny correction in Normalize operator
      
      * Add static_assert to check the data type applicability for some reduction accumulator and element-wise operations
      
      * Correction in some examples with regard to using ReduceAccDataType
      
      * Use static_assert for UnaryDivide
      
      * Update merged code to use Element-wise operations and Reduction Accumulator operations correctly
      
      * Tiny fix with regard to SetWorkSpacePointer()
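The first bullet of #274 moves the template parameter off the reduction classes and onto operator() and GetIdentityValue(). The sketch below shows that shape. Add, Max and reduce() are hypothetical and are not CK's actual classes. It illustrates how one non-templated op type can serve several data types, and where a static_assert can reject types an operation does not support.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>

// Hypothetical reduction ops: the struct itself is not templated,
// only its member functions are.
struct Add
{
    template <typename T>
    static constexpr T GetIdentityValue() { return T(0); }

    template <typename T>
    void operator()(T& acc, T v) const
    {
        static_assert(std::is_arithmetic<T>::value, "Add needs an arithmetic type");
        acc = acc + v;
    }
};

struct Max
{
    template <typename T>
    static constexpr T GetIdentityValue() { return std::numeric_limits<T>::lowest(); }

    template <typename T>
    void operator()(T& acc, T v) const
    {
        static_assert(std::is_arithmetic<T>::value, "Max needs an arithmetic type");
        if(v > acc)
            acc = v;
    }
};

// Host-side reduction: the same Op works for float, double, int, ...
template <typename Op, typename T, std::size_t N>
T reduce(const T (&data)[N])
{
    Op op;
    T acc = Op::template GetIdentityValue<T>();
    for(const T& v : data)
        op(acc, v);
    return acc;
}
```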
    • Shaojie WANG · 63cdd923
    • add p_workspace to baseargument (#275) · c7a96ed5
      ltqin authored
    • Gemm + bias + relu + add + layernorm (#272) · 6eb55499
      rocking5566 authored
      * Copy "gemm reduce" to "gemm bias add reduce"
      
      * Implement gemm bias add reduction
      
      * Fix compiler error due to merge from develop
      
      * Add tensor operation for gemm + bias + add + reduce
      
      * Add gemm_bais_add_reduce to ckProfiler
      
      * Add c1 functor
      
      * Refine type
      
      * Use reduceAccDataType instead of explicitly float
      
      * Change to use check_err()
      
      * Do relu in float32 instead of bhalf_t, because bhalf_t is unsigned
      
      * Refactor relu: use type_trait instead of overloading
      
      * Rename DxsReduceAccElementwiseOperation to DxsReduceAccElementwiseOperation
      
      * Fix denominator
      
      * Refine naming
      
      * Fix denominator in host
      
      * Remove useless include header
      
      * Use AccDataType
      
      * Fix static_cast order
      
      * Refine type
      
      * [What] Remove tuple type in the base class
      [Why] External APIs depend on the base class. If the base class depends on the data types, we would need many classes for different types
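The note about doing relu in float32 can be illustrated on the host. The sketch below assumes, as the commit message implies, that a bf16 value is carried as its raw 16-bit pattern in an unsigned integer. It models this with a hypothetical bhalf_raw alias, not CK's bhalf_t. Comparing that pattern against zero never treats a set sign bit as negative, whereas converting to float first gives the intended clamp. The float-to-bf16 step here truncates instead of rounding, which is a simplification.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: bf16 stored as its raw bit pattern in an unsigned 16-bit integer.
using bhalf_raw = std::uint16_t;

inline float bf16_to_float(bhalf_raw h)
{
    // bf16 is the upper 16 bits of an IEEE fp32 value
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

inline bhalf_raw float_to_bf16_truncate(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return static_cast<bhalf_raw>(bits >> 16); // drops low mantissa bits, no rounding
}

// Wrong: a negative value has its sign bit set, so as an unsigned
// integer it compares greater than 0 and is never clamped.
inline bhalf_raw relu_on_raw_bits(bhalf_raw x)
{
    return x > 0 ? x : bhalf_raw(0);
}

// Right: convert to float, apply ReLU there, convert back.
inline bhalf_raw relu_via_float(bhalf_raw x)
{
    const float f = bf16_to_float(x);
    return float_to_bf16_truncate(f > 0.0f ? f : 0.0f);
}
```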
  3. 16 Jun, 2022 1 commit
    • example for convnd bwd weight bf16 splitk (#265) · 561ec12f
      Shaojie WANG authored
      * add GetWorkSpaceSize to base arg and make an example on convnd_bwd_weight
      
      * add bwd weight for bf16: init
      
      * remove redundant compute
      
      * use datatype and split k to check whether a workspace is used
      
      * remove unused computation for work space size
      
      * add some code for bfp16
      
      * add device/grid unary op
      
      * add unary type convert to bwd-weight example
      
      * support bf16 splitk kernel for convnd bwd weight
      
      * 1. remove comments. 2. add checkvalidity. 3. add gridsize computation
      
      * add workspace size check
      
      * fix format
      
      * change function name
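#265 decides whether a workspace is used based on the data type and the split-K factor, and it adds a unary type convert to the bwd-weight example. One plausible reading is that bf16 weight gradients computed with split-K are first accumulated into an fp32 buffer, then converted to bf16 by that type-convert pass. The sketch below follows that reading. Only the inputs (data type, split-K) and the convert step come from the log. The enum, the function name, the exact condition, and sizing the buffer as one float per K x C x Y x X weight element are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>

enum class DataType { F32, F16, BF16 };

// Hypothetical: a workspace is needed only when K is split across several
// workgroups and the output is bf16. Partial sums then go to an fp32
// buffer, which a later unary type-convert pass turns into bf16 weights.
inline std::size_t GetWorkSpaceSizeSketch(DataType out_type, int k_batch,
                                          std::size_t k, std::size_t c,
                                          std::size_t y, std::size_t x)
{
    const bool needs_workspace = (out_type == DataType::BF16) && (k_batch > 1);
    if(!needs_workspace)
        return 0;
    // one fp32 accumulator per weight element (K x C x Y x X)
    return k * c * y * x * sizeof(float);
}
```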
  4. 15 Jun, 2022 1 commit
  5. 02 Jun, 2022 7 commits
  6. 01 Jun, 2022 1 commit
  7. 31 May, 2022 10 commits
  8. 30 May, 2022 9 commits
  9. 27 May, 2022 1 commit
    • Fixing conv bug (#258) · 91d8b7d6
      Chao Liu authored
      
      
      * debugging conv
      
      * fix oversight where ctile map is constructed before initializing c desc
      
      * example program should return an error code
      
      * clean up
      
      * changed Block2CTileMap in conv2d and convnd
      
      * clean up
      
      * clean up
      
      * cleanup
      Co-authored-by: Anthony Chang <ac.chang@outlook.com>
  10. 26 May, 2022 1 commit
    • Add FP64 XDL GEMM built-in function (#199) · 3e6c2610
      ltqin authored
      
      
      * add intrin_mfma_f64_16x16x4f64
      
      * add example
      
      * gemm reference add double data type
      
      * change init data
      
      * fix M N PerXdlops
      
      * fix ifdef
      
      * add comparison config
      
      * add conv fwd example
      
      * format log out
      
      * change rc matrix register layout
      
      * reorganize example
      
      * reorganize example 2
      
      * format, because of merge from develop
      
      * fix call impl adding acc data type
      
      * add missing ;
      
      * add compiler warning
      
      * change example tuning parameters
      
      * add test for fp64
      
      * add instance
      
      * add test/gemm/gemm_fp64.cpp
      
      * fix get name issue
      
      * remove some tuning parameters
      
      * fix conflict
      
      * format
      
      * use integer value for GEMM test
      
      * add acc data type
      
      * remove typeid because of fp16
      
      * fix streamconfig etc bug from merging develop
      
      * format
      
      * remove test_gemm_xdl_fp64
      
      * add AccDataType
      
      * AccDataType problem
      Co-authored-by: qinletao <letaoqin@amd.com>
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
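This sketch assumes the name intrin_mfma_f64_16x16x4f64 reads as an M x N x K = 16 x 16 x 4 shape, so each call performs C(16x16) += A(16x4) · B(4x16) in double precision. It is a host reference for that arithmetic only and ignores how operands are spread across lanes and registers. All names are hypothetical. Small integer inputs, as the "use integer value for GEMM test" bullet suggests, keep the result exact, so it can be checked with ==.

```cpp
#include <array>
#include <cassert>

// Assumed tile shape taken from the intrinsic's name.
constexpr int kM = 16, kN = 16, kK = 4;

using MatA = std::array<double, kM * kK>; // row-major 16x4
using MatB = std::array<double, kK * kN>; // row-major 4x16
using MatC = std::array<double, kM * kN>; // row-major 16x16

// C += A * B for one 16x16x4 fp64 tile (arithmetic only).
inline void mfma_f64_16x16x4_reference(const MatA& a, const MatB& b, MatC& c)
{
    for(int m = 0; m < kM; ++m)
        for(int n = 0; n < kN; ++n)
        {
            double acc = c[m * kN + n];
            for(int k = 0; k < kK; ++k)
                acc += a[m * kK + k] * b[k * kN + n];
            c[m * kN + n] = acc;
        }
}
```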
  11. 25 May, 2022 3 commits
    • Hotfix binary elementwise (for broadcast on fastest axis) (#254) · 82d7d993
      rocking5566 authored
      
      
      * Support different length of ScalarPerVector
      
      * Add example of broadcast on fastest axis
      
      * Typo
      
      * Refine fastest example
      
      * Add dimension check
      
      * Modify fastest broadcast example to 3d
      
      * Enforce that users give scalarPerVector explicitly
      
      * 1. Add CScalarPerVector
      2. Not only broadcasting on the fastest axis needs scalarPerVector set to 1
      
      * Rename var
      
      * Move IsScalarPerVectorValid() inside IsSupportedArgument()
      
      * Separate GridDesc_M0 into A, B and C
      
      * rename var
      
      * Rename var of length
      Co-authored-by: rocking <chunylai@amd.com>
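The sketch below shows a rough version of the kind of check #254 describes. Vectorized access of ScalarPerVector elements requires the fastest dimension to be contiguous and its length to divide evenly. A tensor broadcast along the fastest axis has stride 0 there, so it has to fall back to ScalarPerVector = 1. The function is hypothetical and is not CK's IsScalarPerVectorValid(). Calling it once per tensor mirrors the separate A, B and C settings.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-tensor check: reading or writing scalar_per_vector
// elements at once needs a contiguous (stride 1) fastest dimension whose
// length is a multiple of scalar_per_vector. A tensor broadcast on the
// fastest axis (stride 0) only passes with scalar_per_vector == 1.
inline bool IsScalarPerVectorValidSketch(const std::vector<int>& lengths,
                                         const std::vector<int>& strides,
                                         int scalar_per_vector)
{
    if(lengths.empty() || lengths.size() != strides.size() || scalar_per_vector < 1)
        return false;
    if(scalar_per_vector == 1)
        return true;
    const int fastest_len    = lengths.back();
    const int fastest_stride = strides.back();
    return fastest_stride == 1 && fastest_len % scalar_per_vector == 0;
}
```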
    • Tensile-style block to C tile map (#239) · e579c9e5
      Anthony Chang authored
      * fix build
      
      * Revert "fix build"
      
      This reverts commit d7310238.
      
      * post PR #235 merge fix
      
      * amend
      
      * add tensile-style c-tile map
      
      * make it dynamic version
      
      * add k-split flavor tile map
      
      * apply tensile-style tile map to all xdl gridwise gemms
      
      * remove dead code
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
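#239 does not spell out the mapping. Assuming the intent is a grouped ordering of thread blocks, the sketch below shows a generic grouped block-to-C-tile mapping of that kind. Tile rows are grouped into bands, and consecutive block ids walk down a band before moving to the next tile column. As a result, blocks that run at the same time share a B tile column and reuse the same band of A tiles. BlockToCTile and its parameters are hypothetical, this is not CK's exact formula, and the k-split flavour is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical grouped mapping from a 1D block id to a (m_tile, n_tile)
// pair, for a C matrix split into M0 x N0 tiles, with `group` tile rows
// per band.
inline std::pair<int, int> BlockToCTile(int block_id, int M0, int N0, int group)
{
    const int blocks_per_band = group * N0;
    const int band            = block_id / blocks_per_band;
    const int local           = block_id % blocks_per_band;
    // the last band may contain fewer than `group` tile rows
    const int band_rows = std::min(group, M0 - band * group);
    const int m_tile    = band * group + local % band_rows; // walk down the band first
    const int n_tile    = local / band_rows;                // then move to the next column
    return std::make_pair(m_tile, n_tile);
}
```

Every block id in [0, M0·N0) lands on a distinct tile, so the mapping is a permutation of the plain row-major order.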
    • minor fix for recent PR (#255) · 61851ae2
      Chao Liu authored
      * minor fix
      
      * clean
  12. 24 May, 2022 1 commit
    • Navi21 gemm (#197) · 40b59a63
      Jianfeng Yan authored
      
      
      * start adding navi21 GEMM
      
      * navi_gemm_km_kn_mn_fp32 compiles and passes one test.
      
      * rename variables and functions in gridwise_gemm_dlops_v1r3
      
      * add other 3 layouts; format instance
      
      * adding more tuning parameters
      
      add tuning parameters for other 3 layouts
      
      * add gemm_dlops_f16
      
      * tmp
      
      * add dependence of DeviceGemm::IsSupportedArg() on arch
      
      * minor changes
      
      * minor changes
      
      * minor changes
      
      * minor changes
      
      * minor changes
      
      * minor changes
      
      * minor changes
      
      * push gemm_dlops into profiler
      
      * minor changes
      
      * the choice between xdl and dlops is moved into profiler_gemm_impl
      
      * minor changes
      
      * minor changes
      
      * remove is_xdl from profile_gemm_impl
      
      * make IsSupportedArg dependent on arch for other device_gemm
      
      * minor changes
      
      * minor changes
      
      * fix a bug in f_generate_tensor_value
      
      * add 64x64x64 for gemm_dlops_int8
      
      * add 64x64x64 for gemm_dlops_int8
      
      * comment out 3 layouts in gemm_dlops_int8; add 32x32x32 for gemm_dlops_int8; init A values to 1
      
      * fix
      
      * start fixing tuning parameters
      
      * minor
      
      * minor changes
      
      * minor changes
      
      * minor changes
      
      * fixing
      
      * adding example
      
      * adding example
      
      * adding example
      
      * add gemm fp32 example
      
      * clean up
      
      * use 128x128x16 as MNK tile in navi21 gemm example
      
      * bug fix
      
      * fix test
      
      * use new block c tile
      
      * clean
      
      * fix build
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
      Co-authored-by: shaojiewang <wsjmessi@163.com>
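The Navi21 work makes DeviceGemm::IsSupportedArg() depend on the GPU architecture, which lets the profiler skip instances that cannot run. The sketch below shows a hypothetical version of that filter. The enum and function are illustrative, and the architecture string is passed in rather than queried from the runtime. Limiting the dlops path to gfx1030 (Navi21) is an assumption. XDL kernels rely on MFMA matrix instructions, which gfx908 and gfx90a provide and gfx1030 does not.

```cpp
#include <cassert>
#include <string>

enum class GemmKernelKind { Xdl, Dlops };

// Hypothetical arch filter: an instance reports "not supported" when the
// target GPU lacks the instructions its kernel is built on.
inline bool IsSupportedOnArch(GemmKernelKind kind, const std::string& arch)
{
    switch(kind)
    {
    case GemmKernelKind::Xdl: // MFMA (matrix core) instructions
        return arch == "gfx908" || arch == "gfx90a";
    case GemmKernelKind::Dlops: // assumption: dlops GEMM targets Navi21 here
        return arch == "gfx1030";
    }
    return false;
}
```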