1. 12 Jul, 2022 1 commit
  2. 08 Jul, 2022 2 commits
    • Po Yen Chen's avatar
      GEMM pipeline v2 (#317) · 63914743
      Po Yen Chen authored
      
      
      * format
      
      * improving pipeline
      
      * fix typo
      
      * format
      
      * adding thread group
      
      * adding thread group
      
      * adding thread group
      
      * adding gemm pipeline
      
      * tweak
      
      * refactor
      
      * refactor
      
      * add missing type convert
      
      * refactor
      
      * refactor
      
      * refactor
      
      * clean
      
      * fix build
      
      * refactor
      
      * format
      
      * clean up
      
      * use remove_cvref_t
      
      * clean
      
      * use pipeline_v2 for gemm kernel
      
      * Remove inconsistent indent
      
      * Fix compilation errors due to incomplete merge process
      
      * Add missing include directives
      
      * Fix compilation errors in currently unused files
      
      * Add license in newly added files
      
      * Re-format touched files by clang-format-10
      
      * Fix wrong template argument count of DeviceGemm<>
      
      * Use language construct to choose between types
      
      * Use language construct to choose GEMM example instance
      
      * Fix compilation error due to interface change
      
      * Re-use type alias to avoid duplication
      
      * Unify type alias usage in source file
      
      * Only use v2 pipeline in one gridwise GEMM type
      
      * Remove no-longer used include directives
      
      * Add static_assert() to check pipeline type requirements
      
      * Revert "Add static_assert() to check pipeline type requirements"
      
      This reverts commit f0985f0a132671a1caaea92810c9f30dcf062bde.
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      Co-authored-by: wangshaojie6's avatarshaojiewang <wsjmessi@163.com>
      63914743
    • Shaojie WANG's avatar
      add conv1d/3d bwd weight instances (#318) · 763ca615
      Shaojie WANG authored
      * add conv1d/3d bwd weight instances
      
      * add profiler code
      763ca615
  3. 07 Jul, 2022 1 commit
    • Chao Liu's avatar
      N-D Tensor Contraction example, instance, and client example (#270) · 4fe9c393
      Chao Liu authored
      * adding contraction
      
      * add contraction example
      
      * update examle
      
      * update example
      
      * format
      
      * update readme
      
      * clean header
      
      * clean header
      
      * contraction with multiple D
      
      * rename
      
      * fix naming issue; add instances for contraction+bilinear
      
      * change assumed virtual layout of contraction; add client example
      
      * update example
      
      * update
      
      * contraction+scale
      
      * use type_convert
      
      * rename
      4fe9c393
  4. 06 Jul, 2022 1 commit
  5. 02 Jul, 2022 3 commits
  6. 01 Jul, 2022 7 commits
  7. 30 Jun, 2022 1 commit
    • Anthony Chang's avatar
      Standalone sweep once softmax kernel w/ ckProfiler (#295) · 93c99f3d
      Anthony Chang authored
      * use 'sweep once' softmax kernel where applicable
      
      * threadwise copy's dst buffer can specify invalid element value
      
      * add int8 in/out float compute softmax support
      
      give a bit of leeway for int absolute tolerance as there's a single data point of all test cases showing off-by-1 error
      
      * format
      
      * softmax inherits DeviceNormalization
      
      * softmax profiler stub
      
      * tighten up reference softmax interface
      
      * example prints tensor dimension
      
      * add fp32 to softmax profiler
      
      * rename header
      
      * hook with ckProfiler
      
      * format
      
      * resolve merge conflict
      
      * resolve merge conflicts
      
      * update normalization profiler help string
      
      * resolve conflict
      
      * typo
      
      * remove residual
      
      * softmax profiler: address feedback
      
      * test for mixed precision input/output
      
      * fully qualify ck::math::isnan
      
      * add comment for device normalization interface
      
      * revise wording
      
      * constness for alpha/beta scaler pointer
      93c99f3d
  8. 27 Jun, 2022 2 commits
    • rocking5566's avatar
      external api for gemm + layernorm (#285) · 12235112
      rocking5566 authored
      * Extract base class for elementwise
      
      * Refactor interface of DeviceGemmReduce. Do not use tuple in interface
      
      * [What] Rename d into reduce in gemm + reduction related code
      [Why] Prepare to add d term for add
      
      * Unify base class of gemm + reduce and gemm + bias + add + reduce
      
      * 1. Rename gemm_bias_add_reduce for external api
       2. Refine cmake
      
      * Add normalize device operation
      
      * [What] Reorder the argument
      [Why] Because d0 is also the input of c.
      
      * Add type string
      
      * Add example of gemm_bias_add_layernorm  via external api
      
      * Refactor example code
      
      * clang-format
      
      * Fix compile error
      
      * clang-format
      
      * Add external api for gemm_add_add_layernorm and normalize
      
      * Add client example
      
      * clang-format
      12235112
    • Chao Liu's avatar
      External Interface (#304) · aebd211c
      Chao Liu authored
      * add client example
      
      * clean
      
      * clean
      
      * reorg
      
      * clean up profiler
      
      * reorg
      
      * clea
      
      * fix profiler
      
      * function for getinstances
      
      * update client example
      
      * update client example
      
      * update client example
      
      * update
      
      * update example
      
      * update Jenkins file
      
      * update cmake
      
      * update Jenkins
      aebd211c
  9. 25 Jun, 2022 2 commits
    • Chao Liu's avatar
      add license in file (#303) · d3051d75
      Chao Liu authored
      d3051d75
    • Chao Liu's avatar
      Absolute include path (#281) · d1db6a0c
      Chao Liu authored
      * ad gelu and fast_gelu
      
      * added GeLU and fast GeLU
      
      * clean up
      
      * add gemm+fastgelu example
      
      * add gemm+gelu instances
      
      * update profiler
      
      * clean up
      
      * clean up
      
      * adding gemm+bias+activation
      
      * clean
      
      * adding bias
      
      * clean
      
      * adding gemm multiple d
      
      * debugging
      
      * add gemm bias add fastgelu
      
      * rename, clean
      
      * refactoring; add readme
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * fix
      
      * fix
      
      * update example
      
      * update example
      
      * rename
      
      * update example
      
      * add ckProfiler
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * add client app example
      
      * update readme
      
      * delete obselete files
      
      * remove old client app
      
      * delete old file
      
      * cleaning
      
      * clean
      
      * remove half
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path for all examples
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * revert client app example
      
      * clean build
      
      * fix build
      
      * temporary disable client test on Jenkins
      
      * clean
      
      * clean
      
      * clean
      d1db6a0c
  10. 23 Jun, 2022 2 commits
    • Chao Liu's avatar
      update license (#297) · a49115b9
      Chao Liu authored
      * update license
      
      * update license
      
      * update license
      
      * update license
      a49115b9
    • Adam Osewski's avatar
      Testing all fwd convolution specializations. (#259) · a2edd7d8
      Adam Osewski authored
      
      
      * UniforFill with integer values.
      
      * Log tested instance type string.
      
      * Add UT for all convolution specializations.
      
      * debugging conv
      
      * Fix dangling reference bug.
      
      * Small refinements.
      
      * Fix call to error checking function.
      
      * Small refinements to tests.
      
      * Configure error tolerance
      * Change problem size.
      * Remove OddC case from types that do not support it.
      
      * Add helper traits for AccumulatorDataType.
      
      * Print first 5 errs in check_err for integral types.
      
      * Rename FillUniform to FillUniformDistribution
      
      * Refactor
      
      * Do not use typed tests.
      * Instead use plain fixture class with templatized member functions.
      * Initialize tensors with integer values.
      
      * Refine test instances.
      
      * Properly set accumulator data type.
      * Add another "big" instance.
      
      * Refactor convolution tests.
      
      * Revert "debugging conv"
      
      This reverts commit b109516455631ff8fd6dce99cf7c14bf8e323ebb.
      
      * Add pragma once + format + small refinement.
      
      * Fix some unwanted changes.
      
      * Clang-format
      
      * Fix profile_convnd to use renamed tensor initializer.
      
      * Add instances for ConvFWDND kernel case 2D
      
      * Helpers to get ConvNDFwd 2D instances.
      
      * Refactoring.
      
      * Remove "small block" instance as it was generating compiler errors.
      * Remove default template parameters values.
      
      * Refine and fix test.
      
      * Fix problem with default template parameter types.
      * Adjust error thresholds for floating point values test.
      * Use integer values initialization for instances test.
      * Add tests for ConvNDFwd 2D case.
      
      * Remove AccumulatorDataType type trait.
      
      * Update unit-tests.
      
      * Remove operator<< overload.
      
      * Unlock conv1d/3d nd fwd instances.
      
      * Enable skipping calculating reference using flag.
      
      * Fix number of channels for first ResNet50 layer.
      
      * Clang-format.
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      a2edd7d8
  11. 21 Jun, 2022 5 commits
    • Shaojie WANG's avatar
      fix Issue 291 (#294) · 4634b120
      Shaojie WANG authored
      * rename for typeconvert functor
      
      * refine code
      4634b120
    • Anthony Chang's avatar
      Standalone softmax kernel (#284) · 15c89e81
      Anthony Chang authored
      * initial stub for standalone softmax
      
      * start device_softmax_mk_to_mk as a wrapper to device_reduce_mk_to_m
      
      * host softmax validates
      
      * compiles; to implement beta scaling
      
      * use NaN trick to efficiently ignore OOB values during sum of exponentials
      
      * freeload device_reduce's utility functions
      
      * clean up interface
      
      * adding prior value (beta scaling)
      
      * remove restriction related to perf considerations
      
      * apply clang-format
      
      * clean; disable diagnostics
      
      * resolve conflicts
      
      * add exp wrapper
      
      * honor HostTensorDesc interface; allow implicit cast from different vector<T> type
      
      * test softmax for fp16/fp32
      
      * update readme
      
      * amend commit NaN trick
      
      * remove redundant param added during development
      
      * format
      
      * replace ScalarDataType with AccDataType
      
      * separate out test programs by precision type
      
      * move softmax sample code to its own folder
      
      * format
      
      * keep up with recent changes in reduction API
      
      * remove extra header
      15c89e81
    • Chao Liu's avatar
      Create MIT LICENSE (#229) · be60d60d
      Chao Liu authored
      * Create LICENSE
      
      * add contributors, add license into config.hpp
      
      * update
      be60d60d
    • Anthony Chang's avatar
    • carlushuang's avatar
      fix a bug when upsampling value · 6ffd41ae
      carlushuang authored
      6ffd41ae
  12. 20 Jun, 2022 2 commits
  13. 19 Jun, 2022 2 commits
    • carlushuang's avatar
      add direct-conv first version · f29a5350
      carlushuang authored
      f29a5350
    • Chao Liu's avatar
      GEMM with Multiple Source, GEMM+Bias+Add+FastGeLU example and ckProfiler (#241) · 56adf7e9
      Chao Liu authored
      * ad gelu and fast_gelu
      
      * added GeLU and fast GeLU
      
      * clean up
      
      * add gemm+fastgelu example
      
      * add gemm+gelu instances
      
      * update profiler
      
      * clean up
      
      * clean up
      
      * adding gemm+bias+activation
      
      * clean
      
      * adding bias
      
      * clean
      
      * adding gemm multiple d
      
      * debugging
      
      * add gemm bias add fastgelu
      
      * rename, clean
      
      * refactoring; add readme
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * fix
      
      * fix
      
      * update example
      
      * update example
      
      * rename
      
      * update example
      
      * add ckProfiler
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * add comment
      
      * use type_convert
      
      * clean
      
      * clean element wise op
      56adf7e9
  14. 17 Jun, 2022 4 commits
    • Qianfeng's avatar
      Regulate reduction accumulator operations and Element-wise operations (#274) · 1f543bfa
      Qianfeng authored
      * Remove template from Reducton operation classes and add template to their operator() and GetIdentityValue() interfaces
      
      * Change to unary elementwise operators and the reduce_unary_operator (class for mapping) and dependent variations in all host layers
      
      * Remove the data type template parameter from reduce_binary_operator (class for mapping) and dependent variations in host layers
      
      * Add InMemoryDataOperatonSupportedOnDataType to check the matching between data type and InMemoryDataOperation
      
      * Use struct-scope operator template instantiation for binary and unary element-wise operations
      
      * Change a few more elementwise operations to use template for operator()
      
      * Tiny correction in Normalize operator
      
      * Add static_assert to check the data type appliability for some reduction accumulator and element-wise operatons
      
      * Correction in some examples with regard to using ReduceAccDataType
      
      * Use static_assert for UnaryDivide
      
      * Update to merged codes to use Element-wise operations and Reduction Accumulator operations correctly
      
      * Tiny fix with regard to SetWorkSpacePointer()
      1f543bfa
    • Shaojie WANG's avatar
      63cdd923
    • ltqin's avatar
      add p_workspace to baseargument (#275) · c7a96ed5
      ltqin authored
      c7a96ed5
    • rocking5566's avatar
      Gemm + bias + relu + add + layernorm (#272) · 6eb55499
      rocking5566 authored
      * Copy "gemm reduce" to "gemm bias add reduce"
      
      * Implement gemm bias add reduction
      
      * Fix compiler error due to merge from develop
      
      * Add tensor operation for gemm + bias + add + reduce
      
      * Add gemm_bais_add_reduce to ckProfiler
      
      * Add c1 functor
      
      * Refine type
      
      * Use reduceAccDataType instead of explicitly float
      
      * Change to use check_err()
      
      * Do relu in float32 instead of bhalf_t. Because bhalf_t is unsigned
      
      * Refactor relu. using type_trait instead of overloading
      
      * Rename DxsReduceAccElementwiseOperation to DxsReduceAccElementwiseOperation
      
      * Fix denominator
      
      * Refine nameing
      
      * Fix denominator  in host
      
      * Remove useless include header
      
      * Use AccDataType
      
      * Fix static_cast order
      
      * Refine type
      
      * [What] Remove tuple type in the base class
      [Why] External api depend on base class. if base class has relationship with type, we will need many class for different type
      6eb55499
  15. 16 Jun, 2022 1 commit
    • Shaojie WANG's avatar
      example for convnd bwd weight bf16 splitk (#265) · 561ec12f
      Shaojie WANG authored
      * add GetWorkSpaceSize to base arg and make an example on convnd_bwd_weight
      
      * add bwd weight for bf16: init
      
      * remove redundant compute
      
      * use datatype and split k to check whether a workspace is used
      
      * remove unused computation for work space size
      
      * add some code for bfp16
      
      * add device/grid unary op
      
      * add unary type convert to bwd-weight example
      
      * support bf16 splitk kernel for convnd bwd weight
      
      * 1. remove comments. 2. add checkvalidity. 3. add gridsize computation
      
      * add workspace size check
      
      * fix format
      
      * change function name
      561ec12f
  16. 14 Jun, 2022 2 commits
  17. 13 Jun, 2022 1 commit
  18. 09 Jun, 2022 1 commit