1. 25 Jun, 2022 2 commits
    • add license in file (#303) · d3051d75
      Chao Liu authored
    • Absolute include path (#281) · d1db6a0c
      Chao Liu authored
      * add gelu and fast_gelu
      
      * added GeLU and fast GeLU
      
      * clean up
      
      * add gemm+fastgelu example
      
      * add gemm+gelu instances
      
      * update profiler
      
      * clean up
      
      * clean up
      
      * adding gemm+bias+activation
      
      * clean
      
      * adding bias
      
      * clean
      
      * adding gemm multiple d
      
      * debugging
      
      * add gemm bias add fastgelu
      
      * rename, clean
      
      * refactoring; add readme
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * fix
      
      * fix
      
      * update example
      
      * update example
      
      * rename
      
      * update example
      
      * add ckProfiler
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * add client app example
      
      * update readme
      
      * delete obsolete files
      
      * remove old client app
      
      * delete old file
      
      * cleaning
      
      * clean
      
      * remove half
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path for all examples
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * fix header path
      
      * revert client app example
      
      * clean build
      
      * fix build
      
      * temporarily disable client test on Jenkins
      
      * clean
      
      * clean
      
      * clean
  2. 23 Jun, 2022 2 commits
    • update license (#297) · a49115b9
      Chao Liu authored
      * update license
      
      * update license
      
      * update license
      
      * update license
    • Testing all fwd convolution specializations. (#259) · a2edd7d8
      Adam Osewski authored
      
      
      * UniformFill with integer values.
      
      * Log tested instance type string.
      
      * Add UT for all convolution specializations.
      
      * debugging conv
      
      * Fix dangling reference bug.
      
      * Small refinements.
      
      * Fix call to error checking function.
      
      * Small refinements to tests.
      
      * Configure error tolerance
      * Change problem size.
      * Remove OddC case from types that do not support it.
      
      * Add helper traits for AccumulatorDataType.
      
      * Print first 5 errs in check_err for integral types.
      
      * Rename FillUniform to FillUniformDistribution
      
      * Refactor
      
      * Do not use typed tests.
      * Instead use plain fixture class with templatized member functions.
      * Initialize tensors with integer values.
      
      * Refine test instances.
      
      * Properly set accumulator data type.
      * Add another "big" instance.
      
      * Refactor convolution tests.
      
      * Revert "debugging conv"
      
      This reverts commit b109516455631ff8fd6dce99cf7c14bf8e323ebb.
      
      * Add pragma once + format + small refinement.
      
      * Fix some unwanted changes.
      
      * Clang-format
      
      * Fix profile_convnd to use renamed tensor initializer.
      
      * Add instances for ConvFWDND kernel case 2D
      
      * Helpers to get ConvNDFwd 2D instances.
      
      * Refactoring.
      
      * Remove "small block" instance as it was generating compiler errors.
      * Remove default template parameters values.
      
      * Refine and fix test.
      
      * Fix problem with default template parameter types.
      * Adjust error thresholds for floating point values test.
      * Use integer values initialization for instances test.
      * Add tests for ConvNDFwd 2D case.
      
      * Remove AccumulatorDataType type trait.
      
      * Update unit-tests.
      
      * Remove operator<< overload.
      
      * Unlock conv1d/3d nd fwd instances.
      
      * Enable skipping calculating reference using flag.
      
      * Fix number of channels for first ResNet50 layer.
      
      * Clang-format.
      Co-authored-by: Adam Osewski <aosewski@amd.com>
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  3. 21 Jun, 2022 4 commits
    • fix Issue 291 (#294) · 4634b120
      Shaojie WANG authored
      * rename for typeconvert functor
      
      * refine code
    • Standalone softmax kernel (#284) · 15c89e81
      Anthony Chang authored
      * initial stub for standalone softmax
      
      * start device_softmax_mk_to_mk as a wrapper to device_reduce_mk_to_m
      
      * host softmax validates
      
      * compiles; to implement beta scaling
      
      * use NaN trick to efficiently ignore OOB values during sum of exponentials
      
      * freeload device_reduce's utility functions
      
      * clean up interface
      
      * adding prior value (beta scaling)
      
      * remove restriction related to perf considerations
      
      * apply clang-format
      
      * clean; disable diagnostics
      
      * resolve conflicts
      
      * add exp wrapper
      
      * honor HostTensorDesc interface; allow implicit cast from different vector<T> type
      
      * test softmax for fp16/fp32
      
      * update readme
      
      * amend commit NaN trick
      
      * remove redundant param added during development
      
      * format
      
      * replace ScalarDataType with AccDataType
      
      * separate out test programs by precision type
      
      * move softmax sample code to its own folder
      
      * format
      
      * keep up with recent changes in reduction API
      
      * remove extra header
    • Create MIT LICENSE (#229) · be60d60d
      Chao Liu authored
      * Create LICENSE
      
      * add contributors, add license into config.hpp
      
      * update
    • Anthony Chang's avatar
  4. 20 Jun, 2022 2 commits
  5. 19 Jun, 2022 1 commit
    • GEMM with Multiple Source, GEMM+Bias+Add+FastGeLU example and ckProfiler (#241) · 56adf7e9
      Chao Liu authored
      * add gelu and fast_gelu
      
      * added GeLU and fast GeLU
      
      * clean up
      
      * add gemm+fastgelu example
      
      * add gemm+gelu instances
      
      * update profiler
      
      * clean up
      
      * clean up
      
      * adding gemm+bias+activation
      
      * clean
      
      * adding bias
      
      * clean
      
      * adding gemm multiple d
      
      * debugging
      
      * add gemm bias add fastgelu
      
      * rename, clean
      
      * refactoring; add readme
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * refactor
      
      * fix
      
      * fix
      
      * update example
      
      * update example
      
      * rename
      
      * update example
      
      * add ckProfiler
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * add comment
      
      * use type_convert
      
      * clean
      
      * clean element wise op
  6. 17 Jun, 2022 4 commits
    • Regulate reduction accumulator operations and Element-wise operations (#274) · 1f543bfa
      Qianfeng authored
      * Remove template from Reduction operation classes and add template to their operator() and GetIdentityValue() interfaces
      
      * Change to unary elementwise operators and the reduce_unary_operator (class for mapping) and dependent variations in all host layers
      
      * Remove the data type template parameter from reduce_binary_operator (class for mapping) and dependent variations in host layers
      
      * Add InMemoryDataOperatonSupportedOnDataType to check the matching between data type and InMemoryDataOperation
      
      * Use struct-scope operator template instantiation for binary and unary element-wise operations
      
      * Change a few more elementwise operations to use template for operator()
      
      * Tiny correction in Normalize operator
      
      * Add static_assert to check the data type applicability for some reduction accumulator and element-wise operations
      
      * Correction in some examples with regard to using ReduceAccDataType
      
      * Use static_assert for UnaryDivide
      
      * Update to merged codes to use Element-wise operations and Reduction Accumulator operations correctly
      
      * Tiny fix with regard to SetWorkSpacePointer()
    • Shaojie WANG's avatar
      63cdd923
    • add p_workspace to baseargument (#275) · c7a96ed5
      ltqin authored
    • Gemm + bias + relu + add + layernorm (#272) · 6eb55499
      rocking5566 authored
      * Copy "gemm reduce" to "gemm bias add reduce"
      
      * Implement gemm bias add reduction
      
      * Fix compiler error due to merge from develop
      
      * Add tensor operation for gemm + bias + add + reduce
      
      * Add gemm_bias_add_reduce to ckProfiler
      
      * Add c1 functor
      
      * Refine type
      
      * Use reduceAccDataType instead of explicitly float
      
      * Change to use check_err()
      
      * Do relu in float32 instead of bhalf_t, because bhalf_t is unsigned
      
      * Refactor relu, using type_trait instead of overloading
      
      * Rename DxsReduceAccElementwiseOperation to DxsReduceAccElementwiseOperation
      
      * Fix denominator
      
      * Refine naming
      
      * Fix denominator in host
      
      * Remove useless include header
      
      * Use AccDataType
      
      * Fix static_cast order
      
      * Refine type
      
      * [What] Remove tuple type in the base class
      [Why] The external API depends on the base class. If the base class is tied to a type, we will need many classes for different types
  7. 16 Jun, 2022 1 commit
    • example for convnd bwd weight bf16 splitk (#265) · 561ec12f
      Shaojie WANG authored
      * add GetWorkSpaceSize to base arg and make an example on convnd_bwd_weight
      
      * add bwd weight for bf16: init
      
      * remove redundant compute
      
      * use datatype and split k to check whether a workspace is used
      
      * remove unused computation for work space size
      
      * add some code for bfp16
      
      * add device/grid unary op
      
      * add unary type convert to bwd-weight example
      
      * support bf16 splitk kernel for convnd bwd weight
      
      * 1. remove comments. 2. add checkvalidity. 3. add gridsize computation
      
      * add workspace size check
      
      * fix format
      
      * change function name
  8. 15 Jun, 2022 1 commit
  9. 02 Jun, 2022 7 commits
  10. 01 Jun, 2022 1 commit
  11. 31 May, 2022 10 commits
  12. 30 May, 2022 5 commits
    • gemm + layernorm (#261) · d32a67a9
      rocking5566 authored
      * Implement reduction mean and reduction square mean
      
      * Refine file name
      
      * Add reduce mean and square mean
      
      * Fix parameter name
      
      * Add normalize device op (invoker::run() not implemented)
      
      * Remove epsilon
      
      * Refine deviceop
      
      * Add 5ary elementwise for normalization
      
      * Add layernorm example
      
      * layerNorm verification
      
      * Fix compiler error due to merge from develop
      
      * Fix typo
      
      * Fix compile error
      
      * Refine naming
      
      * [What] Support non pointer for invoker and argument
      [Why] Sync coding style with gemm
      
      * Refine folder name
      
      * Refine class name
      
      * Evaluate perf of the kernel
      
      * Fix compile error
      
      * [What] Refine perf evaluation in example of gemm + reduction
      [Why] Perf evaluation of gemm + reduction may cause verification to fail, because evaluation does not initialize global memory
      
      * clang-format
    • clang-format · d08aa99e
      Anthony Chang authored
    • clang-tidy and additional comments · ebdb48ae
      Anthony Chang authored
    • make C0 precision type consistent with C · 7392e40c
      Anthony Chang authored
    • tidy up · ac6977f7
      Anthony Chang authored