1. 27 Oct, 2022 1 commit
    • Anthony Chang's avatar
      Fused attention client example (#494) · 24fd4a0b
      Anthony Chang authored
      
      
      * reopen masking att instance due to CI is upgraded
      
      * re-enable instances previously failed on 9110
      
      * enable ksize-kpadding pair validity test
      
      * add non-masked attention+permute test; expose masking boolean to attention kernel handles
      
      * disable bench
      
      * fix test
      
      * move files
      
      * bulk rename batched_gemm_masking_scale_softmax_gemm_permute to batched_gemm_softmax_gemm_permute
      
      * format
      
      * amend rename
      
      * disable bench in test
      
      * add mask/no-mask test for non-permute attention kernels
      
      * disable broken kernel instance
      
      * example working
      
      add non-permuted problem statement
      
      evaluating whether overhead comes from permutation or the extra kernel arg
      
      * interface for bias addition without implementing it
      
      * test and profiler running
      
      * tidy
      
      * mask type determined by enum class
      
      * unify example code
      
      * move masking specialization to its own header
      
      * align formats
      
      * extract helper functions
      
      * experiment merging dims for attn w/ permute; shows perf parity with attn wo/ permute
      
      * add tensor specialization to template args
      
      since tensor spec packed shows perf parity when permutation isn't needed
      
      remove redundant template args
      
      comment on 'packed' tensor specialization
      
      * grouped attention with input/output permute example
      
      * format
      
      * clean up
      
      * refactor acc0 tile visitor
      
      * fused attention client example
      
      * format
      Co-authored-by: wangshaojie6's avatarshaojiewang <wsjmessi@163.com>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      24fd4a0b
  2. 13 Oct, 2022 3 commits
  3. 22 Sep, 2022 1 commit
  4. 20 Sep, 2022 1 commit
    • rocking5566's avatar
      Group norm (#417) · 4eba345f
      rocking5566 authored
      
      
      * Add groupnorm example by layernorm
      1.  Reference is not ready
      2. shape of gamma and beta need to be fix
      
      * Let shape of gamma and beta can be same as x
      
      * Modify test, instance and client example
      
      * [What] Fix bug of layernorm for greater than 2 dimension.
      [Why] We need to get upper length from merge transform instead of embed transform.
      
      * Add reference for groupnorm
      
      * Fuse sigmoid after groupnorm
      
      * [What] Rename original layernorm into layernorm2d
      [Why] Prepare to add groupnorm using layernorm5d
      
      * clang-format
      
      * Add groupnorm test
      
      * Refine error message
      
      * Add groupnorm ckProfiler
      
      * Test groupnorm kernel from device_instance
      
      * update example
      
      * upadte profiler
      
      * Fix test naming
      
      * Fix argc number
      
      * Move descriptor and sweeponce to argument for quick debugging
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      4eba345f
  5. 19 Sep, 2022 1 commit
    • Shaojie WANG's avatar
      Conv bwd data multiple d (#404) · 27858374
      Shaojie WANG authored
      
      
      * init commit of convnd bwd data
      
      * begin compiling example
      
      * have a first version that produce a right result
      
      * refine device level launch kernel code
      
      * add more instances in example and get right results
      
      * clang-format
      
      * format example file
      
      * add more instances
      
      * fix instances
      
      * adding conv_bwd_data multile_d
      
      * adding conv_bwd_data multile_d
      
      * adding conv_bwd multiple d
      
      * adding conv_bwd multiple d
      
      * adding conv_bwd multiple d
      
      * refactor
      
      * refactor
      
      * adding conv bwd data multiple d
      
      * adding conv bwd data multiple d
      
      * adding conv bwd data multiple d
      
      * adding conv bwd data multiple d
      
      * adding conv bwd data multiple d
      
      * adding conv bwd data multiple d
      
      * adding conv bwd data multiple d
      
      * refactor
      
      * update conv fwd's bias impl
      
      * refactor
      
      * reorg file
      
      * clean up cmake
      
      * clean
      
      * clean
      
      * clean
      Co-authored-by: default avatarChao Liu <lc.roy86@gmail.com>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      27858374
  6. 06 Sep, 2022 1 commit
  7. 24 Aug, 2022 1 commit
    • rocking5566's avatar
      layernorm external api (#379) · e1a3fff6
      rocking5566 authored
      * Add layernorm client example
      
      * [What] Add default make install dir to gitignore
      [Why] client example need to make install
      e1a3fff6
  8. 15 Aug, 2022 1 commit
    • Qianfeng's avatar
      Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320) · 53ea4713
      Qianfeng authored
      * Implement multiple-reduction in one kernel (kernels, device ops, examples)
      
      * Add generic elementwise kernel and device interface
      
      * Add generator for normal-distributed data initialization
      
      * Add host refer implementation of batchnorm-forward and batchnorm-infer
      
      * Add examples for implementing batchnorm-forward and batchnorm-infer using generic kernels
      
      * Remove un-needed including in batchnorm example
      
      * Renaming generic_elementwise to elementiwise in kernel and device classes/functions
      
      * Change in gemm_layernorm examples to use DeviceElementwise instead of Device5AryElementwise
      
      * Change in exampe 19_binary_elementwise to use DeviceElementwise instead of DeviceBinaryElementwise
      
      * Change in device_cgemm_4gemm_xdl_cshuffle.hpp to use kernel_elementwise instead of kernel_binary_elementwise
      
      * Add DeviceElementwiseBase and use it in device_normalize_instance.cpp
      
      * Removing and renaming files
      
      * Update to synchronize gemm_layernorm client example to the generic element-wise device op API
      
      * Update to synchronize with the latest headers directory and HostTensorDescriptor interface renaming
      
      * Merge two static member functions in device_elementwise.hpp
      
      * Remove unary_elementwise_1d kernel and device
      53ea4713
  9. 29 Jul, 2022 1 commit
    • Chao Liu's avatar
      Clean up conv example, Instances, profiler and test (#324) · 500fa995
      Chao Liu authored
      * convnd_fwd fp16 example
      
      * update example
      
      * update example
      
      * update instance
      
      * updating refernce conv
      
      * update reference conv
      
      * update conv fwd profiler
      
      * update conv 1d and 3d instance
      
      * update include path
      
      * clean
      
      * update profiler for conv bwd data and weight
      
      * update conv bwd weight
      
      * clean
      
      * update conv example
      
      * update profiler for conv bwd weight
      
      * update ckprofiler for conv bwd data
      
      * fix reference conv bwd data bug; update conv bwd data test
      
      * update examples
      
      * fix initialization issue
      
      * update test for conv fwd
      
      * clean
      
      * clean
      
      * remove test case too sensitive to error threshhold
      
      * fix test
      
      * clean
      
      * fix build
      
      * adding conv multiple d
      
      * adding conv multiple D
      
      * add matrix padder
      
      * add gemm padding to convnd
      
      * adding group conv
      
      * update gemm multi-d
      
      * refactor
      
      * refactor
      
      * refactor
      
      * clean
      
      * clean
      
      * refactor
      
      * refactor
      
      * reorg
      
      * add ds
      
      * add bias
      
      * clean
      
      * add G
      
      * adding group
      
      * adding group
      
      * adding group
      
      * update Tensor
      
      * clean
      
      * update example
      
      * update DeviceGemmMultipleD_Xdl_CShuffle
      
      * update conv bwd-data and bwd-weight
      
      * upate contraction example
      
      * update gemm and batch gemm with e permute
      
      * fix example build
      
      * instance for grouped conv1d
      
      * update example
      
      * adding group conv instance
      
      * update gemm bilinear instance
      
      * update gemm+add+add+fastgelu instance
      
      * update profiler
      
      * update profiler
      
      * update test
      
      * update test and client example
      
      * clean
      
      * add grouped conv into profiler
      
      * update profiler
      
      * clean
      
      * add test grouped conv, update all conv test to gtest
      
      * update test
      500fa995
  10. 13 Jul, 2022 1 commit
  11. 07 Jul, 2022 1 commit
    • Chao Liu's avatar
      N-D Tensor Contraction example, instance, and client example (#270) · 4fe9c393
      Chao Liu authored
      * adding contraction
      
      * add contraction example
      
      * update examle
      
      * update example
      
      * format
      
      * update readme
      
      * clean header
      
      * clean header
      
      * contraction with multiple D
      
      * rename
      
      * fix naming issue; add instances for contraction+bilinear
      
      * change assumed virtual layout of contraction; add client example
      
      * update example
      
      * update
      
      * contraction+scale
      
      * use type_convert
      
      * rename
      4fe9c393
  12. 01 Jul, 2022 1 commit
  13. 27 Jun, 2022 2 commits
    • rocking5566's avatar
      external api for gemm + layernorm (#285) · 12235112
      rocking5566 authored
      * Extract base class for elementwise
      
      * Refactor interface of DeviceGemmReduce. Do not use tuple in interface
      
      * [What] Rename d into reduce in gemm + reduction related code
      [Why] Prepare to add d term for add
      
      * Unify base class of gemm + reduce and gemm + bias + add + reduce
      
      * 1. Rename gemm_bias_add_reduce for external api
       2. Refine cmake
      
      * Add normalize device operation
      
      * [What] Reorder the argument
      [Why] Because d0 is also the input of c.
      
      * Add type string
      
      * Add example of gemm_bias_add_layernorm  via external api
      
      * Refactor example code
      
      * clang-format
      
      * Fix compile error
      
      * clang-format
      
      * Add external api for gemm_add_add_layernorm and normalize
      
      * Add client example
      
      * clang-format
      12235112
    • Chao Liu's avatar
      External Interface (#304) · aebd211c
      Chao Liu authored
      * add client example
      
      * clean
      
      * clean
      
      * reorg
      
      * clean up profiler
      
      * reorg
      
      * clea
      
      * fix profiler
      
      * function for getinstances
      
      * update client example
      
      * update client example
      
      * update client example
      
      * update
      
      * update example
      
      * update Jenkins file
      
      * update cmake
      
      * update Jenkins
      aebd211c