  1. 30 Dec, 2021 3 commits
  2. 29 Dec, 2021 1 commit
  3. 27 Dec, 2021 2 commits
  4. 26 Dec, 2021 1 commit
    • Fusion Conv+Bias+ReLU(+Add) (#62) · acbd7bd7
      Chao Liu authored
      * fix relu
      
      * clean up
      
      * clean up
      
      * adding 1x1 conv
      
      * adding 1x1 conv
      
      * added 1x1 conv
      
      * refactor
      
      * refactor
      
      * refactor
      
      * added profiler for conv+bias+relu+add
      
      * clean up
      
      * adding conv+bias+relu
      
      * adding conv+bias+relu
      
      * added conv+bias+relu
      
      * Update README.md
      
      * update cpu verification
      
      * adding c shuffle
      
      * update static_tensor for dealing with invalid element
      
      * adding c shuffle
      
      * debugging
      
      * fix bug
      
      * convert to fp16 before shuffle
      
      * shuffle more than one M/NRepeat
      
      * clean up
      
      * remove coordinate step hack from GridwiseGemm_k0mk1_k0nk1_mn_xdlops_v3r1
      
      * clean up
      
      * remove coordinate step hack from all gridwise gemm xdl
      
      * clean up coordinate step hack
      
      * clean up coordinate step hack
      
      * ThreadwiseTensorSliceTransfer_v3r2 support pointwise op on both src and dst
      
      * adding output shuffle in conv+bias+relu+add
      
      * update
      
      * added conv+bias+relu+add with c shuffle
      
      * added conv+bias+relu+add with c shuffle
      
      * fix forward_sweep bugs in threadwise copy
      
      * clean up
      
      * refactor
      
      * clean up
      
      * clean up
      
      * added conv_c_shuffle+bias_relu
      
      * clean up
      
      * added conv+bias+relu+atomic_add
      
      * clean up
      
      * clean up
      
      * clean up
      
      * clean up
      
      * clean up
      
      * clean up
      
      * misc fixes; add 1x1 specialization
      
      * clean up
      
      * delete unused device op
      
      * clean up
      
      * add support for odd C value
  5. 24 Dec, 2021 1 commit
  6. 16 Dec, 2021 1 commit
  7. 14 Dec, 2021 1 commit
  8. 13 Dec, 2021 1 commit
    • manually apply bug fix changes in pr #63 (#64) · a4f24233
      Chao Liu authored
      * Bug in BlockwiseGemmXdlops_k0mk1_k0nk1_m0n0m1n1m2m3m4n2_v1::MakeCGridDescriptor_M0_N0_M1_N1_M2_M3_M4_N2()
      * Bug in ThreadwiseTensorSliceTransfer_v1r3 logic for calculating "forward_sweep"
  9. 09 Dec, 2021 4 commits
  10. 04 Dec, 2021 1 commit
  11. 03 Dec, 2021 1 commit
    • GEMM/Conv+BiasAdd+ReLU+Add (#55) · 41cdd380
      Chao Liu authored
      * gemm+activation
      
      * move C pointwise operation into threadwise copy
      
      * add pointwise operation to A/B matrix
      
      * update ckProfiler
      
      * adding bias add
      
      * adding bias add
      
      * adding bias add
      
      * added bias add; worked around compiler issues
      
      * clean up
      
      * clean up
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * clean up
      
      * add conv_xdl example
      
      * adding conv_xdl_bias_relu_add example
      
      * add conv+bias+relu+add, but has register spill issue
      
      * tweak
      
      * tweak
      
      * refactor
      
      * Update README.md
      
      update readme for example/2_gemm_xdl_bias_relu_add
      
      * clean up
      
      * Update README.md
      
      update readme for example/3_conv_xdl
      
      * Update README.md
  12. 02 Dec, 2021 5 commits
  13. 01 Dec, 2021 1 commit
  14. 30 Nov, 2021 2 commits
  15. 25 Nov, 2021 4 commits
  16. 24 Nov, 2021 3 commits
  17. 23 Nov, 2021 2 commits
  18. 22 Nov, 2021 1 commit
  19. 18 Nov, 2021 3 commits
    • Use __builtin_memcpy to implement bit_cast and for accessing vector from pointer of scalars (#53) · 64350aff
      Chao Liu authored
      * reworking vector_type
      
      * use __builtin_memcpy for bit_cast and vector access of scalar pointer
      
      * clean up
    • v5r1 fusion kernels for inference (#49) · 970fa3e9
      zjing14 authored
      
      
      * init
      
      * refactor for 1x1
      
      * rename e0_e1
      
      * add e1 with bugs
      
      * debug
      
      * fixed
      
      * fixed e1
      
      * add timer
      
      * improve threadwise gemm with dot2
      
      * add e2
      
      * tuning
      
      * separate c2
      
      * add nhwc
      
      * restore nchwc
      
      * clean
      
      * opt
      
      * fixed; tuning
      
      * add BGlobalMoveSliceWindowStepHacks{}
      
      * tuning
      
      * repeat running
      
      * adjust
      
      * merge v5r1 nchwc
      
      * add adaptors
      
      * split k0 k1 in c_thread_grid
      
      * split h and w
      
      * remove v5r1 nhwc
      
      * clean for pr
      
      * remove host_conv_add
      
      * clean code
      
      * clean
      
      * add dynamic support
      
      * static mode
      
      * test static
      
      * add conv+add fusion
      
      * fixed validation
      
      * naming fix
      
      * use activ_enum
      
      * make static
      
      * refactor conv_add for InMem::add
      
      * add bias
      
      * add conv_out
      
      * add configurable makeddesc
      
      * add maxpool fusion
      
      * add maxpool host for validation
      
      * enable static desc
      
      * conv-only use v5r1_add
      
      * test
      
      * test
      
      * for binary dumps
      
      * fixed incorrect results due to typo
      
      * clean
      
      * debugging maxpool
      
      * workaround with offset trick
      
      * clean code
      
      * modularize ops of fusion
      
      * add gridwise_gemm_v3
      
      * create separate fusion fun
      
      * enable dynamic mode of conv and conv+resize_add
      
      * add dynamic mode of maxpool
      
      * add pass by point
      
      * add activ_type as arguments
      
      * merge develop
      
      * clean
      
      * reset config to old default
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
    • Fixed bfp16 host_conv_fwd (#52) · a651ea4f
      zjing14 authored
      
      
      * fixed bfloat16 issues
      
      * refactor type_convert
      
      * fixed host_convolution_forward for ushort
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  20. 16 Nov, 2021 2 commits