  1. 01 Jul, 2021 1 commit
    • xdlops_v4r4_fwd fp32/fp16 (#34) · 3835318c
      zjing14 authored
      
      
      * create files for xdlops
      
      * working on blockwise_gemm_xdlops
      
      * add KReduction
      
      * add m/n repeats
      
      * add 2x2 pipeline
      
      * added 128x128 wavegemm
      
      * use StaticBuffer of vector_type
      
      * break vector type to blk_size
      
      * add kpack into xdlops_gemm and blockwise_gemm
      
      * abroadcast only
      
      * add fp32 mfma instructions
      
      * adding fp16 mfma
      
      * pack half4_t
      
      * rename kperwave to kpack
      
      * add 32x32x8fp16
      
      * add fp16 mfma
      
      * clean code
      
      * clean code
      
      * V4r4 xdlops kpack (#35)
      
      * add kpack with incorrect results
      
      * bug fix for make_dynamic_naive_tensor_descriptor_aligned_v2
      
      * add 1x1 kernel
      
      * add gridwise_gemm_v2 - single_buffer
      
      * enabled dwordx4 for fp16
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
      
      * refactor fwd-v4r4-xdlops
      
      * add v4r4-nhwc-xdlops
      
      * improve perf of nhwc and nchw by tuning parameters, and change scheduling in gridwise-gemm loop
      
      * tweak scheduling in gridwise gemm
      
      * add v4r3 with a single output copy
      
      * init commit: output with slice win
      
      * adding sliceWin
      
      * add multiple repeats pattern
      
      * start adding bwd-v4r1-xdlops
      
      * use tuple as SrcBuffer
      
      * adding bwd-data v4r1 nhwc xdlops
      
      * fix bug in make_dynamic_naive_tensor_descriptor_aligned_v2()
      
      * fix bug in host bwd-data conv
      
      * initial implementation of bwd-data v4r1 nhwc xdlops
      
      * add launch bound flags
      
      * enable launch bound
      
      * add m/nrepeat=4
      
      * tweak bwd-data v4r1 nhwc xdlops
      
      * added bwd-data v4r1 nhwc xdlops with output A and weight B
      
      * add fwd-v4r4 nhwc xdlops, A input, B weight, C output
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  2. 10 Jun, 2021 1 commit
  3. 11 May, 2021 1 commit
  4. 13 Apr, 2021 1 commit
  5. 25 Mar, 2021 1 commit
  6. 06 Aug, 2020 1 commit
    • Bwd Data NHWC (#22) · bbcb67d0
      Chao Liu authored
      * fix buffer_store bug
      * remove obsolete kernels
      * add bwd-data-v5r1-nhwc 
  7. 24 Jun, 2020 1 commit
  8. 17 Feb, 2020 1 commit
  9. 27 Jan, 2020 1 commit
  10. 20 Jan, 2020 1 commit
    • Added bwd data v3r1 v4r1, tweaking v1 (#10) · c5da0377
      Chao Liu authored
      * Added bwd data v3r1: breaks the computation down into a series of load-balanced GEMMs and launches them in a single kernel
      * Added bwd data v4r1: like v3r1, but launches the GEMMs in multiple kernels
      * Tweaked v1r1 and v1r2 (atomic) on AMD GPU
  11. 03 Dec, 2019 1 commit
    • backward data (#7) · 8f5f6496
      Chao Liu authored
      * enabled atomic add in tensor copy
      * added gridwise GEMM
      * added backward data conv using GEMM + atomic
      * added backward data conv using GEMM, no atomic
  12. 11 Oct, 2019 1 commit
  13. 27 Sep, 2019 4 commits
  14. 26 Sep, 2019 3 commits
  15. 25 Sep, 2019 3 commits
  16. 24 Sep, 2019 1 commit
  17. 22 Sep, 2019 2 commits
  18. 21 Sep, 2019 2 commits
  19. 18 Sep, 2019 2 commits
  20. 17 Sep, 2019 3 commits
  21. 15 Sep, 2019 2 commits
  22. 14 Sep, 2019 1 commit
  23. 12 Sep, 2019 2 commits
  24. 11 Sep, 2019 1 commit
  25. 10 Sep, 2019 1 commit
  26. 09 Sep, 2019 1 commit