1. 01 Jul, 2021 1 commit
    • xdlops_v4r4_fwd fp32/fp16 (#34) · 3835318c
      zjing14 authored
      
      
      * create files for xdlops
      
      * working on blockwise_gemm_xdlops
      
      * add KReduction
      
      * add m/n repeats
      
      * add 2x2 pipeline
      
      * added 128x128 wavegemm
      
      * use StaticBuffer of vector_type
      
      * break vector type to blk_size
      
      * add kpack into xdlops_gemm and blockwise_gemm
      
      * abroadcast only
      
      * add fp32 mfma instructions
      
      * adding fp16 mfma
      
      * pack half4_t
      
      * rename kperwave to kpack
      
      * add 32x32x8fp16
      
      * add fp16 mfma
      
      * clean code
      
      * clean code
      
      * V4r4 xdlops kpack (#35)
      
      * add kpack with incorrect results
      
      * bug fix for make_dynamic_naive_tensor_descriptor_aligned_v2
      
      * add 1x1 kernel
      
      * add gridwise_gemm_v2 - single_buffer
      
      * enabled dwordx4 for fp16
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
      
      * refactor fwd-v4r4-xdlops
      
      * add v4r4-nhwc-xdlops
      
      * improve perf of nhwc and nchw by tuning parameters, and change scheduling in gridwise-gemm loop
      
      * tweak scheduling in gridwise gemm
      
      * add v4r3 with a single output copy
      
      * init commit: output with slice win
      
      * adding sliceWin
      
      * add multiple repeats pattern
      
      * start adding bwd-v4r1-xdlops
      
      * use tuple as SrcBuffer
      
      * adding bwd-data v4r1 nhwc xdlops
      
      * fix bug in make_dynamic_naive_tensor_descriptor_aligned_v2()
      
      * fix bug in host bwd-data conv
      
      * initial implementation of bwd-data v4r1 nhwc xdlops
      
      * add launch bound flags
      
      * enable launch bound
      
      * add m/nrepeat=4
      
      * tweak bwd-data v4r1 nhwc xdlops
      
      * added bwd-data v4r1 nhwc xdlops with output A and weight B
      
      * add fwd-v4r4 nhwc xdlops, A input, B weight, C output
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  2. 10 Jun, 2021 1 commit
  3. 11 May, 2021 1 commit
  4. 25 Mar, 2021 1 commit
  5. 24 Jun, 2020 1 commit
  6. 17 Feb, 2020 1 commit
  7. 27 Jan, 2020 1 commit
  8. 20 Jan, 2020 1 commit
    • Added bwd data v3r1 v4r1, tweaking v1 (#10) · c5da0377
      Chao Liu authored
      * Added bwd data v3r1: breaking down compute into a series of load balanced GEMM, and launch in a single kernel
      * Added bwd data v4r1: like v3r1, but launch GEMMs in multiple kernels
      * Tweaked v1r1 and v1r2 (atomic) on AMD GPU
  9. 03 Dec, 2019 1 commit
    • backward data (#7) · 8f5f6496
      Chao Liu authored
      * enabled atomic add in tensor copy
      * added gridwise GEMM
      * added backward data conv using GEMM + atomic
      * added backward data conv using GEMM, no atomic
  10. 10 Sep, 2019 1 commit
  11. 09 Sep, 2019 1 commit
  12. 05 Sep, 2019 1 commit
  13. 20 Jun, 2019 1 commit
  14. 19 Jun, 2019 2 commits
  15. 18 Jun, 2019 2 commits
  16. 17 Jun, 2019 2 commits
  17. 13 Jun, 2019 1 commit
  18. 12 Jun, 2019 1 commit
  19. 11 Jun, 2019 2 commits
  20. 07 Jun, 2019 1 commit
  21. 06 Jun, 2019 1 commit
  22. 05 Jun, 2019 1 commit
  23. 03 Jun, 2019 1 commit
  24. 30 May, 2019 1 commit
  25. 23 May, 2019 1 commit
  26. 16 May, 2019 1 commit
  27. 02 May, 2019 1 commit
  28. 23 Apr, 2019 1 commit
  29. 18 Apr, 2019 1 commit
  30. 06 Apr, 2019 2 commits
  31. 03 Apr, 2019 2 commits
  32. 01 Apr, 2019 1 commit
  33. 29 Mar, 2019 1 commit
  34. 24 Mar, 2019 1 commit