1. 18 Jul, 2021 1 commit
  2. 05 Jul, 2021 1 commit
    • DL GEMM fp32/fp16/int8 (#41) · b8b2d0a6
      Chao Liu authored
      * add threadwise copy that copies a tensor in one copy; add kpack to DL GEMM
      
      * add kpack into fwd v4r5 nchw fp32
  3. 11 May, 2021 1 commit
  4. 24 Jun, 2020 1 commit
  5. 20 Jan, 2020 1 commit
    • Added bwd data v3r1 v4r1, tweaking v1 (#10) · c5da0377
      Chao Liu authored
      * Added bwd data v3r1: breaks the computation down into a series of load-balanced GEMMs and launches them in a single kernel
      * Added bwd data v4r1: like v3r1, but launches the GEMMs in multiple kernels
      * Tweaked v1r1 and v1r2 (atomic) on AMD GPU
  6. 05 Jul, 2019 1 commit
  7. 13 Jun, 2019 1 commit
  8. 12 Jun, 2019 2 commits
  9. 11 Jun, 2019 1 commit
  10. 01 Apr, 2019 1 commit
  11. 15 Feb, 2019 3 commits
  12. 14 Feb, 2019 1 commit