1. 02 Jul, 2022 2 commits
  2. 01 Jul, 2022 1 commit
  3. 26 Jun, 2022 4 commits
  4. 25 Jun, 2022 3 commits
  5. 22 Jun, 2022 1 commit
  6. 10 Jun, 2022 1 commit
  7. 04 Jun, 2022 2 commits
  8. 31 May, 2022 1 commit
  9. 25 May, 2022 2 commits
  10. 24 May, 2022 1 commit
  11. 23 May, 2022 2 commits
  12. 20 May, 2022 2 commits
    • Format · dc296a73
      Paul authored
    • Rename pointwise ops (#1145) · 4a312201
      kahmed10 authored
      Renames the pointwise kernels so they are easier to identify when profiling. The new names follow the order of the ops being compiled. For example, add + relu becomes add_relu_kernel.
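The naming rule described in this commit (op names joined in compile order, followed by a `kernel` suffix) can be sketched as below. This is only an illustration of the rule as stated in the message: `fused_kernel_name` is a hypothetical helper, not a function from the repository, and the underscore separator is inferred from the single `add_relu_kernel` example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not the project's actual code): builds a kernel name
// from the fused op names, in the order the ops are compiled.
std::string fused_kernel_name(const std::vector<std::string>& ops)
{
    std::string name;
    for(const auto& op : ops)
    {
        name += op;
        name += "_"; // separator assumed from the add_relu_kernel example
    }
    return name + "kernel";
}
```

Calling `fused_kernel_name({"add", "relu"})` yields `add_relu_kernel`, matching the example in the commit message.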
  13. 19 May, 2022 1 commit
  14. 17 May, 2022 2 commits
  15. 12 May, 2022 1 commit
  16. 11 May, 2022 4 commits
  17. 10 May, 2022 2 commits
  18. 09 May, 2022 1 commit
  19. 06 May, 2022 1 commit
  20. 03 May, 2022 4 commits
  21. 29 Apr, 2022 1 commit
  22. 27 Apr, 2022 1 commit
    • Add lane reduction (#1180) · 4c72cc95
      Paul Fultz II authored
      With reductions such as {2048, 2, 1456} on axis 1, this is 23x faster than our new block_reduce, and it's even over 100x faster than our original reduce_sum:
      
      # lane
      gpu::code_object[code_object=13736,symbol_name=kernel,global=2981888,local=1024,]: 0.0672928ms
      # block
      gpu::code_object[code_object=13800,symbol_name=kernel,global=39321600,local=64,]: 1.46072ms
      # original
      gpu::reduce_sum[axes={1}]: 6.73456ms
      There is some basic logic to pick between lane and block reduce automatically.
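To illustrate why a per-lane reduction suits a shape like {2048, 2, 1456} on axis 1, here is a minimal CPU sketch. It assumes "lane reduction" means that each lane (GPU thread) owns one output element and sums its short reduction axis serially, with no cooperation between threads; that reading is consistent with the lane kernel's global size of 2981888 = 2048 x 1456, but it is an assumption, not the repository's kernel. The name `lane_reduce_sum` and the row-major {outer, reduce, inner} layout are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU model of a lane-style sum over the middle axis of a
// row-major {outer, reduce, inner} tensor. Each iteration of the outer loop
// plays the role of one GPU lane that owns exactly one output element.
std::vector<float> lane_reduce_sum(const std::vector<float>& in,
                                   std::size_t outer,
                                   std::size_t reduce,
                                   std::size_t inner)
{
    std::vector<float> out(outer * inner, 0.0f);
    for(std::size_t lane = 0; lane < outer * inner; ++lane)
    {
        const std::size_t o = lane / inner; // index along the outer axis
        const std::size_t i = lane % inner; // index along the inner axis
        float acc = 0.0f;
        // Serial loop over the reduction axis; for {2048, 2, 1456} this is
        // only two additions per lane.
        for(std::size_t r = 0; r < reduce; ++r)
            acc += in[(o * reduce + r) * inner + i];
        out[lane] = acc;
    }
    return out;
}
```

When the reduction axis is this short, assigning a whole block of threads to each output element leaves most of them idle, which is one plausible reason the per-lane form wins here. The criteria the commit actually uses to choose between lane and block reduce are not stated in the message.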