1. 20 Jun, 2022 2 commits
  2. 18 Jun, 2022 1 commit
  3. 17 Jun, 2022 17 commits
  4. 16 Jun, 2022 2 commits
  5. 13 Jun, 2022 6 commits
  6. 10 Jun, 2022 1 commit
  7. 09 Jun, 2022 3 commits
  8. 07 Jun, 2022 1 commit
  9. 03 Jun, 2022 1 commit
    • Group code objects by kernel name in perf report summary (#1234) · 7271ddbc
      Paul Fultz II authored
      Break up the gpu::code_object print to show the actual kernels...
      
      gpu::code_object::add_kernel: 0.646121ms, 5%
      gpu::code_object::mul_kernel: 0.623822ms, 5%
      gpu::code_object::add_mul_erf_add_mul_mul_kernel: 0.498902ms, 4%
      gpu::code_object::mul_add_kernel: 0.478352ms, 4%
  10. 02 Jun, 2022 2 commits
  11. 01 Jun, 2022 1 commit
  12. 31 May, 2022 1 commit
  13. 30 May, 2022 1 commit
    • Improve eliminate contiguous pass (#1223) · 86061b4d
      shivadbhavsar authored
      Following up on issue #1166 and PR #1220: using the same approach as #1220 to parallelize the eval calls significantly reduces the time spent in the eliminate_contiguous pass.
  14. 27 May, 2022 1 commit