  1. 11 Oct, 2022 1 commit
    • Redo design · d9d2215a
      charlie authored
      * It doesn't make much sense for broadcast to take two inputs or handle
      dynamic shapes
      * Compute the common shape for dynamic multibroadcast in the
      multibroadcast op
      * Multibroadcast all combinations of the dynamic inputs
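The commit message above mentions computing a common shape for multibroadcast. The following is a minimal sketch of how a common shape can be derived for several inputs, assuming NumPy-style broadcasting rules and fully static dimensions; the function name `common_broadcast_shape` is hypothetical and this is not the code from the commit.

```python
from itertools import zip_longest


def common_broadcast_shape(*shapes):
    """Return the common shape of several static shapes (NumPy-style rules).

    Hypothetical helper for illustration; dynamic dimensions are not modeled.
    """
    result = []
    # Align shapes on the trailing axis; missing leading axes count as 1.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        non_unit = {d for d in dims if d != 1}
        # Two different non-unit sizes on the same axis cannot be broadcast.
        if len(non_unit) > 1:
            raise ValueError(f"shapes {shapes} are not broadcastable")
        result.append(non_unit.pop() if non_unit else 1)
    return tuple(reversed(result))


if __name__ == "__main__":
    print(common_broadcast_shape((3, 1, 5), (4, 1), (5,)))  # (3, 4, 5)
```

Each input would then be broadcast to the single shape returned here, which is one way to read "compute the common shape in the multibroadcast op".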
  2. 10 Oct, 2022 1 commit
  3. 03 Oct, 2022 1 commit
  4. 02 Oct, 2022 1 commit
  5. 30 Sep, 2022 3 commits
  6. 29 Sep, 2022 6 commits
  7. 28 Sep, 2022 3 commits
  8. 27 Sep, 2022 4 commits
  9. 26 Sep, 2022 4 commits
  10. 23 Sep, 2022 2 commits
  11. 22 Sep, 2022 2 commits
  12. 21 Sep, 2022 2 commits
  13. 19 Sep, 2022 1 commit
    • Improve layernorm and reductions performance (#1348) · 97a1ed2d
      Paul Fultz II authored
      Compute mean and variance in the same reduction
      Set block size to numbers divisible by 32 instead of powers of 2
      Global is also set exactly instead of being divisible by block size
      More exact matching of global/local can help get rid of branching/loops
      Reduce vectors first before doing dpp_reduce
      Explicitly vectorize array operators since the compiler doesn't always vectorize them
      Still uses the old for loop when it's computing at compile time, since neither reinterpret_cast nor all of the vector types are supported there
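The first line of the commit message above, computing mean and variance in the same reduction, can be illustrated with a small sketch. It assumes the common sum and sum-of-squares formulation, var = E[x²] − (E[x])²; the commit message does not say which formula the kernel uses, and `mean_var_single_pass` is a hypothetical name.

```python
from functools import reduce


def mean_var_single_pass(xs):
    """Compute mean and population variance with one reduction over xs."""
    n = len(xs)
    # A single reduction carries the pair (sum, sum of squares).
    total, total_sq = reduce(
        lambda acc, x: (acc[0] + x, acc[1] + x * x), xs, (0.0, 0.0)
    )
    mean = total / n
    # Population (biased) variance, the form layer normalization uses.
    var = total_sq / n - mean * mean
    return mean, var


if __name__ == "__main__":
    print(mean_var_single_pass([1.0, 2.0, 3.0, 4.0]))  # (2.5, 1.25)
```

Fusing the two statistics saves a second pass over the data. The trade-off is that subtracting two large, nearly equal terms can lose precision when the mean is large relative to the spread.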
  14. 16 Sep, 2022 6 commits
  15. 15 Sep, 2022 2 commits
  16. 14 Sep, 2022 1 commit