- 22 Sep, 2022 2 commits
- 21 Sep, 2022 1 commit
-
-
kahmed10 authored
This PR allows for other values of epsilon to be matched when finding layernorm. Similarly, the calculation now uses the variable for epsilon.
-
- 19 Sep, 2022 1 commit
-
-
Paul Fultz II authored
* Compute mean and variance in the same reduction
* Set block size to numbers divisible by 32 instead of powers of 2
* Global is also set exactly instead of being divisible by block size
* More exact matching of global/local can help get rid of branching/loops
* Reduce vectors first before doing dpp_reduce
* Explicitly vectorize array operators since the compiler doesn't always vectorize them
* Still uses the old for loop when it's computing at compile-time since neither reinterpret_cast nor all the vector types are supported
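As a rough illustration of computing mean and variance in the same reduction, the sketch below accumulates the sum and the sum of squares in one pass and derives both statistics from them. It is plain Python rather than the GPU kernel from this commit, and the function name `mean_and_variance` is hypothetical.

```python
def mean_and_variance(values):
    """Return (mean, population variance) from a single pass over values."""
    count = 0
    total = 0.0
    total_sq = 0.0
    for v in values:
        # Both running sums are updated in the same loop iteration,
        # so the data is read only once.
        count += 1
        total += v
        total_sq += v * v
    if count == 0:
        raise ValueError("need at least one value")
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2
    variance = total_sq / count - mean * mean
    return mean, variance


if __name__ == "__main__":
    print(mean_and_variance([1.0, 2.0, 3.0, 4.0]))
```

The `E[x^2] - E[x]^2` form can lose precision when the mean is large relative to the spread, so a real kernel may accumulate in a wider type or use a different formulation.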
-
- 16 Sep, 2022 4 commits
-
-
Umang Yadav authored
* fix typo for add_sigmoid
-
turneram authored
-
turneram authored
-
turneram authored
-
- 15 Sep, 2022 1 commit
-
-
Lixun Zhang authored
* Replaced `find_library` with `find_package` to locate MLIR static library
* Unified the include dir for headers and removed backward compatibility
* Embedded the external/include dir into the exported library
-
- 14 Sep, 2022 2 commits
-
-
Paul authored
-
Paul Fultz II authored
* Implement concat using jit compilation
-
- 13 Sep, 2022 6 commits
-
-
turneram authored
-
turneram authored
-
turneram authored
Improves performance for 4 of the 6 GEMMs used by Hugging Face BERT models with batch_size>1 by using a non-batched rocBLAS call for GEMMs where the B input has a broadcasted batch dimension. The four verify tests added reflect the actual configurations used by bert-base-cased, with varied batch sizes. Also adds a matcher to simplify_reshapes to move multibroadcasts after concats.
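One way to see why a broadcasted B permits a non-batched call: when every batch entry multiplies by the same B, the batch dimension of A can be folded into its rows and a single matrix product gives the same result. The sketch below shows that equivalence in plain Python. The helper names are hypothetical, and whether the commit flattens A in exactly this way is an assumption.

```python
def matmul(a, b):
    """Plain 2-D matrix product of row-major nested lists."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def batched_matmul_shared_b(a_batch, b):
    """Reference: one product per batch entry, all using the same B."""
    return [matmul(a, b) for a in a_batch]


def flattened_matmul_shared_b(a_batch, b):
    """Fold the batch dimension of A into its rows and do one product."""
    rows_per_batch = len(a_batch[0])
    flat_a = [row for a in a_batch for row in a]  # shape (batch * M, K)
    flat_c = matmul(flat_a, b)                    # shape (batch * M, N)
    # Split the rows back into per-batch blocks of M rows each.
    return [flat_c[i:i + rows_per_batch]
            for i in range(0, len(flat_c), rows_per_batch)]


if __name__ == "__main__":
    a_batch = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    b = [[1, 2], [3, 4]]
    print(flattened_matmul_shared_b(a_batch, b))
```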
-
turneram authored
-
turneram authored
-
turneram authored
-
- 12 Sep, 2022 4 commits
- 09 Sep, 2022 2 commits
- 08 Sep, 2022 3 commits
-
-
Paul Fultz II authored
* Remove unused headers
-
turneram authored
-
turneram authored
-
- 07 Sep, 2022 6 commits
- 06 Sep, 2022 1 commit
-
-
Paul Fultz II authored
Using `not` and `or` improves readability. The cppcheck rule will help ensure we use them consistently.
-
- 31 Aug, 2022 1 commit
-
-
turneram authored
The rewrite_gelu pass replaces the gelu formula x * (1/2) * (1 + erf(x/sqrt(2))) with the sigmoid approximation x * Sigmoid(x * 1.702).
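To get a feel for how close the two formulas are, the sketch below evaluates both with Python's `math` module. The function names are hypothetical and this is only a numerical comparison, not the pass itself.

```python
import math


def gelu_erf(x):
    """Exact gelu: x * (1/2) * (1 + erf(x / sqrt(2)))."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def gelu_sigmoid(x):
    """Sigmoid approximation: x * sigmoid(1.702 * x)."""
    return x * sigmoid(1.702 * x)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(x, gelu_erf(x), gelu_sigmoid(x))
```

The approximation trades a small absolute error for replacing `erf` with a cheaper sigmoid.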
-
- 27 Aug, 2022 2 commits
-
-
Paul Fultz II authored
* Track kernel time
-
Paul Fultz II authored
This will rewrite dot operators like X(Y + b) to XY + Xb when b is constant, as we can fold the add away. This improves handling of pointwise with broadcasted operators, which helps improve const propagation.
* Improve gemm fusion with a mul_add
* Improve support for broadcast shapes in gemm
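The rewrite relies on matrix multiplication distributing over addition. A minimal check of X(Y + b) = XY + Xb with small integer matrices is sketched below. The helper names are hypothetical and b is given the same shape as Y for simplicity, whereas in the commit b may be a broadcasted constant.

```python
def matmul(a, b):
    """Plain 2-D matrix product of row-major nested lists."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def add(a, b):
    """Elementwise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def dot_of_sum(x, y, b):
    """Original form: X(Y + b)."""
    return matmul(x, add(y, b))


def sum_of_dots(x, y, b):
    """Rewritten form: XY + Xb."""
    return add(matmul(x, y), matmul(x, b))


if __name__ == "__main__":
    X = [[1, 2], [3, 4]]
    Y = [[5, 6], [7, 8]]
    b = [[1, 1], [2, 2]]
    print(dot_of_sum(X, Y, b) == sum_of_dots(X, Y, b))
```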
-
- 26 Aug, 2022 2 commits
- 25 Aug, 2022 2 commits