- 19 Sep, 2022 1 commit
-
-
Paul Fultz II authored
* Compute mean and variance in the same reduction
* Set the block size to numbers divisible by 32 instead of powers of 2
* Set the global size exactly instead of requiring it to be divisible by the block size
* More exact matching of global/local sizes can help get rid of branching and loops
* Reduce vectors first before doing dpp_reduce
* Explicitly vectorize array operators, since the compiler doesn't always vectorize them
* Still use the old for loop when computing at compile time, since neither reinterpret_cast nor all of the vector types are supported there
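As a rough illustration of the first item, a mean and a variance can come out of one pass if the reduction carries both the running sum and the running sum of squares. The sketch below is plain Python, not the GPU kernel from this commit. The function name `mean_variance` and the sum-of-squares formulation are assumptions, since the commit message does not say which formula is used.

```python
def mean_variance(values):
    """Return (mean, population variance) from a single pass over values.

    Hypothetical helper for illustration; assumes the sum / sum-of-squares
    formulation, which the commit message does not confirm.
    """
    n = 0
    total = 0.0      # running sum of x
    total_sq = 0.0   # running sum of x * x
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    if n == 0:
        raise ValueError("need at least one value")
    mean = total / n
    # Var[x] = E[x^2] - E[x]^2; clamp small negative rounding error to zero.
    variance = max(total_sq / n - mean * mean, 0.0)
    return mean, variance
```

Both accumulators are updated in the same loop, so the input is read only once. The formula can lose precision when the mean is large relative to the spread, which is one reason a real kernel might choose a different formulation.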
-
- 18 Sep, 2022 2 commits
- 17 Sep, 2022 4 commits
- 16 Sep, 2022 2 commits
-
-
Umang Yadav authored
* fix typo for add_sigmoid
-
Umang Yadav authored
* remove deprecated constructor
-
- 15 Sep, 2022 6 commits
-
-
Lixun Zhang authored
* Replaced `find_library` with `find_package` to locate MLIR static library
* Unified the include dir for headers and remove backward compatibility
* Embedded the external/include dir into the exported library
-
Paul authored
-
Paul authored
-
Paul authored
-
Paul authored
-
Paul authored
-
- 14 Sep, 2022 3 commits
-
-
Umang Yadav authored
* fix slice_dim1 for case
-
Paul Fultz II authored
* Implement concat using jit compilation
-
shivadbhavsar authored
Expose the underlying data pointer for migraphx.argument. Update the Python API documentation.
-
- 13 Sep, 2022 3 commits
-
-
turneram authored
Improves performance for 4 of the 6 GEMMs used by Hugging Face BERT models with batch_size>1 by using a non-batched rocBLAS call for GEMMs where the B input has a broadcasted batch dimension. The four verify tests added reflect the actual configurations used by bert-base-cased, with varied batch sizes. Also adds a matcher to simplify_reshapes to move multibroadcasts after concats.
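The idea behind the non-batched call can be sketched in plain Python. When every batch entry multiplies by the same B, the batch dimension of A can be folded into its row dimension and a single GEMM gives the same result. The helper names below are hypothetical, and folding the batch into the rows is an assumption about how such a call could be set up, not a description of the actual rocBLAS invocation.

```python
def matmul(a, b):
    """Plain 2-D product of nested lists: [M][K] x [K][N] -> [M][N]."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def batched_gemm(a, b):
    """Reference path: one GEMM per batch entry, reusing the same B."""
    return [matmul(a_i, b) for a_i in a]


def flattened_gemm(a, b):
    """Fold the batch into the rows, run one GEMM, then split it back.

    Assumes B is identical for every batch entry (a broadcasted batch
    dimension) and that all entries of A have the same number of rows.
    """
    batch, m = len(a), len(a[0])
    rows = [row for a_i in a for row in a_i]   # shape [batch * M][K]
    out = matmul(rows, b)                      # shape [batch * M][N]
    return [out[i * m:(i + 1) * m] for i in range(batch)]
```

The two paths agree because each output row depends only on one row of A and on B, so stacking the rows of all batch entries changes nothing except the number of separate calls.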
-
Paul authored
-
Paul authored
-
- 12 Sep, 2022 5 commits
- 10 Sep, 2022 2 commits
- 09 Sep, 2022 6 commits
- 08 Sep, 2022 2 commits
-
-
Paul Fultz II authored
* Remove unused headers
-
Charlie Lin authored
Fixes TF literal parsing for relu6. Previously the parser always created a float-type literal, which breaks for other types such as float16.
-
- 07 Sep, 2022 4 commits