- 07 Feb, 2022 1 commit
-
-
Paul authored
-
- 05 Feb, 2022 2 commits
- 02 Feb, 2022 1 commit
-
-
Paul Fultz II authored
Previously, MIGRAPHX_TRACE_EVAL=2 printed the entire output buffer, which can produce a lot of output. To make the output easier to inspect and debug, MIGRAPHX_TRACE_EVAL=2 now prints only 10 elements from the buffer (the first 5 and the last 5) and shows any floating-point classifications found in the buffer (i.e. NaNs, infinities, etc.). The previous behavior can still be enabled with MIGRAPHX_TRACE_EVAL=3.
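To illustrate the kind of summary described in this commit message, here is a minimal sketch. It is not the MIGraphX implementation: the function name `summarize_buffer`, its `edge` parameter, and the output format are assumptions made for this example. It prints at most five elements from each end of a float buffer and lists the special floating-point classes found anywhere in it.

```cpp
#include <cmath>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not MIGraphX code): show at most `edge` elements from
// each end of the buffer, then list the special floating-point classes that
// occur anywhere in the buffer, including the elided middle.
std::string summarize_buffer(const std::vector<float>& buf, std::size_t edge = 5)
{
    std::ostringstream out;
    const std::size_t n = buf.size();
    if(n <= 2 * edge)
    {
        // Small buffer: print every element.
        for(std::size_t i = 0; i < n; ++i)
            out << (i ? ", " : "") << buf[i];
    }
    else
    {
        // Large buffer: first `edge` elements, an ellipsis, last `edge` elements.
        for(std::size_t i = 0; i < edge; ++i)
            out << buf[i] << ", ";
        out << "...";
        for(std::size_t i = n - edge; i < n; ++i)
            out << ", " << buf[i];
    }

    // Scan the whole buffer so problems in the hidden middle are still reported.
    bool has_nan = false, has_inf = false, has_subnormal = false;
    for(float x : buf)
    {
        switch(std::fpclassify(x))
        {
        case FP_NAN: has_nan = true; break;
        case FP_INFINITE: has_inf = true; break;
        case FP_SUBNORMAL: has_subnormal = true; break;
        default: break;
        }
    }
    if(has_nan)
        out << " [nan]";
    if(has_inf)
        out << " [inf]";
    if(has_subnormal)
        out << " [subnormal]";
    return out.str();
}
```

Scanning the full buffer for classifications is what makes the truncated view useful: a NaN in the elided middle would otherwise go unnoticed.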
-
- 31 Jan, 2022 1 commit
-
-
Shucai Xiao authored
* use the parse_resize to parse the upsample operator
-
- 28 Jan, 2022 2 commits
-
-
Paul Fultz II authored
* Enable auto vectorization
* Handle vector types with convert function
* Don't vectorize when it will cause problems with preload
-
turneram authored
* Add mean op onnx parser and unit tests
* Refactor parse_mean to use add_broadcastable_binary_op
-
- 27 Jan, 2022 1 commit
-
-
Umang Yadav authored
Allow non-standard shapes for the arg ops. Non-standard shapes include broadcast, slice, and transpose.
-
- 26 Jan, 2022 2 commits
- 21 Jan, 2022 4 commits
-
-
turneram authored
Add onnx parser for operator GreaterOrEqual
-
turneram authored
Add onnx parser and unit tests for Softsign
-
turneram authored
* Add onnx parser and unit test
-
Paul Fultz II authored
* Improve handling of generator expressions when getting the flags for hip
-
- 17 Jan, 2022 1 commit
-
-
Paul Fultz II authored
Make clip a pointwise op
-
- 11 Jan, 2022 1 commit
-
-
turneram authored
Add HardSigmoid onnx parser and unit tests. Produces the mathematical equivalent of the ONNX operator through a combination of existing pointwise ops. Resolves #1028
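As a reference for what "mathematical equivalent" means here, the ONNX HardSigmoid operator is defined as y = max(0, min(1, alpha * x + beta)), with defaults alpha = 0.2 and beta = 0.5. The sketch below composes that from a multiply, an add, and a clip. The exact pointwise ops the parser emits are not stated in the commit message, so this decomposition and the function name `hard_sigmoid` are assumptions for illustration only.

```cpp
#include <algorithm>
#include <vector>

// Illustrative only: HardSigmoid expressed as three elementwise steps.
// alpha and beta default to the values given in the ONNX operator spec.
std::vector<float> hard_sigmoid(const std::vector<float>& x,
                                float alpha = 0.2f,
                                float beta  = 0.5f)
{
    std::vector<float> y(x.size());
    std::transform(x.begin(), x.end(), y.begin(), [=](float v) {
        const float scaled  = alpha * v;                      // mul
        const float shifted = scaled + beta;                  // add
        return std::min(1.0f, std::max(0.0f, shifted));       // clip to [0, 1]
    });
    return y;
}
```

For example, with the default parameters the inputs -10, 0, and 10 map to 0, 0.5, and 1 respectively.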
-
- 10 Jan, 2022 3 commits
-
-
Paul Fultz II authored
* Add matcher for conv_bias pointwise
* Add fusion op
-
Paul authored
-
Paul authored
-
- 07 Jan, 2022 2 commits
- 06 Jan, 2022 3 commits
- 05 Jan, 2022 1 commit
-
-
turneram authored
Fix bug caused by casting time seed to float
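The commit message does not say how the bug manifested. One plausible failure mode, shown here purely as an assumption, is precision loss: a 32-bit float has a 24-bit significand, so current Unix timestamps (around 1.6e9) are rounded to multiples of 128, and seeds taken within the same window collapse to one value. The function name `seed_via_float` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical reproduction: round-trip an integer time seed through float.
// For values between 2^30 and 2^31, adjacent floats are 128 apart, so the
// low 7 bits of the timestamp are lost.
std::uint64_t seed_via_float(std::uint64_t t)
{
    return static_cast<std::uint64_t>(static_cast<float>(t));
}
```

With this sketch, the timestamps 1641340800 and 1641340801 both yield the seed 1641340800. Keeping the seed in an integer type avoids the collapse.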
-
- 11 Dec, 2021 7 commits
- 09 Dec, 2021 2 commits
-
-
Shucai Xiao authored
Changed the number of threads in a block from 256 to 128. Increased the max number of blocks in the kernel from 256 to 1M. For the case where the axis is the last dimension, removed the index computation since it is not required. With these changes, we get about a 2x speedup compared to the develop branch for the softmax op used in the BertSquad model.
-
Paul Fultz II authored
Fuse last instruction in fuse_pointwise. This also fixes a bug with using an invalid iterator.
-
- 08 Dec, 2021 1 commit
-
-
Paul Fultz II authored
-
- 07 Dec, 2021 1 commit
-
-
Paul Fultz II authored
simple variable rename
-
- 02 Dec, 2021 1 commit
-
-
Paul Fultz II authored
Fix pointwise compile error with half sqrt
-
- 01 Dec, 2021 3 commits