- 20 May, 2022 7 commits
- 19 May, 2022 1 commit
-
-
Paul authored
-
- 18 May, 2022 1 commit
-
-
Paul authored
-
- 17 May, 2022 11 commits
- 12 May, 2022 3 commits
- 11 May, 2022 5 commits
-
-
Paul Fultz II authored
Fuse layernorm and add triadd_layernorm fusion. This is a prep performance booster.
-
Paul authored
-
Paul authored
-
Paul authored
-
Paul authored
-
- 10 May, 2022 3 commits
-
-
Paul authored
-
Paul authored
-
Umang Yadav authored
Expose add_literal method in C/C++ api
-
- 09 May, 2022 1 commit
-
-
Paul Fultz II authored
Improves performance for add_gelu. In bert it is 4x faster, and for mul_add it is 50% faster than what we currently have.
-
- 06 May, 2022 1 commit
-
-
Chris Austen authored
Move CI containers to rocm 5.0.2, upgrade to 20.04, and free up some more file space in github action environments.
-
- 05 May, 2022 1 commit
-
-
Paul Fultz II authored
Fixes the #error when using cppcheck. This no longer suppresses cppcheck errors when including those errors. This fixes the cppcheck errors that were already there.
-
- 03 May, 2022 1 commit
-
-
Paul Fultz II authored
Helps avoid dangling references. This also deprecates the constructors that didn't take a lifetime annotation, since the lifetime is ambiguous.
-
- 29 Apr, 2022 1 commit
-
-
turneram authored
Add ref and gpu implementations for ONNX op GatherND. Resolves #1032.
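For readers unfamiliar with the operator, the following is a minimal sketch of GatherND semantics for the `batch_dims=0` case, written over nested Python lists. It is an illustration only and not the ref or gpu implementation from this commit; the function name `gather_nd` is hypothetical.

```python
def gather_nd(data, indices):
    """Illustrative GatherND (batch_dims=0) over nested lists.

    Each innermost list in `indices` is an index tuple into `data`;
    the result has the shape of `indices` minus its last axis, with
    each tuple replaced by the element or slice it selects.
    """
    def lookup(index_tuple):
        item = data
        for i in index_tuple:
            item = item[i]
        return item

    def walk(node):
        # An innermost list of ints is one index tuple.
        if node and isinstance(node[0], int):
            return lookup(node)
        return [walk(child) for child in node]

    return walk(indices)


data = [[0, 1], [2, 3]]
print(gather_nd(data, [[0, 0], [1, 1]]))  # full index tuples select scalars
print(gather_nd(data, [[1], [0]]))        # partial index tuples select rows
```

A full index tuple picks a single element, while a shorter tuple picks a whole slice of the remaining axes.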
-
- 27 Apr, 2022 1 commit
-
-
Paul Fultz II authored
With reductions such as {2048, 2, 1456} on axes 1, this is 23x faster than using our new block_reduce, and it's even over 100x faster than our original reduce_sum:
# lane
gpu::code_object[code_object=13736,symbol_name=kernel,global=2981888,local=1024,]: 0.0672928ms
# block
gpu::code_object[code_object=13800,symbol_name=kernel,global=39321600,local=64,]: 1.46072ms
# original
gpu::reduce_sum[axes={1}]: 6.73456ms
There is some basic logic to pick between lane and block reduce automatically.
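A quick count for the shape quoted in this message helps explain the result: each output sums only 2 values, and the number of outputs equals the `global=2981888` reported for the lane kernel. Reading that as one work-item per output element is an inference from those numbers, not something the commit states, and `reduction_layout` is a hypothetical helper, not part of the codebase.

```python
from math import prod


def reduction_layout(shape, axis):
    """Return (number of outputs, elements reduced per output)."""
    outputs = prod(d for i, d in enumerate(shape) if i != axis)
    return outputs, shape[axis]


outputs, reduce_len = reduction_layout((2048, 2, 1456), axis=1)
print(outputs, reduce_len)  # 2981888 2
```

With so few elements per output, a whole block cooperating on each reduction has little work to share.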
-
- 26 Apr, 2022 1 commit
-
-
Umang Yadav authored
Expose get_queue method
-
- 23 Apr, 2022 1 commit
-
-
Charlie Lin authored
Implements the ReverseSequence ONNX operator as a parser. This parser can only handle a constant sequence_lens input. This is the same as what is handled for TensorRT as far as I can tell. We could handle a variable sequence_lens input; that would require ref and GPU implementations of the operator. The ONNX backend tests are disabled because this does not handle variable sequence_lens.
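As an illustration of the operator's semantics, here is a minimal sketch for a 2-D input with `batch_axis=0` and `time_axis=1`. When `sequence_lens` is constant, each row reduces to a fixed slice, reverse, and concatenate, which is presumably why a parser-only implementation is enough in that case. This is not the parser code from the commit, and the function name `reverse_sequence` is hypothetical.

```python
def reverse_sequence(rows, sequence_lens):
    """Reverse the first sequence_lens[i] entries of row i; keep the rest."""
    if len(rows) != len(sequence_lens):
        raise ValueError("need one sequence length per batch row")
    return [row[:n][::-1] + row[n:] for row, n in zip(rows, sequence_lens)]


x = [[0, 1, 2, 3],
     [4, 5, 6, 7],
     [8, 9, 10, 11],
     [12, 13, 14, 15]]
for row in reverse_sequence(x, [1, 2, 3, 4]):
    print(row)
```

Only the leading `sequence_lens[i]` entries of each row change order; the tail is copied through unchanged.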
-
- 19 Apr, 2022 1 commit
-
-
Charlie Lin authored
Refactored the reference implementation of pooling to something like what was done for roialign. Moved the reference implementation of pooling from targets/ref/lowering.cpp to pooling.hpp. Removed cpu_pooling, instead using reference pooling in pooling.hpp. Added reference implementation of Lp Norm pooling and the global version. Added tests for the Lp Norm Pooling.
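For context, Lp norm pooling replaces the max or average over each window with the p-norm, (sum of |x|^p)^(1/p), and the global version applies the same formula to the whole input. The sketch below shows this in one dimension without padding. It is not the code in pooling.hpp, and the names `lp_pool_1d` and `global_lp_pool` are hypothetical.

```python
def lp_pool_1d(values, kernel, stride=1, p=2):
    """Slide a window over `values` and take the p-norm of each window."""
    out = []
    for start in range(0, len(values) - kernel + 1, stride):
        window = values[start:start + kernel]
        out.append(sum(abs(v) ** p for v in window) ** (1.0 / p))
    return out


def global_lp_pool(values, p=2):
    """Global variant: a single window covering the whole input."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


print(lp_pool_1d([3, 4, 0, 0], kernel=2, stride=2, p=2))
print(global_lp_pool([3, 4], p=2))
```

With p=1 this reduces to a sum of absolute values, and with p=2 to the Euclidean norm of each window.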
-