- 31 May, 2022 4 commits
-
-
turneram authored
-
turneram authored
-
dependabot[bot] authored
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.6.4 to 2.7.2.
* [Release notes](https://github.com/tensorflow/tensorflow/releases)
* [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
* [Commits](https://github.com/tensorflow/tensorflow/compare/v2.6.4...v2.7.2)
updated-dependencies: dependency-name: tensorflow, dependency-type: direct:production
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
-
turneram authored
-
- 30 May, 2022 1 commit
-
-
shivadbhavsar authored
Following up on issue #1166 and PR #1220. Using the same approach as in #1220 for parallelizing the eval calls, we can significantly reduce the time spent in the eliminate_contiguous pass.
-
- 27 May, 2022 1 commit
-
-
Chris Austen authored
-
- 26 May, 2022 5 commits
-
-
shivadbhavsar authored
Addressing issue #1166. The propagate_constant pass currently uses a recursive approach to find all instructions in a module that can be evaluated to a literal, and it performs the replacement in the same call. New approach:
1. Perform a single pass through the instructions in the module to determine which instructions can be evaluated.
2. Evaluate the selected instructions in parallel.
3. Replace the selected instructions with the corresponding literals.
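The three-step approach described in this commit can be sketched on a toy IR. This is a minimal illustration under stated assumptions, not the MIGraphX implementation: the `Instr` class, the `OPS` table, and the rule of only folding "boundary" constants (constants that feed a non-constant instruction or are module outputs) are all hypothetical choices made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

# Hypothetical toy IR (not the MIGraphX API): a module is a list of
# instructions in topological order; inputs are indices of earlier entries.
@dataclass
class Instr:
    op: str                                     # "literal", "param", "add" or "mul"
    inputs: list = field(default_factory=list)  # indices of input instructions
    value: int = 0                              # payload, used only by literals

OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def evaluate(module, idx):
    """Recursively evaluate a constant instruction; read-only, so safe to run concurrently."""
    ins = module[idx]
    if ins.op == "literal":
        return ins.value
    return OPS[ins.op](*(evaluate(module, i) for i in ins.inputs))

def propagate_constant(module):
    # Step 1: one forward pass marks every instruction computable from literals.
    const = [False] * len(module)
    for idx, ins in enumerate(module):
        const[idx] = ins.op == "literal" or (
            ins.op in OPS and all(const[i] for i in ins.inputs))
    # Assumption: only fold non-literal constants that feed a non-constant
    # instruction or have no users (module outputs); inner constants become dead.
    users = [[] for _ in module]
    for idx, ins in enumerate(module):
        for i in ins.inputs:
            users[i].append(idx)
    selected = [idx for idx, ins in enumerate(module)
                if const[idx] and ins.op != "literal"
                and (not users[idx] or any(not const[u] for u in users[idx]))]
    # Step 2: evaluate the selected instructions in parallel.
    with ThreadPoolExecutor() as pool:
        values = list(pool.map(lambda idx: evaluate(module, idx), selected))
    # Step 3: replace each selected instruction with a literal holding its value.
    for idx, val in zip(selected, values):
        module[idx] = Instr("literal", [], val)
    return module
```

The evaluations only read the module, so they can run independently; all replacements happen afterwards in a single sequential step. Python threads only show the structure here: CPU-bound work in CPython does not truly run in parallel because of the GIL.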
-
Paul Fultz II authored
* Upgrade to cppcheck 2.8
-
turneram authored
-
turneram authored
-
turneram authored
-
- 25 May, 2022 2 commits
-
-
Chris Austen authored
raw is the direct download link for the file, while blob is the URL of the file's GitHub page.
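To illustrate the blob/raw distinction, the sketch below maps a github.com blob page URL to the corresponding raw.githubusercontent.com download URL. The function name `blob_to_raw` is hypothetical, and the example URL is the Changelog link from the dependabot messages in this log; the commit itself does not say which file's link was changed.

```python
from urllib.parse import urlparse

def blob_to_raw(url: str) -> str:
    """Convert a GitHub 'blob' page URL into the matching raw download URL."""
    parts = urlparse(url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    # Expected path layout: /<owner>/<repo>/blob/<ref>/<path...>
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError("expected a .../blob/<ref>/<path> URL")
    owner, repo, _, ref, *path = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, ref, *path])

print(blob_to_raw("https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md"))
```

This assumes the ref is a single path segment; branch names containing "/" cannot be told apart from directories by URL shape alone.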
-
dependabot[bot] authored
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.5.3 to 2.6.4.
* [Release notes](https://github.com/tensorflow/tensorflow/releases)
* [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
* [Commits](https://github.com/tensorflow/tensorflow/compare/v2.5.3...v2.6.4)
updated-dependencies: dependency-name: tensorflow, dependency-type: direct:production
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
-
- 24 May, 2022 4 commits
-
-
Paul Fultz II authored
* Improve applicable batched gemms for bert
-
Paul Fultz II authored
Remove std references from runtime-compiled code, since they are not available when using hiprtc and the standard headers may not be present on the system.
-
Paul Fultz II authored
* Fuse gemm add with pointwise fusions
-
shivadbhavsar authored
As described in #1196, the ONNX mean parser does not work correctly for integral types. This update fixes the issue by handling integral types separately, where summation is performed before division. Additional test cases have also been added for handling integral types.
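The reason summation must come before division for integral types can be shown with scalars standing in for the elementwise tensors of ONNX Mean. This sketch is illustrative only: `trunc_div` mimics C/C++ integer division (truncation toward zero), and the two helper names are hypothetical, not the parser's code.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division truncating toward zero, like C/C++ integer division."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q

def mean_divide_first(xs):
    # Divide each input by n, then sum: every division truncates separately.
    n = len(xs)
    return sum(trunc_div(x, n) for x in xs)

def mean_sum_first(xs):
    # Sum all inputs first, then divide once: only a single truncation.
    return trunc_div(sum(xs), len(xs))

print(mean_divide_first([5, 5, 5]), mean_sum_first([5, 5, 5]))
```

For the inputs 5, 5, 5, dividing first truncates each 5/3 to 1 and yields 3, while summing first yields the correct 5. One trade-off to keep in mind: with fixed-width integer types, summing first may overflow for large inputs, which Python's unbounded integers do not show.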
-
- 20 May, 2022 17 commits
-
-
turneram authored
-
turneram authored
-
turneram authored
-
turneram authored
-
turneram authored
-
turneram authored
-
turneram authored
-
turneram authored
-
turneram authored
-
turneram authored
-
turneram authored
-
turneram authored
-
turneram authored
-
turneram authored
-
turneram authored
-
kahmed10 authored
Renames fused kernels so that the kernel names seen when profiling are clearer. The new names follow the order of the ops being compiled. For example: add + relu = add_relu_kernel.
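The naming rule from this commit (op names joined in compile order, then a `_kernel` suffix) can be expressed in a couple of lines. The function `fused_kernel_name` is a hypothetical helper for illustration; the commit does not show how MIGraphX builds the string.

```python
def fused_kernel_name(op_names):
    """Build a kernel name from the fused ops, in the order they are compiled."""
    if not op_names:
        raise ValueError("need at least one op")
    return "_".join(op_names) + "_kernel"

print(fused_kernel_name(["add", "relu"]))
```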
-
Paul Fultz II authored
-
- 17 May, 2022 1 commit
-
-
shivadbhavsar authored
Updated variable names according to #1193
-
- 13 May, 2022 1 commit
-
-
Chris Austen authored
Our documentation states that a user with sudo can run install_prereqs.sh. It turns out that the script was not complete enough to run on its own on Ubuntu 18.04/20.04. I updated the script to resolve the failures. Resolves #1191.
-
- 11 May, 2022 2 commits
-
-
Paul Fultz II authored
Fuse layernorm and add a triadd_layernorm fusion. This is a preparatory performance improvement.
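As a reference for what a triadd_layernorm fusion computes, the sketch below assumes that triadd is the elementwise sum of three inputs and that layernorm normalizes over the last axis; the learned scale and bias are omitted for brevity. Both function names are hypothetical. A reference like this is mainly useful for checking a fused kernel's output against the unfused computation.

```python
import math

def layernorm(xs, eps=1e-5):
    # Normalize over the last axis: subtract the mean, divide by sqrt(var + eps).
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std for x in xs]

def triadd_layernorm(a, b, c, eps=1e-5):
    # Assumed semantics: layernorm(a + b + c). A fused GPU kernel would do the
    # three-way add inside the normalization instead of writing the sum to memory.
    summed = [x + y + z for x, y, z in zip(a, b, c)]
    return layernorm(summed, eps)

print(triadd_layernorm([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [0.0, 0.0, 3.0]))
```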
-
Chris Austen authored
The ONNX Models repository changed from master to main. Change the path to reflect the proper location.
-
- 10 May, 2022 1 commit
-
-
Umang Yadav authored
Expose the add_literal method in the C/C++ API.
-
- 09 May, 2022 1 commit
-
-
Paul Fultz II authored
Improves performance for add_gelu: in BERT it is 4x faster, and mul_add is 50% faster than what we currently have.
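For reference, the elementwise math behind the two fused ops named in this commit is sketched below. Two assumptions: GELU is taken in its exact erf form, 0.5 * x * (1 + erf(x / sqrt(2))), although a kernel might use the tanh approximation instead, and mul_add is taken as a * b + c. The functions are illustrative, not the MIGraphX kernels.

```python
import math

def gelu(x):
    # Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2))).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def add_gelu(a, bias):
    # Fused form: one pass reads both inputs and produces gelu(a + bias)
    # directly, instead of materializing a + bias and reading it back.
    return [gelu(x + b) for x, b in zip(a, bias)]

def mul_add(a, b, c):
    # Fused a * b + c in a single elementwise pass.
    return [x * y + z for x, y, z in zip(a, b, c)]

print(add_gelu([0.5, -1.0], [0.5, 0.0]), mul_add([1, 2], [3, 4], [5, 6]))
```

The speedup from fusing comes mostly from avoiding an intermediate buffer and a second kernel launch, which a pure-Python sketch cannot show; it only pins down what the fused op should compute.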
-