"examples/vscode:/vscode.git/clone" did not exist on "bc3c73ad0b75ee550fdcce6e124d5a222834d6ed"
- 31 May, 2022 6 commits
-
-
umangyadav authored
-
umangyadav authored
-
umangyadav authored
-
umangyadav authored
-
umangyadav authored
-
umangyadav authored
-
- 30 May, 2022 1 commit
-
-
shivadbhavsar authored
Following up on issue #1166 and PR #1220. Using the same approach as in #1220 for parallelizing the eval calls, we can significantly reduce the time spent on the eliminate_contiguous pass.
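A minimal sketch of that pattern, with hypothetical stand-in types rather than the actual MIGraphX pass code: the per-instruction eval checks run concurrently, and the module is only mutated afterwards on a single thread.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical stand-ins for the real pass machinery.
struct instruction
{
    bool removable = false;
};

// Expensive check; safe to run in parallel because it does not mutate the module.
bool eval_without_contiguous(const instruction& ins) { return ins.removable; }

// Mutates the module, so it is kept single-threaded.
void remove_contiguous(instruction& ins) { ins.removable = false; }

void eliminate_contiguous_sketch(std::vector<instruction>& candidates)
{
    // 1. Launch the expensive eval calls in parallel.
    std::vector<std::future<bool>> results;
    results.reserve(candidates.size());
    for (const auto& ins : candidates)
        results.push_back(std::async(std::launch::async, eval_without_contiguous, std::cref(ins)));

    // 2. Apply the replacements serially once all evals are done.
    for (std::size_t i = 0; i < candidates.size(); ++i)
        if (results[i].get())
            remove_contiguous(candidates[i]);
}
```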
-
- 27 May, 2022 1 commit
-
-
Chris Austen authored
-
- 26 May, 2022 2 commits
-
-
shivadbhavsar authored
Addressing issue #1166 - the propagate_constant pass currently uses a recursive approach to find all instructions in a module that can be evaluated to a literal and performs the replacement in the same call. New approach:
- Perform a single pass through the instructions in the module to determine which instructions can be evaluated
- Evaluate the selected instructions in parallel
- Replace the selected instructions with the corresponding literals
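A rough illustration of the three phases, using hypothetical simplified types rather than the actual propagate_constant implementation:

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <optional>
#include <vector>

// Hypothetical simplified model of an instruction and a literal.
struct literal { std::vector<float> data; };
struct instruction
{
    bool can_eval = false;
    std::function<literal()> eval;      // evaluates this instruction to a literal
    std::optional<literal> replacement; // filled in during phase 3
};

void propagate_constant_sketch(std::vector<instruction>& module_instructions)
{
    // Phase 1: single pass to select the instructions that can be evaluated.
    std::vector<instruction*> selected;
    for (auto& ins : module_instructions)
        if (ins.can_eval)
            selected.push_back(&ins);

    // Phase 2: evaluate the selected instructions in parallel.
    std::vector<std::future<literal>> results;
    results.reserve(selected.size());
    for (auto* ins : selected)
        results.push_back(std::async(std::launch::async, ins->eval));

    // Phase 3: replace each selected instruction with the computed literal.
    for (std::size_t i = 0; i < selected.size(); ++i)
        selected[i]->replacement = results[i].get();
}
```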
-
Paul Fultz II authored
* Upgrade to cppcheck 2.8
-
- 25 May, 2022 2 commits
-
-
Chris Austen authored
raw is the download link for the file; blob is the URL for the GitHub page.
-
dependabot[bot] authored
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.5.3 to 2.6.4. - [Release notes](https://github.com/tensorflow/tensorflow/releases) - [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md) - [Commits](https://github.com/tensorflow/tensorflow/compare/v2.5.3...v2.6.4)
---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
-
- 24 May, 2022 4 commits
-
-
Paul Fultz II authored
* Improve applicable batched gemms for bert
-
Paul Fultz II authored
Remove std references in runtime compilation since these are not available when using hiprtc and the headers may not be available on the system
-
Paul Fultz II authored
* Fuse gemm add with pointwise fusions
-
shivadbhavsar authored
As described in #1196, the ONNX mean parser does not work correctly for integral types. This update fixes the issue by handling integral types separately, where summation is performed before division. Additional test cases have also been added for handling integral types.
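A small worked example of why the order matters for integral types (hypothetical values, not the parser code): dividing each input first truncates each term, while summing first and dividing once incurs only a single integer division.

```cpp
#include <cstdint>
#include <iostream>
#include <numeric>
#include <vector>

int main()
{
    std::vector<int64_t> inputs = {3, 4, 5}; // three integral inputs to Mean
    int64_t n = static_cast<int64_t>(inputs.size());

    // Wrong for integers: divide each input first, then add (3/3 + 4/3 + 5/3 = 1 + 1 + 1 = 3).
    int64_t divide_first = 0;
    for (auto v : inputs)
        divide_first += v / n;

    // Fix: sum first, divide once (12 / 3 = 4).
    int64_t sum       = std::accumulate(inputs.begin(), inputs.end(), int64_t{0});
    int64_t sum_first = sum / n;

    std::cout << divide_first << " vs " << sum_first << "\n"; // prints "3 vs 4"
}
```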
-
- 20 May, 2022 2 commits
-
-
kahmed10 authored
For clarity on kernel names found when profiling. The new names are set to the order of the ops being compiled. For example: add + relu = add_relu_kernel.
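A tiny sketch of the naming scheme (the helper name is hypothetical): the fused ops are joined in compile order and suffixed with "kernel".

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: build a kernel name from the ops being compiled, in order.
std::string make_kernel_name(const std::vector<std::string>& ops)
{
    std::string name;
    for (const auto& op : ops)
        name += op + "_";
    return name + "kernel";
}

int main()
{
    // add + relu -> "add_relu_kernel", matching the example above.
    std::cout << make_kernel_name({"add", "relu"}) << "\n";
}
```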
-
Paul Fultz II authored
-
- 17 May, 2022 1 commit
-
-
shivadbhavsar authored
Updated variable names according to #1193
-
- 13 May, 2022 1 commit
-
-
Chris Austen authored
Our documentation indicates a user with sudo can run the install_prereqs.sh file. It turns out that the file is not complete enough to run on Ubuntu 18.04/20.04 independently. I updated the file to resolve the failures. Resolves #1191
-
- 11 May, 2022 2 commits
-
-
Paul Fultz II authored
Fuse layernorm and add a triadd_layernorm fusion. This is preparatory work and a performance booster.
-
Chris Austen authored
ONNX Models changed from master to main. Changing the path to reflect the proper location.
-
- 10 May, 2022 1 commit
-
-
Umang Yadav authored
Expose add_literal method in C/C++ api
-
- 09 May, 2022 1 commit
-
-
Paul Fultz II authored
Improves performance for add_gelu. In bert it is 4x faster, and for mul_add it is 50% faster than what we currently have.
-
- 06 May, 2022 2 commits
-
-
Chris Austen authored
Move the CI containers to ROCm 5.0.2, upgrade to 20.04, and free up some more file space in GitHub Actions environments.
-
Paul Fultz II authored
Add compile tests for gpu math functions
-
- 05 May, 2022 1 commit
-
-
Paul Fultz II authored
Fixes the #error when using cppcheck. Cppcheck errors are no longer suppressed in the included headers, and the cppcheck errors that were already there are fixed.
-
- 03 May, 2022 1 commit
-
-
Paul Fultz II authored
Helps avoid dangling references. This also deprecates the constructors that didn't take a lifetime annotation, since the lifetime is ambiguous without it.
-
- 02 May, 2022 1 commit
-
-
Chris Austen authored
Release branch created for ROCm 5.2, so moving the develop branch to 2.3.
-
- 29 Apr, 2022 1 commit
-
-
turneram authored
Add ref and gpu implementations for the ONNX op GatherND. Resolves #1032
-
- 27 Apr, 2022 1 commit
-
-
Paul Fultz II authored
With reductions such as {2048, 2, 1456} on axis 1, this is 23x faster than using our new block_reduce, and it's even over 100x faster than our original reduce_sum:
# lane
gpu::code_object[code_object=13736,symbol_name=kernel,global=2981888,local=1024,]: 0.0672928ms
# block
gpu::code_object[code_object=13800,symbol_name=kernel,global=39321600,local=64,]: 1.46072ms
# original
gpu::reduce_sum[axes={1}]: 6.73456ms
There is some basic logic to pick between lane and block reduce automatically.
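A CPU-side analogue of the two strategies, purely illustrative since the real kernels are GPU code: a lane reduce gives each worker its own small reduction, while a block reduce has a group of workers cooperate on one reduction. With a shape like {2048, 2, 1456} reduced over axis 1 there are many reductions of length 2, so the lane style avoids the cooperation overhead.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Illustrative only: one "lane" (worker) owns an entire small reduction.
float lane_reduce(const std::vector<float>& row)
{
    return std::accumulate(row.begin(), row.end(), 0.0f);
}

// Illustrative only: a group of workers splits one large reduction into chunks
// and then combines the partial sums, mimicking a cooperative block reduce.
float block_reduce(const std::vector<float>& row, std::size_t workers)
{
    std::vector<float> partial(workers, 0.0f);
    for (std::size_t w = 0; w < workers; ++w)
        for (std::size_t i = w; i < row.size(); i += workers)
            partial[w] += row[i];
    return std::accumulate(partial.begin(), partial.end(), 0.0f);
}
```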
-
- 26 Apr, 2022 1 commit
-
-
Umang Yadav authored
* expose get_queue method
-
- 23 Apr, 2022 1 commit
-
-
Charlie Lin authored
Implements the ReverseSequence ONNX operator as a parser. This parser can only handle a constant sequence_lens input. This is the same as what is handled for TensorRT as far as I can tell. We could handle a variable sequence_lens input; that would require ref and GPU implementations of the operator. The ONNX backend tests are disabled because this does not handle variable sequence_lens.
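A reference-style sketch of the operator's semantics for the constant sequence_lens case (layout and helper name are assumptions): for each batch b, the first sequence_lens[b] entries along the time axis are reversed and the rest are left in place.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative ReverseSequence on a [seq][batch] matrix with time_axis=0, batch_axis=1.
// For each batch column b, reverse the first sequence_lens[b] time steps.
void reverse_sequence(std::vector<std::vector<float>>& data,
                      const std::vector<std::size_t>& sequence_lens)
{
    const std::size_t seq   = data.size();
    const std::size_t batch = data.empty() ? 0 : data.front().size();
    for (std::size_t b = 0; b < batch; ++b)
    {
        std::size_t len = std::min(sequence_lens[b], seq);
        for (std::size_t t = 0; t < len / 2; ++t)
            std::swap(data[t][b], data[len - 1 - t][b]);
    }
}
```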
-
- 19 Apr, 2022 1 commit
-
-
Charlie Lin authored
Refactored the reference implementation of pooling to something like what was done for roialign.
- Moved the reference implementation of pooling from targets/ref/lowering.cpp to pooling.hpp.
- Removed cpu_pooling, instead using reference pooling in pooling.hpp.
- Added a reference implementation of Lp Norm pooling and the global version.
- Added tests for Lp Norm pooling.
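For reference, a minimal sketch of what Lp Norm pooling computes over one window, not the pooling.hpp code itself: the p-norm of the window's elements, (sum |x|^p)^(1/p).

```cpp
#include <cmath>
#include <vector>

// Illustrative Lp Norm pooling of a single window: (sum_i |x_i|^p)^(1/p).
// Global Lp pooling is the same computation with the window covering the whole tensor.
double lpnorm_pool(const std::vector<double>& window, double p)
{
    double sum = 0.0;
    for (double x : window)
        sum += std::pow(std::fabs(x), p);
    return std::pow(sum, 1.0 / p);
}
```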
-
- 17 Apr, 2022 1 commit
-
-
Paul Fultz II authored
There is significant improvement on larger tensors, with half almost 50% faster:
lens: [1024, 384, 768]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
gpu::reduce_sum[axes={2}]: 1.73126ms
Also for non-trivial layouts this can sometimes be over 2x faster:
lens: [64, 1024, 768, 4]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
gpu::reduce_sum[axes={1}]: 2.63375ms
Of course, if the stride becomes larger this speed improvement diminishes due to poor memory access patterns. A lane_reduce instead of a block_reduce is needed for such kernels; I plan to address that in a future PR. Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable which will print out the assembly when the kernel compiles.
-
- 14 Apr, 2022 2 commits
-
-
bpickrel authored
Issue #1127. Updates the math.hpp header file to overload various standard functions (ops) for the hip half2 type. The half2 type is two 16-bit floats packed into a 32-bit number, so the overloads act on vectors whose sizes are multiples of 2. They are invoked in runtime compilation any time one of the ops is called on a tensor declared with the data type shape::half_type. Defined a new template, made instances of the template for those math operations that the hip library contains, and added verify tests for the sqrt operator for three cases:
- tensor size not divisible by 2
- tensor size divisible by 2 but not by 4
- tensor size divisible by 4
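A plain C++ stand-in for this pattern (the real code overloads HIP's __half2 in device code; packed2 here is a hypothetical host-side type using floats): a standard math function gets an overload that acts on a packed pair, so width-2 vectorized code can call it by the same name.

```cpp
#include <cmath>

// Hypothetical stand-in for the packed half2 type: two values stored together.
struct packed2
{
    float x;
    float y;
};

// Overload of sqrt for the packed type, applied lane-wise to both halves.
inline packed2 sqrt(packed2 v)
{
    return {std::sqrt(v.x), std::sqrt(v.y)};
}
```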
-
kahmed10 authored
Update the path to where the file is located.
-
- 13 Apr, 2022 1 commit
-
-
Paul Fultz II authored
Also added the PYTHON_DISABLE_VERSIONS cmake variable to disable python versions.
-
- 12 Apr, 2022 2 commits
-
-
Paul Fultz II authored
Fix out-of-bounds access when generate uses nonpacked tensors, and add some additional asserts for gpu memory.
-
Shucai Xiao authored
The ref implementation of the gemm op is sequential; this PR parallelizes the gemm computation in the ref implementation.
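A minimal sketch of parallelizing a reference gemm over output rows with standard threads; this is a simplification and not the actual MIGraphX ref code:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// C = A * B for row-major A (m x k), B (k x n), C (m x n), with C pre-sized to m * n.
// Each thread computes its own set of output rows independently.
void gemm_parallel(const std::vector<float>& a,
                   const std::vector<float>& b,
                   std::vector<float>& c,
                   std::size_t m, std::size_t n, std::size_t k)
{
    const std::size_t nthreads = std::max<std::size_t>(1, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nthreads; ++t)
    {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < m; i += nthreads)       // rows assigned round-robin
                for (std::size_t j = 0; j < n; ++j)
                {
                    float sum = 0.0f;
                    for (std::size_t p = 0; p < k; ++p)
                        sum += a[i * k + p] * b[p * n + j];
                    c[i * n + j] = sum;                          // each row written by one thread only
                }
        });
    }
    for (auto& w : workers)
        w.join();
}
```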
-