- 24 May, 2022 1 commit
-
-
charlie authored
-
- 20 May, 2022 3 commits
-
-
charlie authored
-
kahmed10 authored
Improves the clarity of kernel names seen when profiling. The new names follow the order of the ops being compiled. For example: add + relu = add_relu_kernel.
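The naming rule described in this commit message can be illustrated with a minimal sketch. The helper `fused_kernel_name` is hypothetical and is not taken from the repository; it only assumes that op names are joined with underscores in compile order and given a `_kernel` suffix, as the add + relu example suggests.

```python
def fused_kernel_name(op_names):
    """Build a kernel name from op names listed in compile order.

    Hypothetical helper for illustration only; the real naming code
    lives in the project and may differ in detail.
    """
    if not op_names:
        raise ValueError("at least one op name is required")
    # Join the ops in the order they are compiled, then add the suffix.
    return "_".join(op_names) + "_kernel"


if __name__ == "__main__":
    print(fused_kernel_name(["add", "relu"]))
```

With the ops from the commit message, `fused_kernel_name(["add", "relu"])` yields `add_relu_kernel`.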
-
Paul Fultz II authored
-
- 19 May, 2022 2 commits
- 17 May, 2022 1 commit
-
-
shivadbhavsar authored
Updated variable names according to #1193
-
- 13 May, 2022 1 commit
-
-
Chris Austen authored
Our documentation indicates that a user with sudo can run the install_prereqs.sh file. It turns out that the file is not complete enough to run on Ubuntu 18.04/20.04 independently. I updated the file to resolve the failures. Resolves #1191
-
- 11 May, 2022 9 commits
-
-
charlie authored
-
charlie authored
-
Paul Fultz II authored
Fuse layernorm and add triadd_layernorm fusion. This is a preparatory performance boost.
-
charlie authored
-
charlie authored
-
charlie authored
-
charlie authored
-
charlie authored
-
Chris Austen authored
ONNX Models changed from master to main. Changing the path to reflect the proper location.
-
- 10 May, 2022 3 commits
-
-
charlie authored
Reverts the dyn_data struct change. Should get around the ambiguous braced initialization list error.
-
charlie authored
-
Umang Yadav authored
Expose add_literal method in C/C++ api
-
- 09 May, 2022 3 commits
-
-
charlie authored
-
charlie authored
-
Paul Fultz II authored
Improves performance for add_gelu. In BERT it is 4x faster, and for mul_add it is 50% faster than what we currently have.
-
- 06 May, 2022 5 commits
-
-
charlie authored
-
charlie authored
-
charlie authored
-
Chris Austen authored
Move CI containers to ROCm 5.0.2, upgrade to Ubuntu 20.04, and free up some more file space in GitHub Actions environments.
-
Paul Fultz II authored
Add compile tests for gpu math functions
-
- 05 May, 2022 2 commits
-
-
Paul Fultz II authored
Fixes the #error when using cppcheck. This no longer suppresses cppcheck errors when including those errors. It also fixes the cppcheck errors that were already there.
-
charlie authored
-
- 04 May, 2022 3 commits
- 03 May, 2022 3 commits
-
-
charlie authored
-
charlie authored
-
Paul Fultz II authored
Helps avoid dangling references. This also deprecates the constructors that didn't take a lifetime annotation, since the lifetime is ambiguous.
-
- 02 May, 2022 1 commit
-
-
Chris Austen authored
Release branch created for ROCm 5.2, so moving the develop branch to 2.3.
-
- 29 Apr, 2022 1 commit
-
-
turneram authored
Add ref and gpu implementations for ONNX op GatherND. Resolves #1032
-
- 27 Apr, 2022 1 commit
-
-
Paul Fultz II authored
With reductions such as {2048, 2, 1456} on axis 1, this is 23x faster than using our new block_reduce, and it's even over 100x faster than our original reduce_sum:
# lane
gpu::code_object[code_object=13736,symbol_name=kernel,global=2981888,local=1024,]: 0.0672928ms
# block
gpu::code_object[code_object=13800,symbol_name=kernel,global=39321600,local=64,]: 1.46072ms
# original
gpu::reduce_sum[axes={1}]: 6.73456ms
There is some basic logic to pick between lane and block reduce automatically.
-
- 26 Apr, 2022 1 commit
-
-
Umang Yadav authored
Expose get_queue method
-