- 23 Feb, 2023 1 commit
-
-
charlie authored
* changed split_single_dyn_dim to add a get_tuple_element instruction, needed to access the tuple type created by select_module
* dyn_test_runner changed to set offload_copy to false
* split_single_dyn_dim is not going to work with offload_copy unless we make `load`, `copy_to_gpu` and `copy_from_gpu` handle dynamic shapes
-
- 21 Feb, 2023 4 commits
- 20 Feb, 2023 1 commit
-
-
charlie authored
-
- 17 Feb, 2023 3 commits
-
-
charlie authored
-
charlie authored
-
Chris Austen authored
Enable python 3.10 bindings
-
- 16 Feb, 2023 6 commits
-
-
charlie authored
-
Paul Fultz II authored
Avoids double global loads. Strided loops are unrolled, which lets results be stored in an array that the compiler can keep in registers because the index accesses are constant. Also updated to handle large reductions, which gives a better stable diffusion result.
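The register-array idea described in this commit can be illustrated with a minimal CPU-side sketch. It is not the kernel code from the commit: `load_strided` and `variance_single_load` are hypothetical names, the "threads" are simulated with an ordinary loop, and the input size is assumed to be exactly `nthreads * ItemsPerThread`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for one GPU thread: read this thread's strided
// elements from "global" memory exactly once into a fixed-size local array.
// ItemsPerThread is a compile-time constant, so once the loop is unrolled
// every index into `local` is constant, which is what lets a GPU compiler
// keep the array in registers.
template <std::size_t ItemsPerThread>
std::array<float, ItemsPerThread>
load_strided(const std::vector<float>& input, std::size_t tid, std::size_t nthreads)
{
    std::array<float, ItemsPerThread> local{};
    for(std::size_t i = 0; i < ItemsPerThread; ++i)
        local[i] = input[tid + i * nthreads];
    return local;
}

// Population variance needs two passes over the data (mean, then squared
// deviations). Both passes read the local arrays, so `input` is loaded once.
template <std::size_t ItemsPerThread>
float variance_single_load(const std::vector<float>& input, std::size_t nthreads)
{
    // Assumption of this sketch: the input divides evenly among the threads.
    assert(input.size() == nthreads * ItemsPerThread);

    std::vector<std::array<float, ItemsPerThread>> locals;
    for(std::size_t tid = 0; tid < nthreads; ++tid)
        locals.push_back(load_strided<ItemsPerThread>(input, tid, nthreads));

    // First pass: sum over the cached values.
    float sum = 0.0f;
    for(const auto& local : locals)
        for(float x : local)
            sum += x;
    const float mean = sum / static_cast<float>(input.size());

    // Second pass: reuse the cached values instead of reloading `input`.
    float sq = 0.0f;
    for(const auto& local : locals)
        for(float x : local)
            sq += (x - mean) * (x - mean);
    return sq / static_cast<float>(input.size());
}
```

Calling `variance_single_load<2>(v, 4)` on an eight-element vector assigns elements `tid` and `tid + 4` to each simulated thread. On a real GPU the partial sums would be combined with a block-level reduction rather than a serial loop.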
-
charlie authored
-
charlie authored
-
Umang Yadav authored
* deprecate HCC
-
Umang Yadav authored
* Add driver flag "--exhaustive-tune" to enable tuning, add support for the same in C/C++ and python API
-
- 15 Feb, 2023 9 commits
-
-
Brian Pickrell authored
Add dynamic shape support to the slice operator. This first draft doesn't support slicing along non-fixed (dynamic) axes; the resulting shape in such cases is not guaranteed. Also, onnx parsing doesn't support any arguments other than "axes".
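The limitation noted above (slicing is only well defined along fixed axes) can be sketched as follows. This is an illustration under assumptions, not the operator's actual shape computation: `dyn_dim` and `slice_shape` are hypothetical names, each dimension is modelled as a `{min, max}` range, and an unsupported request is reported as an empty `std::optional`.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of a dynamic dimension: a range of allowed sizes.
// A dimension is "fixed" when the range collapses to a single value.
struct dyn_dim
{
    std::size_t min;
    std::size_t max;
    bool is_fixed() const { return min == max; }
};

// Compute the output dimensions of slicing [start, end) along `axis`.
// Returns std::nullopt when the sliced axis is dynamic or the bounds are
// invalid; all other dimensions (dynamic or not) pass through unchanged.
std::optional<std::vector<dyn_dim>> slice_shape(std::vector<dyn_dim> dims,
                                                std::size_t axis,
                                                std::size_t start,
                                                std::size_t end)
{
    if(axis >= dims.size() || !dims[axis].is_fixed())
        return std::nullopt;
    if(start > end || end > dims[axis].max)
        return std::nullopt;
    dims[axis] = {end - start, end - start};
    return dims;
}
```

With dimensions `{1..4, 3, 8}`, slicing axis 2 over `[2, 6)` yields `{1..4, 3, 4}`, while slicing the dynamic axis 0 is rejected.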
-
charlie authored
-
charlie authored
removed the new rank<1> function as it messed up batch_quant_dot_5 verify test
-
charlie authored
-
charlie authored
-
charlie authored
-
charlie authored
-
charlie authored
-
Ted Themistokleous authored
Use requirements-dev.txt instead of the other instances
-
- 14 Feb, 2023 4 commits
-
-
charlie authored
* Changed the allocates to occur in the submodules
* Incomplete, as the use_local_alloc variable in module does not work properly
* added a hip::sync_stream before the return
* not sure why the hip::sync_stream gets rid of the dangling reference error (code-wise it's because hip::sync_stream's output alias is -1)
-
shivadbhavsar authored
Currently, we default to device 0 when loading programs. Updating this to use hipGetDevice to set the device for the loaded program.
-
Charlie Lin authored
Expands on the documentation and corrects default option documentation error.
-
Paul Fultz II authored
* Add serialization of tuples and optional types
-
- 13 Feb, 2023 1 commit
-
-
kahmed10 authored
Using add_instruction for the neg op was causing issues on replace_instruction. Changed to use insert_instruction. Updated the tests and added a new one that fails without the change.
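One plausible reading of this fix is an ordering problem: an instruction appended at the end of a module ends up after the instruction that is later rewritten to use it. The toy model below illustrates that reading only. It is an assumption, not the library's code: `toy_module`, `append_instruction`, `insert_before` and `defined_before_use` are hypothetical stand-ins, with instructions reduced to names in a `std::list`.

```cpp
#include <algorithm>
#include <iterator>
#include <list>
#include <string>
#include <utility>

// Toy module: an ordered list of instruction names.
using toy_module = std::list<std::string>;

// Stand-in for appending: the new instruction always goes at the end.
toy_module::iterator append_instruction(toy_module& m, std::string name)
{
    return m.insert(m.end(), std::move(name));
}

// Stand-in for positional insertion: the new instruction goes before `pos`.
toy_module::iterator insert_before(toy_module& m, toy_module::iterator pos, std::string name)
{
    return m.insert(pos, std::move(name));
}

// True when `def` appears strictly earlier in the module than `use`.
bool defined_before_use(const toy_module& m, const std::string& def, const std::string& use)
{
    auto d = std::find(m.begin(), m.end(), def);
    auto u = std::find(m.begin(), m.end(), use);
    return d != m.end() && u != m.end() &&
           std::distance(m.begin(), d) < std::distance(m.begin(), u);
}
```

In a module `x, sub, ret`, appending `neg` places it after `sub`, so `sub` cannot safely be replaced by something that consumes `neg`; inserting `neg` before `sub` keeps the definition ahead of its use.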
-
- 11 Feb, 2023 1 commit
-
-
Brian Pickrell authored
* add dynamic shape support to concat operator. Includes new op_shape_test and ref_ops_test cases
-
- 10 Feb, 2023 3 commits
-
-
Brian Pickrell authored
Dynamic shape support for the Where operator. Includes shape test, ref_ops test, and onnx_test.
-
charlie authored
-
Umang Yadav authored
-
- 09 Feb, 2023 1 commit
-
-
Chris Austen authored
Change download location for onnx model from AWS ONNX to download.onnxruntime.ai
-
- 08 Feb, 2023 1 commit
-
-
charlie authored
-
- 06 Feb, 2023 2 commits
-
-
charlie authored
-
Paul Fultz II authored
* Fuse layernorm with different patterns
* Only match when using the last axis
Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>
-
- 04 Feb, 2023 1 commit
-
-
charlie authored
-
- 03 Feb, 2023 2 commits
-
-
Chris Austen authored
Switch default ROCm version to 5.4.2 and default calculation algorithm to defined threshold
-
Paul Fultz II authored
Refactors memory coloring to only handle allocation instructions. It also handles allocations for tuple shapes.
-