- 25 Jun, 2022 1 commit
-
-
Paul Fultz II authored
* Jit contiguous
-
- 24 Jun, 2022 2 commits
-
-
Ted Themistokleous authored
Used to determine which files contain a license and are stamped. If any are not, we exit and return an error code that can later be ingested by another script, along with a list of the outstanding files in question. The list of files that should or should not carry a license is currently baked in, as well as some entries to quickly ignore.
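A minimal sketch of the kind of check described above, assuming a plain substring test for the license text. The marker string, the suffix list, and the ignored directories are hypothetical placeholders, not the lists baked into the actual script.

```python
from pathlib import Path

# Hypothetical values for illustration; the real script's lists are not shown in this log.
LICENSE_MARKER = "The MIT License (MIT)"
CHECKED_SUFFIXES = {".cpp", ".hpp", ".py", ".cmake"}
IGNORED_DIRS = {".git", "build"}


def find_unstamped(root, marker=LICENSE_MARKER):
    """Return the sorted relative paths of checked files under root that lack the marker."""
    missing = []
    for path in Path(root).rglob("*"):
        # Only regular files with a suffix we are expected to stamp.
        if not path.is_file() or path.suffix not in CHECKED_SUFFIXES:
            continue
        relative = path.relative_to(root)
        # Quickly skip anything that lives under an ignored directory.
        if any(part in IGNORED_DIRS for part in relative.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if marker not in text:
            missing.append(str(relative))
    return sorted(missing)


def check_licenses(root):
    """Return (exit_code, missing_files); a non-zero code signals outstanding files."""
    missing = find_unstamped(root)
    return (1 if missing else 0), missing
```

A wrapper could print the returned list and pass the code to `sys.exit` so that a calling script can react to it.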
-
Umang Yadav authored
Adds compute_method for the experimental custom ops. Adds a test for the same using HIP APIs. Depends on #1183. Solves #1101.
-
- 22 Jun, 2022 1 commit
-
-
Ted Themistokleous authored
Updated each source file in the repo with the existing license.
-
- 17 Jun, 2022 2 commits
-
-
Ted Themistokleous authored
* [#935] Update tf_parser to have add_common_op() for parse_relu6
  Similar to onnx_parser.cpp, add an add_common_op template and functionality to support clip-based operations. This is done so clip operations can be guaranteed to have the same dimensions.
* fixup! [#935] Update tf_parser to have add_common_op() for parse_relu6
* fixup! fixup! [#935] Update tf_parser to have add_common_op() for parse_relu6
* fixup! fixup! fixup! [#935] Update tf_parser to have add_common_op() for parse_relu6
* fixup! fixup! fixup! fixup! [#935] Update tf_parser to have add_common_op() for parse_relu6
* Formatting
* fixup! Formatting
Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
-
kahmed10 authored
* add allocate op header
* formatting
* add replace_allocate pass
* formatting
* move output param to remove_allocate pass
* formatting
* fix bugs in replace_allocate pass
* formatting
* fix verify if tests
* formatting
* move if op logic
* formatting
* cleanup lowering
* cleanup lowering
* formatting
* fix tidy
* formatting
* fix tidy
* add cpu allocate check
* formatting
* change cpu allocate in pass
* formatting
* add some tests for replace_allocate pass
* formatting
* pass by ref
* fix run_pass
* formatting
* update variable name for module
* update dce to use contains() and fix tidy
* formatting
* update cppcheck
* add if test
* formatting
* add if test
* rename var to mod_output_names
* formatting
* remove conditional
* update allocate op and tests
* formatting
* update replace_allocate tests
* update create_output_names() and conditional in replace_allocate
* formatting
* remove extra variable in replace_allocate
* update tools script for allocation_model
Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
-
- 16 Jun, 2022 1 commit
-
-
Charlie Lin authored
* Use custom distance function
* Pass module, skip order check if other module
* Change other valid()
* Remove unnecessary declaration
* test multiple module dependency
* Refactor to make more clear
* Code cleanup
* Simplify fix
* Test EXPECT
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
-
- 07 Jun, 2022 1 commit
-
-
Zhuoran Yin authored
Prioritize int8 over int8x4 when it is applicable. Amend return to continue in apply loop. Add error handling in case int8x4 compilation fails.
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
-
- 02 Jun, 2022 1 commit
-
-
Paul Fultz II authored
-
- 26 May, 2022 1 commit
-
-
Paul Fultz II authored
* Upgrade to cppcheck 2.8
-
- 24 May, 2022 2 commits
-
-
Paul Fultz II authored
* Improve applicable batched gemms for bert
-
shivadbhavsar authored
As described in #1196, the ONNX mean parser does not work correctly for integral types. This update fixes the issue by handling integral types separately, where summation is performed before division. Additional test cases have also been added for handling integral types.
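A small sketch of why the order of operations matters for integral types. It assumes the non-integral path scales each input by 1/n before accumulating, which is not confirmed by the log; the function names are hypothetical and the values are non-negative, so floor division matches truncating division.

```python
def mean_divide_first(tensors):
    """Element-wise mean that divides every input by n before summing.

    With integer division each term loses its remainder, so the result drifts.
    """
    n = len(tensors)
    return [sum(value // n for value in column) for column in zip(*tensors)]


def mean_sum_first(tensors):
    """Element-wise mean that sums first and divides once, as the fix describes."""
    n = len(tensors)
    return [sum(column) // n for column in zip(*tensors)]


inputs = [[1, 2], [2, 3], [3, 5]]
divide_first = mean_divide_first(inputs)  # remainders dropped per term
sum_first = mean_sum_first(inputs)        # a single division at the end
```

For these inputs the two orderings disagree in both positions, which is the kind of discrepancy the added integral test cases would catch.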
-
- 11 May, 2022 1 commit
-
-
Paul Fultz II authored
Fuse layernorm and add triadd_layernorm fusion. This is a prep performance booster.
-
- 10 May, 2022 1 commit
-
-
Umang Yadav authored
Expose add_literal method in C/C++ api
-
- 06 May, 2022 1 commit
-
-
Paul Fultz II authored
Add compile tests for gpu math functions
-
- 03 May, 2022 1 commit
-
-
Paul Fultz II authored
Helps avoid dangling references. This also deprecates the constructors that didn't take a lifetime annotation, since the lifetime is ambiguous.
-
- 29 Apr, 2022 1 commit
-
-
turneram authored
Add ref and gpu implementations for ONNX op GatherND. Resolves #1032.
-
- 26 Apr, 2022 1 commit
-
-
Umang Yadav authored
* expose get_queue method
-
- 23 Apr, 2022 1 commit
-
-
Charlie Lin authored
Implements the ReverseSequence ONNX operator as a parser. This parser can only handle a constant sequence_lens input. This is the same as what is handled for TensorRT as far as I can tell. We could handle a variable sequence_lens input; that would require ref and GPU implementations of the operator. The ONNX backend tests are disabled because this does not handle variable sequence_lens.
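The operator semantics with a constant sequence_lens can be sketched as below. This is a plain-Python illustration assuming batch_axis=0 and time_axis=1, not the parser code; `reverse_sequence` is a hypothetical helper name.

```python
def reverse_sequence(batch, sequence_lens):
    """Reverse the first sequence_lens[i] time steps of each batch row i.

    batch is a list of rows (batch_axis=0, time_axis=1); steps past the
    given length are left in place.
    """
    if len(batch) != len(sequence_lens):
        raise ValueError("sequence_lens needs one entry per batch row")
    result = []
    for row, length in zip(batch, sequence_lens):
        if not 0 <= length <= len(row):
            raise ValueError("sequence length out of range")
        result.append(list(reversed(row[:length])) + list(row[length:]))
    return result


example = reverse_sequence([[0, 1, 2, 3], [4, 5, 6, 7]], [2, 4])
```

Because the lengths are known when the model is parsed, the same effect can be expressed with existing slice and concatenate style operations, which is presumably why no new ref or GPU operator is needed for the constant case.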
-
- 19 Apr, 2022 1 commit
-
-
Charlie Lin authored
Refactored the reference implementation of pooling to something like what was done for roialign. Moved the reference implementation of pooling from targets/ref/lowering.cpp to pooling.hpp. Removed cpu_pooling, instead using reference pooling in pooling.hpp. Added reference implementation of Lp Norm pooling and the global version. Added tests for the Lp Norm pooling.
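For reference, Lp norm pooling computes the p-norm of each window instead of its maximum or average. The following is a one-dimensional sketch with hypothetical helper names and no padding, not the code in pooling.hpp.

```python
def lp_pool_1d(values, kernel, stride, p=2):
    """Return (sum(|x|^p))^(1/p) for each window of size kernel."""
    if kernel <= 0 or stride <= 0 or p <= 0:
        raise ValueError("kernel, stride and p must be positive")
    pooled = []
    for start in range(0, len(values) - kernel + 1, stride):
        window = values[start:start + kernel]
        pooled.append(sum(abs(v) ** p for v in window) ** (1.0 / p))
    return pooled


def global_lp_pool(values, p=2):
    """Global version: a single window that covers the whole input."""
    return lp_pool_1d(values, len(values), 1, p)[0]


windows = lp_pool_1d([3, 4, 0, 0], kernel=2, stride=2)
whole = global_lp_pool([3, 4])
```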
-
- 17 Apr, 2022 1 commit
-
-
Paul Fultz II authored
There is significant improvement on larger tensors, with half almost 50% faster:
lens: [1024, 384, 768]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
gpu::reduce_sum[axes={2}]: 1.73126ms
Also, for non-trivial layouts this can sometimes be over 2x faster:
lens: [64, 1024, 768, 4]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
gpu::reduce_sum[axes={1}]: 2.63375ms
Of course, if the stride becomes larger this speed improvement diminishes due to poor memory access patterns. A lane_reduce instead of a block_reduce is needed for this type of kernel. I plan to address that in a future PR.
Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable which will print out the assembly when the kernel compiles.
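The quoted timings can be turned into speedup ratios with a short calculation. This assumes the gpu::code_object rows are the new kernel and the gpu::reduce_sum rows are the previous implementation.

```python
def speedup(baseline_ms, new_ms):
    """Ratio of the old time to the new time; values above 1 mean the new kernel is faster."""
    return baseline_ms / new_ms


# lens [1024, 384, 768], reduction over axis 2
contiguous = speedup(1.73126, 1.16685)  # roughly 1.48x

# lens [64, 1024, 768, 4], reduction over axis 1
strided = speedup(2.63375, 1.1706)      # roughly 2.25x
```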
-
- 14 Apr, 2022 1 commit
-
-
bpickrel authored
Issue 1127
Updates the math.hpp header file to add overloads of various standard functions (ops) for the hip half2 type. The half2 type is two 16-bit floats packed into a 32-bit number, and therefore the overloads act on vectors whose sizes are multiples of 2. They are invoked in runtime compilation any time one of the ops is called on a tensor declared with the data type shape::half_type.
Defined a new template, made instances of the template for those math operations that the hip library contains, and added verify tests for the sqrt operator for three cases:
* tensor size not divisible by 2
* tensor size divisible by 2 but not by 4
* tensor size divisible by 4
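To illustrate the packing idea only, the sketch below stores two 16-bit floats in one 32-bit word using Python's struct module. The low/high ordering and the helper names are assumptions for illustration and say nothing about how HIP lays out half2 internally.

```python
import struct


def pack_half2(low, high):
    """Pack two float16 values into one 32-bit unsigned integer (low value in the low 16 bits)."""
    return struct.unpack("<I", struct.pack("<2e", low, high))[0]


def unpack_half2(word):
    """Recover the two float16 values from a packed 32-bit word."""
    return struct.unpack("<2e", struct.pack("<I", word))


def split_for_half2(element_count):
    """Return (number of full pairs, leftover scalar elements) for a half tensor."""
    return divmod(element_count, 2)


word = pack_half2(1.0, 2.0)
pair = unpack_half2(word)
```

A size that is not divisible by 2 leaves one element that cannot be paired, which is why the odd-size case is tested separately from the sizes divisible by 2 and by 4.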
-
- 11 Apr, 2022 1 commit
-
-
bpickrel authored
Change the "scatter" struct and op to a base/child set of three: scatter_none, scatter_add, scatter_mul, to mirror ONNX's ScatterElements op and its three reduction options. (The ONNX Scatter op is deprecated and is equivalent to scatter_none.) Provides both a reference op and an update to ONNX parsing. Tests updated and a new test case added.
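The three reduction options differ only in how an update is combined with the existing element. A one-dimensional sketch, with a hypothetical `scatter_elements` helper standing in for the three ops:

```python
def scatter_elements(data, indices, updates, reduction="none"):
    """Apply updates[i] to data[indices[i]] using the chosen reduction.

    reduction "none" overwrites, "add" accumulates, "mul" multiplies,
    corresponding to scatter_none, scatter_add and scatter_mul.
    """
    if len(indices) != len(updates):
        raise ValueError("indices and updates must have the same length")
    output = list(data)  # the input is left untouched
    for index, update in zip(indices, updates):
        if reduction == "none":
            output[index] = update
        elif reduction == "add":
            output[index] += update
        elif reduction == "mul":
            output[index] *= update
        else:
            raise ValueError("unknown reduction: " + reduction)
    return output


data = [1, 2, 3, 4]
replaced = scatter_elements(data, [1, 3], [10, 5], "none")
added = scatter_elements(data, [1, 3], [10, 5], "add")
multiplied = scatter_elements(data, [1, 3], [10, 5], "mul")
```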
-
- 08 Apr, 2022 1 commit
-
-
Paul Fultz II authored
* Fix comparisons in migraphx::value class
-
- 06 Apr, 2022 1 commit
-
-
Umang Yadav authored
Adds the following API bindings and tests to Python: add_return, add_instruction, add_parameter, create_module.
-
- 01 Apr, 2022 1 commit
-
-
Charlie Lin authored
* Fix and change doc CMakeLists
1. Fix include directory location with change from #1088
2. Create a DoxygenWarningLog.txt file in <build_dir>/doc/doxygen
3. Move compiled html or pdf files to <build_dir>/doc/[pdf, html]
-
- 31 Mar, 2022 1 commit
-
-
Umang Yadav authored
Documentation update for valid targets
-
- 29 Mar, 2022 2 commits
-
-
Umang Yadav authored
Follow up to #1128
-
Paul Fultz II authored
This adds the infrastructure so we can compile everything in parallel, whereas before only pointwise kernels were compiled in parallel. This will also directly integrate with lowering and the gpu-driver. The kernels for pointwise and roialign are using this infrastructure. Scatternd is not since it does require standard shape. This also makes it easier to add new runtime compiled kernels in the future.
-
- 25 Mar, 2022 1 commit
-
-
Paul Fultz II authored
* Handle string literal in construction
* Improve get_default with vector
-
- 24 Mar, 2022 1 commit
-
-
Paul Fultz II authored
This creates a custom op which has name() and compute_shape() methods.
-
- 21 Mar, 2022 1 commit
-
-
Charlie Lin authored
* LpNormalization ONNX parser
-
- 18 Mar, 2022 2 commits
-
-
turneram authored
Add exclusive and reverse modes to gpu implementation of prefix_scan_sum, which completes support for ONNX op CumSum
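The exclusive and reverse modes follow the ONNX CumSum definition and can be sketched on a flat list as below. This is a sequential reference illustration with a hypothetical helper name, not the GPU scan.

```python
def prefix_scan_sum(values, exclusive=False, reverse=False):
    """Running sum over values.

    exclusive: each output excludes its own element (the scan starts at 0).
    reverse: the scan runs from the last element toward the first.
    """
    sequence = list(reversed(values)) if reverse else list(values)
    output = []
    running = 0
    for value in sequence:
        if exclusive:
            output.append(running)
            running += value
        else:
            running += value
            output.append(running)
    return list(reversed(output)) if reverse else output


inclusive = prefix_scan_sum([1, 2, 3])
exclusive = prefix_scan_sum([1, 2, 3], exclusive=True)
reversed_scan = prefix_scan_sum([1, 2, 3], reverse=True)
both = prefix_scan_sum([1, 2, 3], exclusive=True, reverse=True)
```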
-
Paul Fultz II authored
get_context may change in the future (when we support multi-targets), so make this experimental for now.
-
- 15 Mar, 2022 1 commit
-
-
Umang Yadav authored
The API includes the following: create_module, get_main_module, add_instruction without module args, add_instruction with module args, add_parameter, add_return.
-
- 11 Mar, 2022 1 commit
-
-
Shucai Xiao authored
The module::debug_print(ins) is very slow, which makes trace_eval==1/2 very slow. The reason is that printing an ins involves searching the whole module to get the instruction, then printing it. This change fixes that by calling module::print() to get the names of all instructions of a program, then printing the instruction by getting its name from a hash map.
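The difference between the two lookup strategies can be sketched as follows. The `Instruction` class, the helper names, and the `@N` naming scheme are hypothetical stand-ins used only to show the idea of building the name map once.

```python
class Instruction:
    """Hypothetical stand-in for an instruction; only the operator name is kept."""

    def __init__(self, op):
        self.op = op


def name_by_search(instructions, target):
    """Slow path: scan the whole module for every printed instruction (O(n) per lookup)."""
    for position, instruction in enumerate(instructions):
        if instruction is target:
            return "@" + str(position)
    raise KeyError("instruction not in module")


def build_name_map(instructions):
    """Fast path: walk the module once and remember each instruction's name."""
    return {id(instruction): "@" + str(position)
            for position, instruction in enumerate(instructions)}


module = [Instruction(op) for op in ("param", "add", "relu", "return")]
names = build_name_map(module)            # one pass over the module
fast_name = names[id(module[2])]          # constant-time lookup afterwards
slow_name = name_by_search(module, module[2])
```

With n instructions traced, the scan-per-print approach costs on the order of n squared steps in total, while the map costs one pass plus a constant-time lookup per instruction.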
-
- 09 Mar, 2022 3 commits
-
-
Charlie Lin authored
Add Celu ONNX operator
-
Paul Fultz II authored
Add python API to construct shape class
-
kahmed10 authored
Add a callable C++ API to migraphx
-
- 08 Mar, 2022 1 commit
-
-
Charlie Lin authored
* Implement size ONNX operator and tests
-