- 07 Jul, 2022 1 commit
Brian Pickrell authored
One-line fix to register the miopen_fusion op. The missing registration caused loading of compiled model files (*.mxr) to fail.
-
- 29 Apr, 2022 1 commit
turneram authored
Add ref and GPU implementations for the ONNX op GatherND. Resolves #1032.
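The sketch below is a minimal Python illustration of the ONNX GatherND semantics that a ref implementation has to compute. It works on nested lists and assumes batch_dims=0. The name gather_nd is hypothetical, and this is not the MIGraphX C++ code.

```python
def gather_nd(data, indices):
    """Reference semantics of ONNX GatherND (batch_dims=0) on nested lists.

    The innermost axis of `indices` holds coordinates into the leading axes
    of `data`; the outer axes of `indices` only shape the output.
    """
    # A flat list of ints is a single index tuple: walk into `data` along it.
    if indices and isinstance(indices[0], int):
        out = data
        for i in indices:
            out = out[i]
        return out
    # Otherwise recurse over the outer axes of `indices`.
    return [gather_nd(data, idx) for idx in indices]
```

An index tuple shorter than the rank of `data` selects a whole slice instead of a single element. That is why the output rank is `rank(indices) - 1 + rank(data) - indices.shape[-1]`.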
-
- 27 Apr, 2022 1 commit
Paul Fultz II authored
With reductions such as {2048, 2, 1456} on axis 1, this is 23x faster than using our new block_reduce, and it's even over 100x faster than our original reduce_sum:
# lane
gpu::code_object[code_object=13736,symbol_name=kernel,global=2981888,local=1024,]: 0.0672928ms
# block
gpu::code_object[code_object=13800,symbol_name=kernel,global=39321600,local=64,]: 1.46072ms
# original
gpu::reduce_sum[axes={1}]: 6.73456ms
There is some basic logic to pick between lane and block reduce automatically.
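A lane reduction can be sketched serially in Python. Each output element is owned by one work-item, which walks the reduced axis on its own. Reducing {2048, 2, 1456} over axis 1 produces 2048 * 1456 = 2,981,888 outputs, which appears consistent with global=2981888 in the lane timing above. The function lane_reduce_sum is a hypothetical illustration, not the MIGraphX kernel. The automatic lane/block selection logic is not modeled.

```python
from math import prod


def lane_reduce_sum(buf, lens, axis):
    """Serial sketch of a 'lane' reduction over one axis of a row-major buffer.

    Each output element is owned by one work-item ("lane") that walks the
    reduced axis by itself, so no cross-thread communication is needed.
    """
    strides = [prod(lens[i + 1:]) for i in range(len(lens))]
    out_lens = lens[:axis] + lens[axis + 1:]
    out = []
    # One iteration of this loop stands in for one GPU work-item.
    for lane in range(prod(out_lens)):
        # Decompose the lane id into coordinates of the non-reduced axes.
        coords, rem = [], lane
        for n in reversed(out_lens):
            coords.append(rem % n)
            rem //= n
        coords.reverse()
        coords.insert(axis, 0)  # start of the reduced axis
        base = sum(c * s for c, s in zip(coords, strides))
        out.append(sum(buf[base + k * strides[axis]] for k in range(lens[axis])))
    return out
```

When the reduced axis is short and there are many outputs, as in the {2048, 2, 1456} case, each lane does only a little serial work. No synchronization between threads is required.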
-
- 26 Apr, 2022 1 commit
Umang Yadav authored
* expose get_queue method
-
- 23 Apr, 2022 1 commit
Charlie Lin authored
Implements the ReverseSequence ONNX operator as a parser. This parser can only handle a constant sequence_lens input. This is the same as what is handled for TensorRT as far as I can tell. We could handle a variable sequence_lens input; that would require ref and GPU implementations of the operator. The ONNX backend tests are disabled because this does not handle variable sequence_lens.
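For reference, here is a minimal Python sketch of what ReverseSequence computes when sequence_lens is a known constant. It is limited to 2-D inputs, and the name reverse_sequence is hypothetical. The parser in this commit lowers the op to existing MIGraphX instructions rather than running code like this.

```python
def reverse_sequence(x, sequence_lens, batch_axis=1, time_axis=0):
    """Reference semantics of ONNX ReverseSequence for a 2-D nested list.

    For batch entry b, the first sequence_lens[b] elements along the time
    axis are reversed; the remaining elements are copied unchanged.
    """
    assert {batch_axis, time_axis} == {0, 1}
    # Work in [batch][time] layout, transposing in and out when needed.
    rows = x if batch_axis == 0 else [list(col) for col in zip(*x)]
    out = []
    for b, row in enumerate(rows):
        n = sequence_lens[b]
        out.append(row[:n][::-1] + row[n:])
    return out if batch_axis == 0 else [list(col) for col in zip(*out)]
```

Because every sequence_lens[b] is constant, each batch entry becomes a fixed slice-and-reverse. That is why only the constant case can be handled without dedicated ref and GPU kernels.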
-
- 19 Apr, 2022 1 commit
Charlie Lin authored
Refactored the reference implementation of pooling to something like what was done for roialign.
* Moved the reference implementation of pooling from targets/ref/lowering.cpp to pooling.hpp.
* Removed cpu_pooling, using the reference pooling in pooling.hpp instead.
* Added a reference implementation of Lp norm pooling and its global version.
* Added tests for Lp norm pooling.
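Lp norm pooling replaces max or average with (sum of |x|^p over the window)^(1/p). The global version uses a single window that spans the whole input. Below is a minimal 1-D Python sketch that assumes no padding. The function names are hypothetical and this is not the code in pooling.hpp.

```python
def lp_pool_1d(x, kernel, stride, p=2):
    """Lp-norm pooling over a 1-D signal: (sum of |x|^p in window)^(1/p).

    No padding; windows that would run past the end are dropped.
    """
    out = []
    for start in range(0, len(x) - kernel + 1, stride):
        window = x[start:start + kernel]
        out.append(sum(abs(v) ** p for v in window) ** (1.0 / p))
    return out


def global_lp_pool_1d(x, p=2):
    """Global variant: a single window covering the whole input."""
    return lp_pool_1d(x, kernel=len(x), stride=1, p=p)[0]
```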
-
- 17 Apr, 2022 1 commit
Paul Fultz II authored
There is a significant improvement on larger tensors, with half almost 50% faster:
lens: [1024, 384, 768]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
gpu::reduce_sum[axes={2}]: 1.73126ms
Also, for non-trivial layouts this can sometimes be over 2x faster:
lens: [64, 1024, 768, 4]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
gpu::reduce_sum[axes={1}]: 2.63375ms
Of course, if the stride becomes larger, this speed improvement diminishes due to poor memory access patterns. A lane_reduce instead of a block_reduce is needed for such kernels. I plan to address that in a future PR.
Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable, which prints out the assembly when the kernel compiles.
-
- 14 Apr, 2022 1 commit
bpickrel authored
Issue 1127: Updates the math.hpp header file to overload various standard functions (ops) for the HIP half2 type. The half2 type is two 16-bit floats packed into a 32-bit number, so the overloads act on vectors whose sizes are multiples of 2. They are invoked in runtime compilation whenever one of the ops is called on a tensor declared with the data type shape::half_type.
Defined a new template, made instances of the template for those math operations that the HIP library contains, and added verify tests for the sqrt operator for three cases:
* tensor size not divisible by 2
* tensor size divisible by 2 but not by 4
* tensor size divisible by 4
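The pairing that the half2 overloads rely on can be simulated in Python, and so can the three size cases the sqrt tests cover. This message does not say how a trailing odd element is handled. The scalar fallback below is an assumption, and apply_packed is a hypothetical name. The packing itself is only simulated, since a real half2 holds two 16-bit floats in one 32-bit register.

```python
def apply_packed(op, values):
    """Apply `op` two elements at a time, as a half2 overload would, and fall
    back to a single scalar call for a trailing odd element (an assumption).
    """
    out = []
    for i in range(len(values) // 2):
        lo, hi = values[2 * i], values[2 * i + 1]
        out.extend((op(lo), op(hi)))  # stands in for one packed half2 op
    if len(values) % 2:
        out.append(op(values[-1]))  # scalar tail for odd sizes
    return out
```

The three test sizes exercise an odd tail (size not divisible by 2), pairs only (divisible by 2 but not 4), and sizes that would also pack into wider vectors (divisible by 4).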
-
- 13 Apr, 2022 1 commit
Paul Fultz II authored
Also added the PYTHON_DISABLE_VERSIONS CMake variable to disable Python versions.
-
- 12 Apr, 2022 2 commits
Paul Fultz II authored
Fix out-of-bounds access when generate uses non-packed tensors, and add additional asserts for GPU memory.
-
Shucai Xiao authored
The ref implementation of the gemm op is sequential; this PR parallelizes the gemm computation in the ref implementation.
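One natural way to parallelize a reference gemm is to treat each output row as an independent task, because rows of C = A * B do not depend on each other. The Python sketch below shows that decomposition using concurrent.futures. Whether MIGraphX splits the work by rows is an assumption, and the function names are hypothetical. In CPython the GIL prevents a real speedup for pure-Python arithmetic, so the sketch only illustrates how the work is partitioned.

```python
from concurrent.futures import ThreadPoolExecutor


def gemm_row(a_row, b):
    """One output row: dot `a_row` with every column of `b`."""
    return [sum(x * b[k][j] for k, x in enumerate(a_row)) for j in range(len(b[0]))]


def parallel_gemm(a, b, workers=4):
    """C = A * B with each output row computed as an independent task.

    executor.map returns results in input order, so rows stay aligned.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: gemm_row(row, b), a))
```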
-
- 11 Apr, 2022 2 commits
bpickrel authored
Change the "scatter" struct and op to a base/child set of three: scatter_none, scatter_add, and scatter_mul, to mirror the ONNX ScatterElements op and its three reduction options. (The ONNX Scatter op is deprecated and is equivalent to scatter_none.) Provides both a reference op and an update to ONNX parsing. Tests were updated and a new test case was added.
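The difference between the three reduction modes can be seen in a minimal Python sketch of ONNX ScatterElements on 2-D nested lists. The function name and the string reduction argument are illustrative, not the MIGraphX op interface.

```python
import copy


def scatter_elements(data, indices, updates, axis=0, reduction="none"):
    """Reference semantics of ONNX ScatterElements on 2-D nested lists.

    "none" overwrites, "add" accumulates, and "mul" multiplies, mirroring
    the scatter_none / scatter_add / scatter_mul split.
    """
    combine = {
        "none": lambda old, new: new,
        "add": lambda old, new: old + new,
        "mul": lambda old, new: old * new,
    }[reduction]
    out = copy.deepcopy(data)
    for i, row in enumerate(indices):
        for j, idx in enumerate(row):
            # The index value replaces the coordinate along `axis`.
            r, c = (idx, j) if axis == 0 else (i, idx)
            out[r][c] = combine(out[r][c], updates[i][j])
    return out
```

With duplicate indices, "add" and "mul" accumulate every update into the same element. With "none", a later write simply overwrites an earlier one.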
-
Shucai Xiao authored
When creating a tensor_view with a vector data type, the last dimension of the shape should be divided by vec_size.
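A small Python helper shows the shape adjustment described above: a view whose element type packs vec_size scalars has its last dimension divided by vec_size. The helper name is hypothetical. Requiring the last dimension to be divisible by vec_size is an assumption made here, not something stated in the commit.

```python
def vectorized_lens(lens, vec_size):
    """Shape seen by a view whose element type packs `vec_size` scalars.

    Only the last (innermost, contiguous) dimension shrinks. This sketch
    assumes it must be divisible by vec_size.
    """
    if lens[-1] % vec_size != 0:
        raise ValueError("last dimension must be divisible by vec_size")
    return lens[:-1] + [lens[-1] // vec_size]
```

For example, a {2, 3, 8} tensor viewed with 4-wide vectors has 2 * 3 * 2 = 12 vector elements, which still cover the original 48 scalars.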
-
- 08 Apr, 2022 1 commit
Paul Fultz II authored
* Fix comparisons in migraphx::value class
-
- 06 Apr, 2022 1 commit
Umang Yadav authored
Adds the following Python API bindings and tests: add_return, add_instruction, add_parameter, and create_module.
-
- 31 Mar, 2022 1 commit
Umang Yadav authored
Documentation update for valid targets
-
- 29 Mar, 2022 1 commit
Paul Fultz II authored
This adds the infrastructure so we can compile everything in parallel, whereas before only pointwise kernels were compiled in parallel. This will also directly integrate with lowering and the gpu-driver. The kernels for pointwise and roialign use this infrastructure; ScatterND does not, since it requires a standard shape. This also makes it easier to add new runtime-compiled kernels in the future.
-
- 28 Mar, 2022 2 commits
Paul Fultz II authored
Use ifdef instead of comment for the auto-generated method declarations for type erased classes (#1138). The formatting of the comments is unreadable for larger methods, so instead we generate a struct with the methods in the interface and add a comment if a method is optional. This is wrapped in #ifdef TYPE_ERASED_DECLARATION (assuming this would never be defined) instead of #if 0, so most editors can still provide syntax highlighting (although I think vscode with clangd will still gray it out, unfortunately).
-
Paul Fultz II authored
* Use ccache for runtime compilation
-
- 25 Mar, 2022 1 commit
Paul Fultz II authored
* Handle string literal in construction
* Improve get_default with vector
-
- 24 Mar, 2022 1 commit
Paul Fultz II authored
This creates a custom op which has name() and compute_shape() methods.
-
- 22 Mar, 2022 1 commit
Paul Fultz II authored
For operators that use the arg.reshape() method, the lifetime will be extended.
-
- 21 Mar, 2022 1 commit
Charlie Lin authored
* LpNormalization ONNX parser
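ONNX LpNormalization divides each element by the Lp norm (p is 1 or 2) of the values along the normalized axis. Below is a minimal Python sketch for a single 1-D slice along that axis, with a hypothetical function name. The zero-norm behavior is an assumption added only to avoid division by zero; the commit does not specify it.

```python
def lp_normalize(v, p=2):
    """ONNX LpNormalization along one axis, shown for a single 1-D slice."""
    norm = sum(abs(x) ** p for x in v) ** (1.0 / p)
    # Assumption: leave an all-zero slice unchanged instead of dividing by 0.
    return [x / norm for x in v] if norm != 0 else list(v)
```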
-
- 18 Mar, 2022 2 commits
turneram authored
Add exclusive and reverse modes to the GPU implementation of prefix_scan_sum, which completes support for the ONNX op CumSum.
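The four CumSum combinations can be sketched in 1-D Python. In exclusive mode, each output omits its own element. In reverse mode, the sum runs from the end of the axis. The function name mirrors prefix_scan_sum for readability only, and this is not the GPU kernel.

```python
def prefix_scan_sum(xs, exclusive=False, reverse=False):
    """1-D sketch of ONNX CumSum with its exclusive and reverse modes."""
    seq = xs[::-1] if reverse else list(xs)
    out, running = [], 0
    for v in seq:
        if exclusive:
            out.append(running)  # sum of elements strictly before v
            running += v
        else:
            running += v
            out.append(running)  # sum including v
    return out[::-1] if reverse else out
```

For [1, 2, 3, 4, 5], the exclusive reverse mode yields [14, 12, 9, 5, 0].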
-
Paul Fultz II authored
get_context may change in the future (when we support multiple targets), so make it experimental for now.
-
- 15 Mar, 2022 2 commits
Umang Yadav authored
The API includes the following: create_module, get_main_module, add_instruction without module args, add_instruction with module args, add_parameter, and add_return.
-
Paul Fultz II authored
This adds iterators to tensor_view, which can allow kernels to work with non-standard shapes like for roialign. To improve the performance of indexing when using the iterators, the shape class was updated to use integral_constants since the compiler doesn't always fold the const values. An integral_constant will at least enforce that in the AST. Finally, since index calculations with single integers are improved, I also updated pointwise to use single index rather than multi index. There is about 4% improvement in some cases.
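The index arithmetic behind this change can be illustrated in Python. A flat element index becomes a multi-index through repeated division, and the multi-index becomes a memory offset through the strides. For a standard (packed, row-major) shape the offset equals the flat index, so a pointwise kernel can skip the divisions. A transposed layout still needs them. The helper names are hypothetical, and the C++ iterators are not modeled.

```python
from math import prod


def multi_index(i, lens):
    """Convert a flat row-major element index into per-dimension coordinates."""
    coords = []
    for n in reversed(lens):
        coords.append(i % n)
        i //= n
    return coords[::-1]


def offset(i, lens, strides):
    """Memory offset of the i-th logical element for arbitrary strides."""
    return sum(c * s for c, s in zip(multi_index(i, lens), strides))


def standard_strides(lens):
    """Packed row-major strides, e.g. [2, 3] -> [3, 1]."""
    return [prod(lens[d + 1:]) for d in range(len(lens))]
```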
-
- 14 Mar, 2022 2 commits
Shucai Xiao authored
Change the max number of groups in a kernel to 1B for better performance.
-
Paul Fultz II authored
* Show the operator fields in the driver
-
- 11 Mar, 2022 1 commit
Shucai Xiao authored
module::debug_print(ins) is very slow, which makes trace_eval=1/2 very slow. The reason is that printing an instruction involves searching the whole module to find it and then printing it. This change fixes that by calling module::print() to get the names of all instructions of a program, then printing each instruction by looking up its name in a hash map.
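The cost difference can be sketched in Python. Finding each instruction's position by scanning the module costs O(n) per print, so tracing a whole program costs O(n^2). Building a name table once makes each lookup O(1) on average. The "@N" naming and both function names are simplifications for illustration, not the MIGraphX API.

```python
def print_naive(instructions, targets):
    # For each instruction, scan the whole module to find it: O(n) per lookup.
    return [f"@{instructions.index(ins)}" for ins in targets]


def print_with_map(instructions, targets):
    # Build the name table once (like one module::print() pass), then do
    # average O(1) hash-map lookups per instruction.
    names = {ins: f"@{pos}" for pos, ins in enumerate(instructions)}
    return [names[ins] for ins in targets]
```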
-
- 09 Mar, 2022 3 commits
Charlie Lin authored
Add Celu ONNX operator
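ONNX defines Celu as max(0, x) + min(0, alpha * (exp(x / alpha) - 1)), with alpha defaulting to 1.0. Below is a scalar Python sketch of that formula with a hypothetical function name. It uses math.expm1 for accuracy near zero.

```python
import math


def celu(x, alpha=1.0):
    """ONNX Celu: max(0, x) + min(0, alpha * (exp(x / alpha) - 1))."""
    return max(0.0, x) + min(0.0, alpha * math.expm1(x / alpha))
```

For positive inputs the second term is 0, so Celu is the identity there. For negative inputs it smoothly approaches -alpha.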
-
Paul Fultz II authored
Add python API to construct shape class
-
kahmed10 authored
Add a callable C++ API to migraphx
-
- 08 Mar, 2022 1 commit
Charlie Lin authored
* Implement size ONNX operator and tests
-
- 07 Mar, 2022 1 commit
Umang Yadav authored
Use add_common_op for parse_clip. Should fix #1119.
-
- 04 Mar, 2022 2 commits
Charlie Lin authored
Adds EyeLike ONNX parser and unit tests.
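ONNX EyeLike produces a 2-D tensor with the input's shape, with ones on the diagonal shifted by the attribute k and zeros elsewhere. The sketch below is a minimal Python version that takes the shape directly and ignores the dtype attribute. Its name and signature are hypothetical.

```python
def eye_like(rows, cols, k=0):
    """ONNX EyeLike for a rows x cols input: 1 where col - row == k, else 0."""
    return [[1 if c - r == k else 0 for c in range(cols)] for r in range(rows)]
```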
-
bpickrel authored
Changed the pooling values for two structures from strings to specialized enum classes. Many test and operator parsing changes to support this. Introduces one new source file, op_enums.cpp.
-
- 03 Mar, 2022 3 commits
Paul Fultz II authored
Boost the max number of workgroups for pointwise ops by matching what we are doing in launch.hpp
-
kahmed10 authored
better performance doing it this way
-
turneram authored
Add an ONNX parser and ref and GPU implementations of the ONNX op ScatterND.
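Below is a minimal Python sketch of ONNX ScatterND semantics without a reduction attribute, working on nested lists. Each innermost row of `indices` addresses an element or a slice of `data`, and that location is overwritten by the matching entry of `updates`. The function name is hypothetical, and this is not the MIGraphX implementation.

```python
import copy


def scatter_nd(data, indices, updates):
    """Reference semantics of ONNX ScatterND (no reduction) on nested lists."""
    out = copy.deepcopy(data)

    def apply(idx, upd):
        if idx and isinstance(idx[0], int):
            # `idx` is one index tuple: walk to the parent, then assign.
            target = out
            for i in idx[:-1]:
                target = target[i]
            target[idx[-1]] = copy.deepcopy(upd)
        else:
            # Recurse over the outer axes of `indices` and `updates` together.
            for sub_idx, sub_upd in zip(idx, upd):
                apply(sub_idx, sub_upd)

    apply(indices, updates)
    return out
```

An index tuple as long as the rank of `data` replaces a single element. A shorter tuple replaces a whole slice, which is the slice-update case from the ONNX specification.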
-