Commits · e44cecbc67d53dd62ef575eebd81d61dc866b8b3 · gaoqiong / MIGraphX

22 Jun, 2022 1 commit
- Update license files (#1248) · e44cecbc
  Ted Themistokleous authored Jun 22, 2022
```
Updated each source file in the repo with the existing license.
```
  e44cecbc
17 Jun, 2022 1 commit

Create allocate op and replace_allocate pass (#1183) · add6fb3b

kahmed10 authored Jun 17, 2022



* add allocate op header

* formatting

* add replace_allocate pass

* formatting

* move output param to remove_allocate pass

* formatting

* fix bugs in replace_allocate pass

* formatting

* fix verify if tests

* formatting

* move if op logic

* formatting

* cleanup lowering

* cleanup lowering

* formatting

* fix tidy

* formatting

* fix tidy

* add cpu allocate check

* formatting

* change cpu allocate in pass

* formatting

* add some tests for replace_allocate pass

* formatting

* pass by ref

* fix run_pass

* formatting

* update variable name for module

* update dce to use contains() and fix tidy

* formatting

* update cppcheck

* add if test

* formatting

* add if test

* rename var to mod_output_names

* formatting

* remove conditional

* update allocate op and tests

* formatting

* update replace_allocate tests

* update create_output_names() and conditional in replace_allocate

* formatting

* remove extra variable in replace_allocate

* update tools script for allocation_model
Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

add6fb3b

26 May, 2022 1 commit
- Upgrade to cppcheck 2.8 and fix new issues found (#1225) · a401e72a
  Paul Fultz II authored May 26, 2022
```
* Upgrade to cppcheck 2.8
```
  a401e72a
20 May, 2022 1 commit
- Improve matching with has_value when there are convert operators (#1212) · 27af0170
  Paul Fultz II authored May 19, 2022
  
  27af0170
17 May, 2022 1 commit
- renamed variables for module from p to m (#1204) · a27dd28c
  shivadbhavsar authored May 17, 2022
```
Updated variable names according to #1193
```
  a27dd28c
11 May, 2022 1 commit

Prefuse layernorm for gpu (#1190) · 671f24be

Paul Fultz II authored May 11, 2022

Fuse layernorm and add a triadd_layernorm fusion. This is a prep performance booster.

671f24be

05 May, 2022 1 commit

Cppcheck fixes (#1195) · d582425b

Paul Fultz II authored May 05, 2022

Fixes the #error when using cppcheck. This no longer suppresses cppcheck errors when including those errors. This fixes the cppcheck errors that were already there.

d582425b

29 Apr, 2022 1 commit
- Add GatherND operator (#1089) · 4ec35e5f
  turneram authored Apr 28, 2022
```
Add ref and gpu implementations for ONNX op GatherND

Resolves #1032
```
  4ec35e5f
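
To illustrate what the ONNX GatherND op computes, here is a minimal pure-Python sketch for the `batch_dims=0` case. The function name `gather_nd` and the nested-list representation are assumptions made for illustration; this is not the ref or gpu implementation added by the commit.

```python
def gather_nd(data, indices):
    """Sketch of ONNX GatherND with batch_dims=0 on nested lists.

    Each innermost list in `indices` is a (possibly partial) index into
    `data`; it is replaced by the element or slice that it selects.
    """
    # Innermost level: a flat list of ints is one index tuple.
    if indices and all(isinstance(i, int) for i in indices):
        out = data
        for i in indices:
            out = out[i]
        return out
    return [gather_nd(data, sub) for sub in indices]


data = [[0, 1], [2, 3]]
full = gather_nd(data, [[0, 0], [1, 1]])  # full indices select scalars
rows = gather_nd(data, [[1], [0]])        # partial indices select whole rows
```

With full-length index tuples the result holds scalars; with shorter tuples each entry is a slice of `data`.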
19 Apr, 2022 1 commit

Refactor Pooling and implement ONNX LpPool and GlobalLpPool (#1152) · 764273e4

Charlie Lin authored Apr 18, 2022

Refactored the reference implementation of pooling to something like what was done for roialign. Moved the reference implementation of pooling from targets/ref/lowering.cpp to pooling.hpp.
Removed cpu_pooling; the reference pooling in pooling.hpp is used instead.
Added a reference implementation of Lp Norm pooling and the global version.
Added tests for the Lp Norm Pooling.

764273e4

17 Apr, 2022 1 commit

Reduce with runtime compilation (#1150) · f9a5b81e

Paul Fultz II authored Apr 17, 2022

There is a significant improvement on larger tensors; with half it is almost 50% faster:

lens: [1024, 384, 768]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
gpu::reduce_sum[axes={2}]: 1.73126ms
Also for non-trivial layouts this can sometimes be over 2x faster:

lens: [64, 1024, 768, 4]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
gpu::reduce_sum[axes={1}]: 2.63375ms
Of course, if the stride becomes larger, this speed improvement diminishes due to poor memory access patterns. A lane_reduce instead of a block_reduce is needed for such kernels. I plan to address that in a future PR.

Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable which will print out the assembly when the kernel compiles.
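
The speedups quoted above can be checked from the timings with a few lines of arithmetic. This sketch assumes that `gpu::code_object` is the new runtime-compiled kernel and `gpu::reduce_sum` is the existing one; the dictionary keys are labels chosen here for illustration.

```python
# Timings in milliseconds, copied from the commit message.
timings = {
    "lens [1024, 384, 768], axes={2}": {"code_object": 1.16685, "reduce_sum": 1.73126},
    "lens [64, 1024, 768, 4], axes={1}": {"code_object": 1.1706, "reduce_sum": 2.63375},
}


def speedup(old_ms, new_ms):
    """How many times faster the new kernel is than the old one."""
    return old_ms / new_ms


for name, t in timings.items():
    print(f"{name}: {speedup(t['reduce_sum'], t['code_object']):.2f}x")
```

The ratios come out at roughly 1.48x for the first case and roughly 2.25x for the second, which matches the "almost 50% faster" and "over 2x faster" claims.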

f9a5b81e

12 Apr, 2022 2 commits
- Fix out-of-bounds access when generate uses nonpacked tensors (#1160) · 262ba721
  Paul Fultz II authored Apr 12, 2022
```
Fix out-of-bounds access when generate uses nonpacked tensors, and add some additional asserts for gpu memory.
```
  262ba721
- parallelize the ref implementation of the gemm operator (#1142) · 88b3dd34
  Shucai Xiao authored Apr 12, 2022
```
The ref implementation of the gemm op is sequential; this PR parallelizes the gemm computation in the ref implementation.
```
  88b3dd34
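
One common way to parallelize a gemm is to treat each output row as an independent task. The sketch below shows that decomposition in pure Python; it is an assumption about the general approach, not the scheme the commit uses, and `gemm_row` and `gemm_parallel` are hypothetical names. Python threads will not speed up this CPU-bound loop, so the example only shows how the work splits.

```python
from concurrent.futures import ThreadPoolExecutor


def gemm_row(a_row, b):
    """Compute one output row: a_row (1 x K) times b (K x N)."""
    return [sum(a * b[k][j] for k, a in enumerate(a_row)) for j in range(len(b[0]))]


def gemm_parallel(a, b, workers=4):
    """Row-parallel matrix multiply; each output row is an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so rows come back in place.
        return list(pool.map(lambda row: gemm_row(row, b), a))


c = gemm_parallel([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Because no two tasks write to the same output row, no locking is needed.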
11 Apr, 2022 1 commit

scatter operator refactoring to include reduction (#1124) · 701c2014

bpickrel authored Apr 11, 2022

Change the "scatter" struct and op to a base/child set of three: scatter_none, scatter_add, scatter_mul, to mirror ONNX's ScatterElements op and its three reduction options. (The ONNX Scatter op is deprecated and is equivalent to scatter_none.)

Provides both a reference op and an update to ONNX parsing. Tests updated and a new test case added.
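
The three reduction options can be sketched as follows for the 2-D, axis 0 case of ONNX ScatterElements. The function name `scatter_elements` and the nested-list representation are assumptions for illustration and are not taken from the MIGraphX sources.

```python
import copy


def scatter_elements(data, indices, updates, reduction="none"):
    """Sketch of ONNX ScatterElements along axis 0 for 2-D nested lists."""
    out = copy.deepcopy(data)  # the input is left untouched
    for i, row in enumerate(indices):
        for j, idx in enumerate(row):
            if reduction == "none":
                out[idx][j] = updates[i][j]    # overwrite (scatter_none)
            elif reduction == "add":
                out[idx][j] += updates[i][j]   # accumulate (scatter_add)
            elif reduction == "mul":
                out[idx][j] *= updates[i][j]   # scale (scatter_mul)
            else:
                raise ValueError(f"unknown reduction: {reduction}")
    return out


data = [[1, 2], [3, 4]]
none_out = scatter_elements(data, [[1, 0]], [[10, 20]])
add_out = scatter_elements(data, [[1, 0]], [[10, 20]], reduction="add")
mul_out = scatter_elements(data, [[1, 0]], [[10, 20]], reduction="mul")
```

Only the innermost statement differs between the three variants, which is what makes a shared base with three small children a natural fit.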

701c2014

08 Apr, 2022 1 commit
- Fix comparisons in migraphx::value class (#1146) · 1e0bbd78
  Paul Fultz II authored Apr 08, 2022
```
* Fix comparisons in migraphx::value class
```
  1e0bbd78
29 Mar, 2022 1 commit

Refactor runtime compiled kernels to use the same compile_ops pipeline (#1125) · 661046c6

Paul Fultz II authored Mar 29, 2022

This adds the infrastructure so we can compile everything in parallel, whereas before only pointwise kernels were compiled in parallel. This will also directly integrate with lowering and the gpu-driver. The kernels for pointwise and roialign use this infrastructure. Scatternd does not, since it does require standard shape.

This also makes it easier to add new runtime compiled kernels in the future.

661046c6

28 Mar, 2022 2 commits

Use ifdef instead of comment for the auto-generated method declarations for... · 8e4d622f

Paul Fultz II authored Mar 28, 2022

Use ifdef instead of comment for the auto-generated method declarations for type erased classes (#1138)

It seems the formatting of comments is unreadable for larger methods, so instead just generate a struct with the methods in the interface and add a comment if it's optional. It wraps this in #ifdef TYPE_ERASED_DECLARATION (assuming this would never be defined) instead of #if 0, so most editors can still provide syntax highlighting (although I think vscode with clangd will still gray it out, unfortunately).

8e4d622f

Use ccache for runtime compilation (#1131) · ad056b1f
Paul Fultz II authored Mar 28, 2022
```
* Use ccache for runtime compilation
```
ad056b1f

25 Mar, 2022 1 commit
- Improve handling of string literals in value class (#1141) · c73c0dae
  Paul Fultz II authored Mar 25, 2022
```
* Handle string literal in construction
* Improve get_default with vector
```
  c73c0dae
22 Mar, 2022 1 commit
- Remove borrowed lifetime from operators that are no longer borrowing their lifetime (#1134) · cd165ebd
  Paul Fultz II authored Mar 22, 2022
```
For operators using the arg.reshape() method, the lifetime will be extended.
```
  cd165ebd
18 Mar, 2022 2 commits

Complete GPU implementation of CumSum op (#1094) · 548783c8

turneram authored Mar 18, 2022

Add exclusive and reverse modes to gpu implementation of prefix_scan_sum, which completes support for ONNX op CumSum
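
For reference, the `exclusive` and `reverse` modes of ONNX CumSum behave as in this pure-Python sketch. The function name `cumsum` is chosen for illustration and this is not the gpu prefix_scan_sum kernel.

```python
def cumsum(xs, exclusive=False, reverse=False):
    """Prefix sum with the ONNX CumSum `exclusive` and `reverse` modes."""
    seq = list(reversed(xs)) if reverse else list(xs)
    out, running = [], 0
    for x in seq:
        if exclusive:
            out.append(running)  # sum of the elements strictly before x
            running += x
        else:
            running += x
            out.append(running)  # sum up to and including x
    return list(reversed(out)) if reverse else out


plain = cumsum([1, 2, 3])
excl = cumsum([1, 2, 3], exclusive=True)
rev = cumsum([1, 2, 3], reverse=True)
both = cumsum([1, 2, 3], exclusive=True, reverse=True)
```

Reverse mode is the same scan run from the other end, so it can be expressed as reverse, scan, reverse.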

548783c8

Make get_context experimental (#1137) · e521fa3f

Paul Fultz II authored Mar 18, 2022

The get_context may change in the future (when we support multi-targets), so make this experimental for now.

e521fa3f

14 Mar, 2022 1 commit
- Show the operator fields in the driver (#1103) · 9077db18
  Paul Fultz II authored Mar 14, 2022
```
* Show the operator fields in the driver
```
  9077db18
11 Mar, 2022 1 commit

Improve print ins (#1096) · b3b44f5d

Shucai Xiao authored Mar 11, 2022

The module::debug_print(ins) is very slow, which makes trace_eval==1/2 very slow. The reason is that printing an ins involves searching the whole module to get the instruction, then printing it. This change fixes that by calling module::print() to get the names of all instructions of a program, then printing the instruction by getting its name from a hash map.
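
The difference between the two lookup strategies can be sketched like this. The list-of-objects module and the `@N` naming scheme are stand-ins assumed for illustration; they are not the MIGraphX data structures.

```python
def name_by_search(instructions, target):
    """Slow path: scan the whole module to find the target's position."""
    for pos, ins in enumerate(instructions):
        if ins is target:
            return f"@{pos}"
    raise KeyError("instruction not in module")


def build_name_map(instructions):
    """Fast path: one pass that records every instruction's name up front."""
    return {id(ins): f"@{pos}" for pos, ins in enumerate(instructions)}


module = [["add"], ["mul"], ["relu"]]  # stand-ins for instruction objects
names = build_name_map(module)          # built once, O(n)
label = names[id(module[2])]            # O(1) per printed instruction
```

Printing every instruction with the search version costs O(n) per instruction, so O(n^2) for a full trace; the map brings that down to O(n) overall.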

b3b44f5d

04 Mar, 2022 1 commit

Mode as enum for pooling and roi_align (#1091) · a2e90b5d

bpickrel authored Mar 04, 2022

Changed the pooling values for two structures from strings to specialized enum classes. Many test and operator parsing changes to support this. Introduces one new source file, op_enums.cpp.

a2e90b5d

03 Mar, 2022 1 commit
- Add ScatterND operator (#1074) · 832f28c6
  turneram authored Mar 02, 2022
```
Add onnx parser and ref and gpu implementations of ONNX op ScatterND
```
  832f28c6
02 Mar, 2022 2 commits
- isnan operator (#1100) · bfedcd45
  Charlie Lin authored Mar 02, 2022
```
Implements the IsNaN operator, ref, gpu, and onnx parser.
```
  bfedcd45
- Clang format ver10 (#1106) · 9852aaef
  bpickrel authored Mar 02, 2022
```
Update the base version of clang-format from 5.0 to 10.0
```
  9852aaef
25 Feb, 2022 2 commits
- Add with_type to shape class (#1102) · 85b0563c
  Paul Fultz II authored Feb 25, 2022
```
Add with_type to shape class
```
  85b0563c
- Add get_queue to context to get the current stream (#1097) · e5242676
  Paul Fultz II authored Feb 24, 2022
```
Wrapped in an any_ptr class so the type can be checked at runtime for a mismatch.
```
  e5242676
23 Feb, 2022 1 commit

Keep std shape (#1059) · 98dfdf15

Shucai Xiao authored Feb 23, 2022

This PR resolves two problems in issue #999, i.e., non_standard_shape input to reshape and reduce_mean.
Three fixes:

* Any operator that has a standard shape requirement will have a contiguous added for its input.
* In eliminate_contiguous, when computing whether a contiguous can be removed, we should use all the updated args, not just the one that is being checked.
* In two optimizations in simplify_reshape, we remove the contiguous from the reshaper name list, since eliminate_contiguous will remove the contiguous if it can be removed.

The solution is to add an attribute to the operators that require a standard input shape; then, in the auto_contiguous pass, add a contiguous to every input of such operators.
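
The attribute-driven insertion can be sketched on a toy linear chain of ops. The list-of-names program and the `needs_std` set are hypothetical stand-ins for the operator attribute described above, and this is not the auto_contiguous pass itself.

```python
def auto_contiguous(program, needs_std):
    """Insert a 'contiguous' before every op flagged as needing standard input.

    `program` is a list of op names run in sequence; `needs_std` is the set
    of op names carrying the (hypothetical) require-standard-shape attribute.
    """
    out = []
    for op in program:
        # Skip the insert if the producer is already a contiguous.
        if op in needs_std and (not out or out[-1] != "contiguous"):
            out.append("contiguous")
        out.append(op)
    return out


prog = auto_contiguous(["transpose", "reshape", "reduce_mean"], {"reshape"})
```

A later cleanup pass can then drop any inserted contiguous whose input was already standard.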

98dfdf15

16 Feb, 2022 1 commit
- Support nonstandard shapes for the UnSqueeze Op (#1071) · 4480eb79
  Umang Yadav authored Feb 16, 2022
```
Support nonstandard shapes like slice, broadcast and transpose for the unsqueeze op
```
  4480eb79
09 Feb, 2022 1 commit
- Support nonstandard shapes for the Squeeze Op (#1068) · e64b773f
  Umang Yadav authored Feb 09, 2022
```
Support slice, broadcast and transpose shapes for the squeeze op.
```
  e64b773f
08 Feb, 2022 1 commit
- Enforce types to avoid compilation error in pointwise fusions (#1077) · 73b8a773
  Paul Fultz II authored Feb 08, 2022
```
Enforce types to avoid compilation error in pointwise fusions
This fixes compile failure: gpt-2, fp16 on Navi
```
  73b8a773
02 Feb, 2022 1 commit

Update trace_eval to preview the output buffers (#1073) · b20e3d4d

Paul Fultz II authored Feb 02, 2022

Currently, MIGRAPHX_TRACE_EVAL=2 prints out the entire output buffer, but this can produce a lot of output. To make it easier to inspect and debug, MIGRAPHX_TRACE_EVAL=2 now prints only 10 elements from the buffer (the first 5 and last 5) and shows any fp classifications found in the buffer (i.e. nans, infinity, etc.). The previous behavior can still be enabled with MIGRAPHX_TRACE_EVAL=3.
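
A preview of this kind might be computed as in the sketch below. The function name `preview`, the `"..."` separator and the class labels are assumptions made for illustration; the actual output format of MIGRAPHX_TRACE_EVAL is not reproduced here.

```python
import math


def preview(buffer, edge=5):
    """Return (shown elements, fp classes found) for a flat list of floats."""
    if len(buffer) <= 2 * edge:
        shown = list(buffer)
    else:
        shown = list(buffer[:edge]) + ["..."] + list(buffer[-edge:])
    classes = set()
    for x in buffer:  # classification still looks at every element
        if math.isnan(x):
            classes.add("nan")
        elif math.isinf(x):
            classes.add("inf")
        else:
            classes.add("finite")
    return shown, sorted(classes)


buf = [float(i) for i in range(20)] + [float("nan")]
shown, classes = preview(buf)
```

Only the ends of the buffer are shown, but a NaN or infinity in the middle is still reported because the classification scans the whole buffer.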

b20e3d4d

28 Jan, 2022 1 commit

Add auto-vectorization of pointwise operators (#1047) · 78a3c9b7

Paul Fultz II authored Jan 28, 2022

* Enable auto vectorization
* Handle vector types with convert function
* Don't vectorize when it will cause problems with preload

78a3c9b7

27 Jan, 2022 1 commit
- Remove Standard Shape requirement for ArgOps (#1042) · 332cb710
  Umang Yadav authored Jan 27, 2022
```
Allow nonstd shapes for the arg ops; non-standard shapes include broadcast, slice and transpose.
```
  332cb710
17 Jan, 2022 1 commit
- Make clip a pointwise op (#1043) · b0ece214
  Paul Fultz II authored Jan 17, 2022
```
Make clip a pointwise op
```
  b0ece214
08 Dec, 2021 1 commit
- Fuse convert ops (#1020) · 00bfed4d
  Paul Fultz II authored Dec 08, 2021
  
  00bfed4d
25 Nov, 2021 1 commit

Non std shape auto contiguous (#1001) · 2d4dcc47

Shucai Xiao authored Nov 25, 2021

Resolves a problem in parsing the ssd-10 model.

The problem is that, after inserting contiguous in the auto_contiguous pass, the standard output shape of some operators becomes non-standard. Then, if the next operator requires a standard input shape, an exception is thrown.

For example, if we pass the following model:
Input (standard shape) -> transpose (transposed) -> softmax (transposed) -> transpose (standard) -> gather.
It works fine, and no contiguous is required.

In the auto_contiguous pass, a contiguous is inserted after the first transpose. Then we need to replace the first transpose with the contiguous and recompute all shapes. When it comes to the gather operator, its input is a transposed shape, and an exception is thrown.

The solution is in the recompute_shape() function. If it is called by the auto_contiguous pass and shape of an instruction is changed, and the shape is non_standard, we do not recompute shape of its output. The reason is: since its output shape is non_standard, a contiguous op will be added after the instruction, which will recompute shape for later operators.
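
To make "standard" versus "transposed" shapes concrete, the sketch below models a shape as a pair of lens and strides. The helper names are hypothetical and the model is a simplification assumed for illustration; it is not the migraphx::shape class.

```python
def standard_strides(lens):
    """Row-major (packed) strides for the given dimensions."""
    strides, step = [], 1
    for n in reversed(lens):
        strides.append(step)
        step *= n
    return strides[::-1]


def transpose(lens, strides, perm):
    """Transposing permutes lens and strides without moving any data."""
    return [lens[p] for p in perm], [strides[p] for p in perm]


def is_standard(lens, strides):
    """A shape is standard here when its strides are the packed row-major ones."""
    return strides == standard_strides(lens)


def contiguous(lens, strides):
    """A contiguous copy keeps the lens and gets packed row-major strides."""
    return lens, standard_strides(lens)


lens, strides = [2, 3, 4], standard_strides([2, 3, 4])  # standard input
t_lens, t_strides = transpose(lens, strides, [0, 2, 1])  # transposed view
c_lens, c_strides = contiguous(t_lens, t_strides)        # after contiguous
```

The transposed view fails the standard check, and the contiguous copy passes it again, which is why an instruction with a non-standard output can leave the shape recomputation to the contiguous that follows it.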

2d4dcc47

22 Nov, 2021 1 commit

Add fp16 verify to driver (#988) · 3c1e91dc

kahmed10 authored Nov 22, 2021

Allows --fp16 to be used in the driver to compare the target fp16 result and the ref fp32 result.

3c1e91dc