1. 11 Oct, 2022 2 commits
    • Fix things · a88810da
      charlie authored
      convolution revert
    • Redo design · d9d2215a
      charlie authored
      * it doesn't make much sense for broadcast to take two inputs or handle
      dynamic shapes
      * compute the common shape for dynamic multibroadcast in the
      multibroadcast op
      * multibroadcast all combinations of the dynamic inputs
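      Computing the common shape of dynamic inputs can be sketched as follows. This is a hypothetical host-side illustration of NumPy-style broadcasting rules, not the actual MIGraphX multibroadcast implementation; the function name is illustrative.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>
      #include <stdexcept>
      #include <vector>

      // Sketch: common (broadcasted) shape of two inputs. Dimensions are
      // aligned from the right; a dimension of 1 broadcasts to the other size.
      std::vector<std::size_t> compute_common_shape(std::vector<std::size_t> a,
                                                    std::vector<std::size_t> b)
      {
          // Align ranks by prepending 1s to the shorter shape
          if(a.size() < b.size())
              a.insert(a.begin(), b.size() - a.size(), 1);
          else if(b.size() < a.size())
              b.insert(b.begin(), a.size() - b.size(), 1);

          std::vector<std::size_t> out(a.size());
          for(std::size_t i = 0; i < a.size(); ++i)
          {
              if(a[i] == b[i] or a[i] == 1 or b[i] == 1)
                  out[i] = std::max(a[i], b[i]);
              else
                  throw std::runtime_error("incompatible shapes for broadcast");
          }
          return out;
      }

      int main()
      {
          // {3, 1, 5} broadcast with {4, 5} -> {3, 4, 5}
          auto s = compute_common_shape({3, 1, 5}, {4, 5});
          assert((s == std::vector<std::size_t>{3, 4, 5}));
      }
      ```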
  2. 10 Oct, 2022 1 commit
  3. 07 Oct, 2022 1 commit
  4. 04 Oct, 2022 2 commits
  5. 03 Oct, 2022 2 commits
    • More progress · b162c4ec
      charlie authored
    • Add output_alias and runs_on_offload_target flags for the custom ops (#1309) · c9ffb38d
      Umang Yadav authored
      Adds two methods to the custom_ops virtual class.

      bool runs_on_offload_target(): if the custom op runs directly on the GPU, this should be set to true; in that case the custom op expects its parameters to reside in GPU memory and writes its output to GPU memory. If it is set to false, the custom op expects its parameters to reside on the host and writes the result back into host memory.

      output_alias: indicates whether the output of the custom op aliases the input buffer, i.e. interprets the same input buffer with a different shape and strides.

      Also updates as_vector() in the C++ API to handle non-standard shapes, which required exposing the element_index to space_index conversion method on the shape class.
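      The element_index to space_index conversion mentioned above can be sketched like this: the n-th logical element (in row-major order over the lengths) is mapped to a memory offset under arbitrary strides. Function and parameter names here are hypothetical, not the actual MIGraphX shape API.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <vector>

      // Sketch: convert a linear element index into a memory offset for a
      // possibly non-standard (e.g. transposed or broadcast) shape.
      std::size_t element_to_space_index(std::size_t n,
                                         const std::vector<std::size_t>& lens,
                                         const std::vector<std::size_t>& strides)
      {
          std::size_t offset = 0;
          // Peel off dimensions from innermost to outermost, accumulating
          // each coordinate times its stride
          for(std::size_t i = lens.size(); i > 0; --i)
          {
              std::size_t d = i - 1;
              offset += (n % lens[d]) * strides[d];
              n /= lens[d];
          }
          return offset;
      }

      int main()
      {
          // A transposed 2x3 view: lens {2, 3}, strides {1, 2}
          // element 1 is logical position (0, 1) -> offset 0*1 + 1*2 = 2
          assert(element_to_space_index(1, {2, 3}, {1, 2}) == 2);
          // element 4 is logical position (1, 1) -> offset 1*1 + 1*2 = 3
          assert(element_to_space_index(4, {2, 3}, {1, 2}) == 3);
      }
      ```

      For a standard (packed, row-major) shape this reduces to the identity mapping, which is why only non-standard shapes needed the conversion.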
  6. 02 Oct, 2022 1 commit
  7. 30 Sep, 2022 3 commits
  8. 29 Sep, 2022 6 commits
  9. 28 Sep, 2022 3 commits
  10. 27 Sep, 2022 4 commits
  11. 26 Sep, 2022 4 commits
  12. 23 Sep, 2022 2 commits
  13. 22 Sep, 2022 2 commits
  14. 21 Sep, 2022 2 commits
  15. 19 Sep, 2022 1 commit
    • Improve layernorm and reductions performance (#1348) · 97a1ed2d
      Paul Fultz II authored
      Compute mean and variance in the same reduction
      Set the block size to numbers divisible by 32 instead of powers of 2
      The global size is also set exactly instead of being divisible by the block size
      More exact matching of global/local sizes helps get rid of branching/loops
      Reduce vectors first before doing dpp_reduce
      Explicitly vectorize array operators since the compiler doesn't always vectorize them
      Still uses the old for loop when computing at compile time, since neither reinterpret_cast nor all the vector types are supported there
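      The "mean and variance in the same reduction" idea can be illustrated by accumulating sum(x) and sum(x*x) in a single pass and deriving both statistics at the end. This is a minimal host-side sketch, not the actual GPU kernel from the PR.

      ```cpp
      #include <cassert>
      #include <cmath>
      #include <cstddef>
      #include <utility>
      #include <vector>

      // Sketch: one pass over the data accumulates both sum and sum of
      // squares, so mean and variance need only a single reduction.
      std::pair<double, double> mean_variance(const std::vector<double>& xs)
      {
          double sum    = 0;
          double sum_sq = 0;
          for(double x : xs) // a GPU kernel would do this as one parallel reduction
          {
              sum += x;
              sum_sq += x * x;
          }
          double n    = static_cast<double>(xs.size());
          double mean = sum / n;
          // Var(x) = E[x^2] - E[x]^2
          double variance = sum_sq / n - mean * mean;
          return {mean, variance};
      }

      int main()
      {
          auto [m, v] = mean_variance({1.0, 2.0, 3.0, 4.0});
          assert(std::abs(m - 2.5) < 1e-12);
          assert(std::abs(v - 1.25) < 1e-12);
      }
      ```

      Note that the E[x^2] - E[x]^2 form can lose precision when the variance is small relative to the mean; it trades some numerical robustness for doing both statistics in one reduction.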
  16. 16 Sep, 2022 4 commits