Commits · 84add0fcb1652713d3818a2533eb64053bb7e050 · gaoqiong / MIGraphX

06 Oct, 2022 2 commits
- Merge branch 'develop' of github.com:ROCmSoftwarePlatform/AMDMIGraphX into refactor_auto_pad_conv · 84add0fc
  charlie authored Oct 06, 2022
  
  84add0fc
- Add pad_calc assert · c1caf40a
  charlie authored Oct 06, 2022
  
  c1caf40a
04 Oct, 2022 2 commits
- Stream sync Changeset (#1358) · f7d987ba
  Ted Themistokleous authored Oct 04, 2022
```
Stream sync changes and associated API level changes
```
  f7d987ba
- Fast softmax (#1290) · a9a47402
  Paul Fultz II authored Oct 04, 2022
```
optimize the softmax operator
```
  a9a47402
03 Oct, 2022 2 commits

Add output_alias and runs_on_offload_target flags for the custom ops (#1309) · c9ffb38d

Umang Yadav authored Oct 03, 2022

Adds two methods for the custom_ops virtual class.

bool runs_on_offload_target(): set this to true if the custom op runs directly on the GPU. In that case, the custom op expects its parameters to reside in GPU memory and writes its output to GPU memory. If it is set to false, the custom op expects its parameters to reside on the host and writes the result back to host memory.

output_alias: indicates that the output of the custom op aliases the input buffer, i.e. it interprets the same input buffer with a different shape and strides.

Updates as_vector() in the C++ API to handle non-standard shapes. This required exposing the element_index to space_index conversion method on the shape class.
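As a rough illustration of what an element index to space index conversion involves, here is a minimal Python sketch. It is not the MIGraphX implementation: the function names `element_to_space_index` and `as_list` are hypothetical, and the sketch assumes a shape is fully described by its `lens` and `strides`.

```python
def element_to_space_index(index, lens, strides):
    """Map a logical (row-major) element index to a memory offset.

    Hypothetical helper for illustration; not the MIGraphX API.
    """
    offset = 0
    # Peel off coordinates from the innermost dimension outward.
    for length, stride in zip(reversed(lens), reversed(strides)):
        index, coord = divmod(index, length)
        offset += coord * stride
    return offset


def as_list(buffer, lens, strides):
    """Read a strided buffer in logical element order."""
    count = 1
    for length in lens:
        count *= length
    return [buffer[element_to_space_index(i, lens, strides)] for i in range(count)]


# A 2x3 view over a buffer that was stored as 3x2 (a transposed, non-standard shape).
buffer = [0, 1, 2, 3, 4, 5]
transposed = as_list(buffer, lens=[2, 3], strides=[1, 2])
print(transposed)
```

For a standard (packed, row-major) shape the element index and the space index coincide, which is why the conversion only matters for non-standard shapes such as transposed or broadcasted views.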

c9ffb38d

Update comments · ed2acdc4
charlie authored Oct 03, 2022

ed2acdc4

29 Sep, 2022 3 commits
- Merge branch 'develop' into refactor_auto_pad_conv · 3685906c
  Charlie Lin authored Sep 29, 2022
  
  3685906c
- Use find_2.0 API for the convolution (#1346) · e19f78ae
  Umang Yadav authored Sep 29, 2022
```
Improvements/Additions to be made:

changes for the quant_convolution,
changes for the deconvolution,
Macros for MIOpen status checks
```
  e19f78ae
- Fix invalid program in debug mode from find_splits (#1390) · c2842c1e
  Paul Fultz II authored Sep 28, 2022
```
* Fix invalid program from find_splits
```
  c2842c1e
28 Sep, 2022 1 commit

Add compute_fp32 flag for quant_gemm tests (#1360) · 70e63960

Umang Yadav authored Sep 28, 2022

test_gpu_pack_int8_args fails on gfx908 machines because it doesn't set the compute_fp32 flag correctly. This PR fixes the test so that it checks the device name and rocBLAS version and sets the flag accordingly.

70e63960

27 Sep, 2022 1 commit
- Add onnx mod operator gpu cpu (#1306) · 40118191
  Ted Themistokleous authored Sep 26, 2022
```
Implement operator for CPU and GPU implementations
```
  40118191
26 Sep, 2022 3 commits

Rewrite ONNX parse batch norm (#1362) · c00f8202

Charlie Lin authored Sep 26, 2022

- Rewrites the BatchNormalization ONNX operator into other MIGX operators
  - Added handling of the 1D input tensor case (an edge case in the ONNX spec)
- Removes the spatial and per_activation functionality (not in the ONNX spec)
  - Did not remove the batch_norm_inference related code, as the TensorFlow parser still uses it
  - That code can be removed when the TF version is updated
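
The decomposition of inference-mode batch normalization into elementwise operations can be sketched as follows. This is an illustrative Python version of the ONNX formula `y = (x - mean) / sqrt(var + epsilon) * scale + bias`; the exact sequence of MIGraphX operators emitted by the parser is not stated in the commit, and the function name and `[n][c]` input layout are assumptions made for the example.

```python
import math


def batch_norm_inference(x, scale, bias, mean, var, epsilon=1e-5):
    """Apply per-channel batch norm to x indexed as [n][c].

    scale, bias, mean and var are per-channel lists. Each arithmetic step
    corresponds to one elementwise operator (sub, add, sqrt, div, mul, add).
    """
    out = []
    for row in x:
        out_row = []
        for c, value in enumerate(row):
            centered = value - mean[c]               # sub
            denom = math.sqrt(var[c] + epsilon)      # add, sqrt
            out_row.append(centered / denom * scale[c] + bias[c])  # div, mul, add
        out.append(out_row)
    return out


x = [[1.0, 2.0], [3.0, 6.0]]
y = batch_norm_inference(
    x, scale=[2.0, 1.0], bias=[0.5, 0.0], mean=[2.0, 4.0], var=[1.0, 4.0], epsilon=0.0
)
print(y)
```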

c00f8202

Use larger vector size instead of preloading for broadcasted inputs (#1389) · 492c4a6c
Paul Fultz II authored Sep 26, 2022

492c4a6c
Upgrade cppcheck to 2.9 (#1400) · 66bbff1e
Paul Fultz II authored Sep 26, 2022
```
Upgrade cppcheck to 2.9 
```
66bbff1e

24 Sep, 2022 2 commits

check concurrency on PR level with one running and one pending performance tests (#1401) · 94bc41dc

Chris Austen authored Sep 24, 2022

The workflow has concurrency reintroduced with a different set of rules. The new expected behavior is to check concurrency at the PR level, with one running and one pending performance test. When a PR has multiple commits, the latest commit is always queued and runs after the performance test already in progress completes. Any other PRs/commits stay in a pending/queued state.

94bc41dc

update codecov version (#1402) · 1b575b5c
Chris Austen authored Sep 24, 2022
```
Codecov announced deprecating the bash uploader. Using updated uploader
```
1b575b5c

23 Sep, 2022 1 commit
- Remove unused device functions (#1394) · 8ea8473d
  Paul Fultz II authored Sep 23, 2022
```
* Remove device functions
* Update tests
```
  8ea8473d
21 Sep, 2022 2 commits

Parameterize epsilon for layernorm kernel (#1367) · d9578ba6

kahmed10 authored Sep 21, 2022

This PR allows for other values of epsilon to be matched when finding layernorm. Similarly, the calculation now uses the variable for epsilon.
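
To show where epsilon enters the calculation, here is a minimal Python sketch of layer normalization with epsilon as a parameter instead of a hard-coded constant. The function name and the default value are assumptions for illustration only; the commit does not state which value was previously fixed.

```python
import math


def layernorm(values, epsilon=1e-12):
    """Normalize a list of floats to zero mean and unit variance.

    epsilon is added to the variance for numerical stability; the default
    here is an arbitrary placeholder, not the value used by MIGraphX.
    """
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance + epsilon) for v in values]


data = [1.0, 2.0, 3.0]
exact = layernorm(data, epsilon=0.0)
damped = layernorm(data, epsilon=1.0)
print(exact, damped)
```

A larger epsilon shrinks the normalized values, so a model exported with a different epsilon would produce different results if the kernel ignored it.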

d9578ba6

Multibroadcast find_mul_conv (#1384) · 9a70050b

Charlie Lin authored Sep 21, 2022

Change find_mul_conv to work with multibroadcast also. Checks the strides instead of the broadcast axis.
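
One way to read "checks the strides instead of the broadcast axis" is sketched below in Python. It assumes that a broadcasted dimension is represented by a stride of 0, so a per-channel multiplier has a non-zero stride only on the channel axis. The function name and the choice of axis 1 as the channel axis are hypothetical; the actual matcher logic may differ.

```python
def is_channel_broadcast(strides, channel_axis=1):
    """Return True if only the channel axis has a non-zero stride.

    Assumes broadcasted dimensions are encoded with stride 0, regardless
    of which operator produced the broadcast.
    """
    return all((stride != 0) == (axis == channel_axis)
               for axis, stride in enumerate(strides))


print(is_channel_broadcast([0, 1, 0, 0]))  # per-channel vector over NCHW
print(is_channel_broadcast([0, 0, 1, 0]))  # varies along H, not the channel
```

Checking strides this way does not depend on which operator created the broadcast, which is consistent with the commit's goal of handling multibroadcast as well.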

9a70050b

19 Sep, 2022 4 commits

Merge branch 'develop' into refactor_auto_pad_conv · 2b936b13
Charlie Lin authored Sep 19, 2022

2b936b13

Improve layernorm and reductions performance (#1348) · 97a1ed2d

Paul Fultz II authored Sep 19, 2022

Compute mean and variance in same reduction
Set block size to numbers divisible by 32 instead of powers of 2
Global is also set exactly instead of being divisible by block size
More exact matching of global/local can help get rid of branching/loops
Reduce vectors first before doing dpp_reduce
Explicitly vectorize array operators since the compiler doesn't always vectorize them
Still uses the old for loop when computing at compile time, since neither reinterpret_cast nor all the vector types are supported
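
The first item, computing mean and variance in the same reduction, can be illustrated with a single-pass accumulation. The Python sketch below uses the identity Var[x] = E[x^2] - E[x]^2; the commit does not say which formulation the kernel uses, so treat this as one possible approach, not a description of the GPU code.

```python
def mean_and_variance(values):
    """Compute mean and population variance in a single pass.

    Accumulates the sum and the sum of squares together, then applies
    Var[x] = E[x^2] - E[x]^2. This can lose precision when the mean is
    large relative to the spread of the data.
    """
    total = 0.0
    total_sq = 0.0
    for v in values:
        total += v
        total_sq += v * v
    n = len(values)
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance


stats = mean_and_variance([1.0, 2.0, 3.0, 4.0])
print(stats)
```

Reading the input once instead of twice is the main benefit, since reductions of this kind are usually limited by memory bandwidth.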

97a1ed2d

Fix MLIR test · ca360585
charlie authored Sep 19, 2022

ca360585
Disabled concurrency, queue added to perf-test.yml (#1386) · 34c08db7
Chris Austen authored Sep 19, 2022

34c08db7

16 Sep, 2022 7 commits
- Fix typo for add_sigmoid (#1385) · 10f37f49
  Umang Yadav authored Sep 16, 2022
```
* fix typo for add_sigmoid
```
  10f37f49
- Fix comments · b3c6b7eb
  charlie authored Sep 16, 2022
  
  b3c6b7eb
- Merge branch 'develop' of github.com:ROCmSoftwarePlatform/AMDMIGraphX into refactor_auto_pad_conv · 8378a397
  charlie authored Sep 16, 2022
  
  8378a397
- Naming fix · f60297db
  charlie authored Sep 16, 2022
  
  f60297db
- Fix normalize attribute, fix same_lower bug · ef738568
  charlie authored Sep 16, 2022
  
  ef738568
- Progress on changing padding_mode · 0afab294
  charlie authored Sep 16, 2022
```
Weird bug with ref padding shape
still need to change parse_convolution
```
  0afab294
- Update deprecated Pybind constructor (#1382) · 255fb11a
  Umang Yadav authored Sep 16, 2022
```
* remove deprecated constructor
```
  255fb11a
15 Sep, 2022 2 commits

[mlir] Replaced `find_library` with `find_package` to locate MLIR static library (#1373) · e1e36cdc

Lixun Zhang authored Sep 15, 2022

* Replaced `find_library` with `find_package` to locate MLIR static library
* Unified the include dir for headers and removed backward compatibility
* Embedded the external/include dir into the exported library

e1e36cdc

Initial · 376b18af
charlie authored Sep 15, 2022

376b18af

14 Sep, 2022 4 commits
- Reduce problem size of unbatched_gemm tests (#1383) · 333860ce
  turneram authored Sep 14, 2022
```
The verify tests from pr #1354 were still causing some codecov timeouts after merge. This PR further reduces the problem sizes to avoid these failures.
```
  333860ce
- Fix split_reshape for slice len of 1 (#1379) · 4b76dd0d
  Umang Yadav authored Sep 14, 2022
```
* fix slice_dim1 for case
```
  4b76dd0d
- Implement concat using jit compilation (#1356) · 7662d9c0
  Paul Fultz II authored Sep 14, 2022
```
* Implement concat using jit compilation
```
  7662d9c0
- expose underlying migraphx::argument data pointer in pybind (#1376) · 827baeec
  shivadbhavsar authored Sep 13, 2022
```
expose underlying data pointer for migraphx.argument
Update python api documentation
```
  827baeec
13 Sep, 2022 1 commit

Use rocblas_gemm_ex for batched gemms with broadcasted B (#1354) · a10a8ef1

turneram authored Sep 13, 2022

Improves performance for 4/6 GEMMs used by huggingface BERT models with batch_size>1 by using a non-batched rocBLAS call for GEMMs where the B input has a broadcasted batch dimension.
The four verify tests added reflect the actual configurations used by bert-base-cased, with varied batch sizes.
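
The idea of replacing a batched GEMM with a non-batched one when B is broadcast across the batch can be sketched in plain Python. The sketch assumes A is stored contiguously so that its batch dimension can be folded into the row dimension; the helper names are hypothetical, and how the commit maps this onto rocblas_gemm_ex arguments is not shown here.

```python
def matmul(a, b):
    """Plain (M x K) @ (K x N) product on nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def batched_gemm_broadcast_b(a_batches, b):
    """Multiply every batch of A by the same B using one non-batched product."""
    rows_per_batch = len(a_batches[0])
    # Stack the batches: (batch, M, K) becomes (batch * M, K).
    stacked = [row for batch in a_batches for row in batch]
    flat = matmul(stacked, b)
    # Split the (batch * M, N) result back into (batch, M, N).
    return [flat[i:i + rows_per_batch] for i in range(0, len(flat), rows_per_batch)]


a_batches = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
b = [[1, 2], [3, 4]]
folded = batched_gemm_broadcast_b(a_batches, b)
reference = [matmul(batch, b) for batch in a_batches]
print(folded == reference)
```

Because every batch multiplies by the same B, one larger product gives the same result as many small ones, and a single large GEMM generally uses the hardware more efficiently.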

Also adds a matcher to simplify_reshapes to move multibroadcasts after concats.

a10a8ef1

09 Sep, 2022 1 commit
- Bump version to 2.4 (#1375) · d78bcdfb
  Chris Austen authored Sep 09, 2022
```
migraphx version is now 2.4
```
  d78bcdfb
08 Sep, 2022 2 commits
- Remove unused headers (#1363) · ed2c73ac
  Paul Fultz II authored Sep 08, 2022
```
* Remove unused headers
```
  ed2c73ac
- Fix TF literal parsing for relu6 (#1370) · f2667056
  Charlie Lin authored Sep 08, 2022
```
Fixes TF literal parsing for relu6. Previously it always made a float type literal, which breaks for float16, for example.
```
  f2667056