Commits · 6568a69df31fe21a97ff73c58b0f2c0c8ae77d27 · gaoqiong / MIGraphX

10 Oct, 2022 2 commits
- Seperate host/device time · 6568a69d
  Paul authored Oct 10, 2022
  
- Skip some lines · 6fa85d83
  Paul authored Oct 09, 2022
  
09 Oct, 2022 13 commits
- Fix global size · 9db9f4e0
  Paul authored Oct 09, 2022
  
- Add some warning about missing configs · 398d45a4
  Paul authored Oct 09, 2022
  
- Merge branch 'ck-gemm' of github.com:ROCmSoftwarePlatform/AMDMIGraphX into ck-gemm · adb2feee
  Paul authored Oct 09, 2022
  
- Format · 3087a7a6
  Paul authored Oct 09, 2022
  
- Use min · 5c17b757
  Paul authored Oct 09, 2022
  
- Format · 694cb20a
  Paul authored Oct 09, 2022
  
- Format · ff878ce6
  Paul authored Oct 09, 2022
  
- Load a tuning json file · 93a5de9f
  Paul authored Oct 09, 2022
  
- Fixes · 0f7d5123
  Paul authored Oct 09, 2022
  
- Format · 64285523
  Paul authored Oct 09, 2022
  
- Tuning script · ad19d4dd
  Paul authored Oct 09, 2022
  
- Format · 7bbb1cde
  Paul authored Oct 08, 2022
  
- Move to cpp file · b57f58e1
  Paul authored Oct 08, 2022
  
08 Oct, 2022 2 commits
- Format · 873f6c0c
  Paul authored Oct 08, 2022
  
- Handle transposes and data types · 6ad2af4e
  Paul authored Oct 08, 2022
  
07 Oct, 2022 4 commits
- Use get · c343c534
  Paul authored Oct 07, 2022
  
- Disable format · 19a570a0
  Paul authored Oct 07, 2022
  
- Format · 969be85c
  Paul authored Oct 07, 2022
  
- Add ck_gemm · bfad5a5b
  Paul authored Oct 07, 2022
  
04 Oct, 2022 2 commits
- Stream sync Changset (#1358) · f7d987ba
  Ted Themistokleous authored Oct 04, 2022
```
Stream sync changes and associated API level changes
```
- Fast softmax (#1290) · a9a47402
  Paul Fultz II authored Oct 04, 2022
```
optimize the softmax operator
```
03 Oct, 2022 1 commit

Add output_alias and runs_on_offload_target flags for the custom ops (#1309) · c9ffb38d

Umang Yadav authored Oct 03, 2022

Adds two methods for the custom_ops virtual class.

bool runs_on_offload_target(): should return true if the custom op runs directly on the GPU. In that case, the custom op expects its parameters to reside in GPU memory and writes its output to GPU memory. If it returns false, the custom op expects its parameters to reside on the host and writes the result back to host memory.

output_alias: indicates that the output of the custom op aliases the input buffer, i.e. it interprets the same input buffer with a different shape and strides.

Updates as_vector() in the C++ API to handle non-standard shapes. This required exposing the element_index to space_index conversion method of the shape class.
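
As an illustration of why an element-index to space-index conversion matters for non-standard shapes, here is a minimal sketch. It is not the MIGraphX implementation: `element_to_space_index` and `gather_logical` are hypothetical helpers, and shapes are assumed to be described by plain vectors of lengths and strides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the MIGraphX API): map a logical, row-major
// element index to an offset into the underlying buffer.
std::size_t element_to_space_index(std::size_t element_index,
                                   const std::vector<std::size_t>& lens,
                                   const std::vector<std::size_t>& strides)
{
    assert(lens.size() == strides.size());
    std::size_t offset = 0;
    // Peel off the multi-index starting from the fastest-varying (last) dimension.
    for(std::size_t i = lens.size(); i > 0; --i)
    {
        const std::size_t dim = i - 1;
        offset += (element_index % lens[dim]) * strides[dim];
        element_index /= lens[dim];
    }
    return offset;
}

// Hypothetical as_vector()-style helper: copy elements out in logical order,
// regardless of how the buffer is strided.
std::vector<float> gather_logical(const std::vector<float>& buffer,
                                  const std::vector<std::size_t>& lens,
                                  const std::vector<std::size_t>& strides)
{
    std::size_t count = 1;
    for(std::size_t len : lens)
        count *= len;
    std::vector<float> out(count);
    for(std::size_t i = 0; i < count; ++i)
        out[i] = buffer[element_to_space_index(i, lens, strides)];
    return out;
}
```

For a packed row-major shape the two indices coincide; for a transposed or broadcast view they differ, which is why reading the buffer linearly would return elements in the wrong order.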


29 Sep, 2022 2 commits
- Use find_2.0 API for the convolution (#1346) · e19f78ae
  Umang Yadav authored Sep 29, 2022
```
Improvements/Additions to be made:

changes for the quant_convolution,
changes for the deconvolution,
Macros for MIOpen status checks
```
- Fix invalid program in debug mode from find_splits (#1390) · c2842c1e
  Paul Fultz II authored Sep 28, 2022
```
* Fix invalid program from find_splits
```
28 Sep, 2022 1 commit

Add compute_fp32 flag for quant_gemm tests (#1360) · 70e63960

Umang Yadav authored Sep 28, 2022

test_gpu_pack_int8_args fails on gfx908 machines because it doesn't set the compute_fp32 flag correctly. This PR fixes the test so that it checks the device name and the rocBLAS version and sets the flag accordingly.


27 Sep, 2022 1 commit
- Add onnx mod operator gpu cpu (#1306) · 40118191
  Ted Themistokleous authored Sep 26, 2022
```
Implement operator for CPU and GPU implementations
```
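
For context, the ONNX Mod operator distinguishes two behaviors through its `fmod` attribute, as far as the ONNX specification describes it: an integer-style remainder whose sign follows the divisor, and a C `fmod`-style remainder whose sign follows the dividend. The sketch below illustrates that difference only; `mod_floor` and `mod_fmod` are hypothetical names and do not reflect how the MIGraphX operator is written.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// fmod = 1 behavior: C-style remainder, result takes the sign of the dividend.
double mod_fmod(double a, double b) { return std::fmod(a, b); }

// fmod = 0 behavior for integers: result takes the sign of the divisor.
std::int64_t mod_floor(std::int64_t a, std::int64_t b)
{
    assert(b != 0);
    std::int64_t r = a % b; // C++ % truncates toward zero
    // Shift the remainder when its sign disagrees with the divisor's.
    if(r != 0 && ((r < 0) != (b < 0)))
        r += b;
    return r;
}
```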
26 Sep, 2022 3 commits

Rewrite ONNX parse batch norm (#1362) · c00f8202

Charlie Lin authored Sep 26, 2022

Rewrites the BatchNormalization ONNX operator into other MIGX operators
- Added handling of 1D input tensor case (edge case in ONNX spec)
Removes the spatial and per_activation functionality (not in the ONNX spec)
- Did not remove the batch_norm_inference related code as the TensorFlow parser still uses it
- Can remove that code when the TF version is updated
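
To make "rewriting into other operators" concrete, the inference form of ONNX BatchNormalization is `y = (x - mean) / sqrt(variance + epsilon) * scale + bias`, which decomposes into elementwise subtract, add, sqrt, divide, multiply, and add steps. The following is a minimal sketch under simplifying assumptions (a single batch, data flattened as `[channels][elements per channel]`); the function name, layout, and the `1e-5f` default are choices made here for illustration, not the parser's actual code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch only: per-channel batch norm inference built from elementwise steps.
std::vector<float> batch_norm_inference(const std::vector<float>& x,
                                        const std::vector<float>& scale,
                                        const std::vector<float>& bias,
                                        const std::vector<float>& mean,
                                        const std::vector<float>& variance,
                                        float epsilon = 1e-5f)
{
    const std::size_t channels = scale.size();
    assert(channels > 0 && x.size() % channels == 0);
    assert(bias.size() == channels && mean.size() == channels &&
           variance.size() == channels);
    const std::size_t per_channel = x.size() / channels;
    std::vector<float> y(x.size());
    for(std::size_t c = 0; c < channels; ++c)
    {
        // add + sqrt: shared by every element of the channel
        const float denom = std::sqrt(variance[c] + epsilon);
        for(std::size_t i = 0; i < per_channel; ++i)
        {
            const std::size_t idx = c * per_channel + i;
            const float centered = x[idx] - mean[c];          // sub
            y[idx] = centered / denom * scale[c] + bias[c];   // div, mul, add
        }
    }
    return y;
}
```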


Use larger vector size instead of preloading for broadcasted inputs (#1389) · 492c4a6c
Paul Fultz II authored Sep 26, 2022

Upgrade cppcheck to 2.9 (#1400) · 66bbff1e
Paul Fultz II authored Sep 26, 2022
```
Upgrade cppcheck to 2.9 
```

24 Sep, 2022 2 commits

check concurrency on PR level with one running and one pending performance tests (#1401) · 94bc41dc

Chris Austen authored Sep 24, 2022

The workflow has concurrency reintroduced with a different set of rules. The new expected behavior is to check concurrency at the PR level, with one running and one pending performance test. When a PR has multiple commits, the latest commit is always the one queued to run after the in-progress performance test completes. Any other PRs/commits are in a pending/queued state.


update codecov version (#1402) · 1b575b5c
Chris Austen authored Sep 24, 2022
```
Codecov announced deprecating the bash uploader. Using updated uploader
```

23 Sep, 2022 1 commit
- Remove unused device functions (#1394) · 8ea8473d
  Paul Fultz II authored Sep 23, 2022
```
* Remove device functions
* Update tests
```
21 Sep, 2022 2 commits

Parameterize epsilon for layernorm kernel (#1367) · d9578ba6

kahmed10 authored Sep 21, 2022

This PR allows for other values of epsilon to be matched when finding layernorm. Similarly, the calculation now uses the variable for epsilon.
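
A minimal sketch of what parameterizing epsilon means for the calculation, assuming a plain layernorm over one vector without scale or bias. The function name and signature are hypothetical and are not taken from the MIGraphX kernel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch only: epsilon is an argument rather than a hard-coded constant.
std::vector<float> layernorm(const std::vector<float>& x, float epsilon)
{
    assert(!x.empty());
    const float n = static_cast<float>(x.size());
    float mean = 0.0f;
    for(float v : x)
        mean += v;
    mean /= n;
    float variance = 0.0f;
    for(float v : x)
        variance += (v - mean) * (v - mean);
    variance /= n;
    // epsilon keeps the denominator away from zero for near-constant inputs
    const float inv_std = 1.0f / std::sqrt(variance + epsilon);
    std::vector<float> y(x.size());
    for(std::size_t i = 0; i < x.size(); ++i)
        y[i] = (x[i] - mean) * inv_std;
    return y;
}
```

Different models use different epsilon values, so a matcher that only recognizes one constant would miss otherwise identical layernorm patterns.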


Multibroadcast find_mul_conv (#1384) · 9a70050b

Charlie Lin authored Sep 21, 2022

Change find_mul_conv to work with multibroadcast also. Checks the strides instead of the broadcast axis.
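
The commit message does not spell out the exact stride test, so the following is only a guess at the general idea: a tensor broadcast from a 1-D vector along a single axis has a nonzero stride on that axis and zero strides everywhere else, regardless of whether it was produced by broadcast or multibroadcast. `is_broadcast_along_axis` is a hypothetical helper, not a MIGraphX function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch only: true when the only nonzero stride sits on `axis`.
bool is_broadcast_along_axis(const std::vector<std::size_t>& strides, std::size_t axis)
{
    if(axis >= strides.size())
        return false;
    for(std::size_t i = 0; i < strides.size(); ++i)
    {
        const bool nonzero = strides[i] != 0;
        // Exactly the chosen axis may carry a nonzero stride.
        if((i == axis) != nonzero)
            return false;
    }
    return true;
}
```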


19 Sep, 2022 2 commits

Improve layernorm and reductions performance (#1348) · 97a1ed2d

Paul Fultz II authored Sep 19, 2022

- Compute mean and variance in the same reduction
- Set block size to numbers divisible by 32 instead of powers of 2
- Global is also set exactly instead of being divisible by block size
- More exact matching of global/local can help get rid of branching/loops
- Reduce vectors first before doing dpp_reduce
- Explicitly vectorize array operators since the compiler doesn't always vectorize them
- Still uses the old for loop when it's computing at compile time, since neither reinterpret_cast nor all the vector types are supported there
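
Two of the points above can be sketched on the host side, with the caveat that this is not the GPU kernel code: the names below are hypothetical, and the sum/sum-of-squares formulation is one common way to get both moments from a single pass, not necessarily the one the commit uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct moments
{
    float mean;
    float variance;
};

// Sketch only: accumulate sum and sum of squares together so mean and
// variance come out of one pass over the data instead of two.
moments mean_variance_single_pass(const std::vector<float>& x)
{
    assert(!x.empty());
    float sum    = 0.0f;
    float sum_sq = 0.0f;
    for(float v : x)
    {
        sum += v;
        sum_sq += v * v;
    }
    const float n    = static_cast<float>(x.size());
    const float mean = sum / n;
    // E[x^2] - E[x]^2; cheaper than two passes but less numerically robust
    return {mean, sum_sq / n - mean * mean};
}

// Sketch only: round a size up to a multiple of 32 rather than to the next
// power of two, which can leave fewer idle threads.
std::size_t round_up_to_multiple(std::size_t n, std::size_t multiple = 32)
{
    assert(multiple > 0);
    return (n + multiple - 1) / multiple * multiple;
}
```

For example, 65 elements round up to 96 with a multiple of 32, whereas the next power of two would be 128.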


Disabled concurrency, queue added to perf-test.yml (#1386) · 34c08db7
Chris Austen authored Sep 19, 2022


16 Sep, 2022 2 commits
- Fix typo for add_sigmoid (#1385) · 10f37f49
  Umang Yadav authored Sep 16, 2022
```
* fix typo for add_sigmoid
```
- Update deprecated Pybind constructor (#1382) · 255fb11a
  Umang Yadav authored Sep 16, 2022
```
* remove deprecated constructor
```