Commits · 414ea2915b7e8a64c51bde7a799b7e3bb0e4c18d · gaoqiong / MIGraphX

24 Nov, 2021 4 commits
- clang format · 414ea291
  Shucai Xiao authored Nov 24, 2021
  
  414ea291
- fix review comments · 0b851159
  Shucai Xiao authored Nov 24, 2021
  
  0b851159
- Merge branch 'develop' of github.com:ROCmSoftwarePlatform/AMDMIGraphX into... · d3267bb3
  Shucai Xiao authored Nov 24, 2021
```
Merge branch 'develop' of github.com:ROCmSoftwarePlatform/AMDMIGraphX into test_runner_match_input_output
```
  d3267bb3
- Check jit kernels files with clang-tidy (#1012) · a33d6fa2
  Paul Fultz II authored Nov 24, 2021
```
* Check jit kernels files with clang-tidy
```
  a33d6fa2
22 Nov, 2021 3 commits

Helper script for rocTX run and parse (#985) · 4f9a0ce7

Cagri authored Nov 22, 2021

This provides a helper script that runs migraphx-driver with rocTX markers enabled, reducing the number of steps a user has to go through when using the roctx knob.
Run:
python roctx.py --run --onnx_file <ONNX_FILE> --migraphx_args "--onnx --gpu --fp16 --batch 16" --out outputfolder
Runs and parses the run output (JSON file). An example output is given below:

MARKER                                                 SUM   MIN   MAX
Marker start: gpu::convolution                        5272    10   563
Marker start: gpu::add_relu                            605    12    18
Marker start: gpu::gather                              299   145   154
Marker start: gpu::mul_add                             227    14    57
Marker start: gpu::sub                                 177    13    42
Marker start: gpu::concat                              169    22    31
Marker start: gpu::triadd_relu                         163    15    18
Marker start: load                                     141     0     3
Marker start: hip::hip_copy_literal                    111     0     3
Marker start: gpu::add                                  58    13    17
Marker start: broadcast                                 52     0     3
Marker start: gpu::convert                              31    15    16
Marker start: slice                                     11     0     1
Marker start: gpu::pooling                               9     9     9
Marker start: step                                       2     2     2
Marker start: @param                                     2     0     1
Marker start: reshape                                    1     0     1
Marker start: hip::hip_allocate_memory                   1     1     1
Marker start: check_context::migraphx::version_...       0   ERR   ERR

TOTAL TIME: 7331 us

JSON FILE PATH: [...]/rpl_data_211019_195229_9369/input_results_211019_195229/trace.json
Parse:
python roctx.py --parse --json_path <JSON PATH FROM RUN>
Note: The parse knob is made available if the user wants to parse an already existing JSON output.
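
The summary table can be pictured as a per-marker aggregation. The sketch below is only an illustration of that idea, not the code in roctx.py: the layout of trace.json is not shown here, so `summarize` works on a hypothetical, already-extracted list of `(marker_name, duration_us)` pairs, and both function names are made up for this example. It sorts by total time and reports the sum of all SUM values as the total, which is consistent with the example output (the SUM column adds up to 7331).

```python
from collections import defaultdict


def summarize(events):
    """Aggregate (marker_name, duration_us) pairs into (name, SUM, MIN, MAX) rows.

    `events` is a hypothetical input format chosen for this sketch.
    """
    durations = defaultdict(list)
    for name, duration_us in events:
        durations[name].append(duration_us)
    rows = [(name, sum(d), min(d), max(d)) for name, d in durations.items()]
    # Largest total first, as in the example output above.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def format_table(rows):
    """Render the rows as aligned text followed by a total line."""
    lines = [f"{'MARKER':<28}{'SUM':>8}{'MIN':>8}{'MAX':>8}"]
    for name, total, low, high in rows:
        lines.append(f"{name:<28}{total:>8}{low:>8}{high:>8}")
    lines.append(f"TOTAL TIME: {sum(row[1] for row in rows)} us")
    return "\n".join(lines)


sample = [
    ("gpu::add", 13),
    ("gpu::convolution", 563),
    ("gpu::add", 17),
    ("gpu::convolution", 10),
]
print(format_table(summarize(sample)))
```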

4f9a0ce7

Add fp16 verify to driver (#988) · 3c1e91dc

kahmed10 authored Nov 22, 2021

Allows --fp16 to be used in the driver to compare the target fp16 result and the ref fp32 result.
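
As a rough illustration of what comparing an fp16 result against an fp32 reference involves, the sketch below rounds reference values to half precision and checks the root-mean-square difference against a tolerance. The function names and the tolerance value are assumptions made for this example; the driver's actual comparison and threshold are not described here.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def rms_error(result, reference):
    """Root-mean-square difference between two equally sized sequences."""
    if len(result) != len(reference):
        raise ValueError("sequences must have the same length")
    squared = sum((r - g) ** 2 for r, g in zip(result, reference))
    return math.sqrt(squared / len(reference))


def verify(result, reference, tolerance=1e-2):
    # `tolerance` is an illustrative value, not the driver's threshold.
    return rms_error(result, reference) <= tolerance


reference = [0.1, 0.2, 0.3]                 # stands in for the fp32 ref result
half = [to_fp16(v) for v in reference]      # stands in for the fp16 target result
print(verify(half, reference))
```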

3c1e91dc

Fix target flag · 636bce89
Paul authored Nov 22, 2021

636bce89

18 Nov, 2021 1 commit
- Parallel compilation (#1007) · b0bc71cd
  Paul Fultz II authored Nov 18, 2021
```
Do compilation in parallel
```
  b0bc71cd
17 Nov, 2021 1 commit

Handle removing contiguous on operators that use modules (#1005) · 785307c3

Paul Fultz II authored Nov 17, 2021

Currently, eliminate_contiguous never removes contiguous for operators that use module inputs, because it doesn't pass the module inputs to compute_shape.

- Update to pass the module inputs correctly to compute_shape
- Fix the overloads of compute_shape so that when passed an empty vector of module inputs it will call the overload without module inputs
- Add tests with contiguous and pointwise module function.
- Move the add_pointwise function to a separate header to reuse it across different tests

785307c3

16 Nov, 2021 1 commit
- Merge branch 'develop' of github.com:ROCmSoftwarePlatform/AMDMIGraphX into... · 2d9e620b
  Shucai Xiao authored Nov 16, 2021
```
Merge branch 'develop' of github.com:ROCmSoftwarePlatform/AMDMIGraphX into test_runner_match_input_output
```
  2d9e620b
15 Nov, 2021 1 commit

Update driver's perf report to account for batch size (#1000) · 19f65e7e

kahmed10 authored Nov 15, 2021

The driver already accepts --batch to change the batch size when the model has a dynamic dim value. This change uses the same flag to adjust the rate shown in the perf report.
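
The arithmetic behind a batch-aware rate can be sketched as follows. This assumes the rate is reported as inputs per second and derived from the average time of one run in milliseconds; the exact formula and the function name are assumptions, not taken from the driver.

```python
def throughput_per_sec(batch_size, avg_time_ms):
    """Inputs processed per second when one run of `batch_size` inputs takes avg_time_ms."""
    if avg_time_ms <= 0:
        raise ValueError("avg_time_ms must be positive")
    return batch_size * 1000.0 / avg_time_ms


# The same 10 ms run counts for 16 times as many inputs at batch size 16.
print(throughput_per_sec(1, 10.0), throughput_per_sec(16, 10.0))
```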

19f65e7e

11 Nov, 2021 1 commit

Conditionally enable pointwise fusion (#992) · 157935ff

Paul Fultz II authored Nov 10, 2021

This enables the pointwise fusions using the MIGRAPHX_ENABLE_POINTWISE_FUSION env variable. It's disabled by default since MIOpen fusions need to be refactored.

This also adds a compile_ops pass to compile the pointwise modules. All tests except test_gpu_fast_math pass with MIGRAPHX_ENABLE_POINTWISE_FUSION=1 set.

157935ff

10 Nov, 2021 1 commit

Turn on gemm unit tests (#997) · 38287064

Shucai Xiao authored Nov 10, 2021

This PR turns on a few gemm unit tests with the int8 input data type. Before rocm4.4, the rocblas implementation required the matrix size to be no less than 4 for int8 input data. Because of this limitation, we had turned off a few gemm unit tests with the int8 input data type.

This limitation was removed in rocm4.4, so after we upgrade to rocm4.5, we can turn on these unit tests. We also change the unit test conv_bn_add to add instructions to a module instead of a program.
Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>

38287064

09 Nov, 2021 1 commit
- Failing fusion plan workaround (#995) · fb39e5e4
  turneram authored Nov 09, 2021
```
* Add workaround for devices that do not support miopen conv fusions
```
  fb39e5e4
08 Nov, 2021 2 commits
- Merge branch 'develop' into test_runner_match_input_output · 2a73d9a9
  Shucai Xiao authored Nov 08, 2021
  
  2a73d9a9
- Install pcre from github since the ftp.pcre.org site is no longer available (#998) · db439b30
  Paul Fultz II authored Nov 08, 2021
```
* Install pcre from github since the ftp.pcre.org site is no longer available
```
  db439b30
05 Nov, 2021 7 commits
- Merge branch 'develop' into test_runner_match_input_output · 1d4a7c11
  Shucai Xiao authored Nov 05, 2021
  
  1d4a7c11
- Update Docker to ROCm 4.5 and support Navi on Jenkins (#994) · 04e17804
  kahmed10 authored Nov 05, 2021
```
Move our Docker file from ROCm 4.3 to 4.5
Add Navi-based GPUs to the CI infrastructure
```
  04e17804
- additional refinement of print out info · df2da5de
  Shucai Xiao authored Nov 05, 2021
  
  df2da5de
- refine output log info · 6323fc2c
  Shucai Xiao authored Nov 05, 2021
  
  6323fc2c
- additional refinement for input and output name processing · c4d1c4f3
  Shucai Xiao authored Nov 04, 2021
  
  c4d1c4f3
- additional refinement of input and output names mapping · f5409f95
  Shucai Xiao authored Nov 04, 2021
  
  f5409f95
- refine the input and output data file processing · 34fcdc47
  Shucai Xiao authored Nov 04, 2021
  
  34fcdc47
03 Nov, 2021 3 commits

clang format · 690dd868
Shucai Xiao authored Nov 03, 2021

690dd868

refine test_runner to match inputs and outputs according to their names · 544811c3
Shucai Xiao authored Nov 03, 2021

544811c3

Add tests for the DepthToSpace+Binary pointwise operations fusion (#987) · eb6abd27

Umang Yadav authored Nov 03, 2021

In migraphx, DepthToSpace (d2s) is implemented as reshape --> transpose --> contiguous --> reshape.

If a binary pointwise operator follows DepthToSpace, migraphx can move the binary operator before the contiguous and reshape of the DepthToSpace.

So, it becomes reshape-->transpose-->binary_op-->contiguous-->reshape.

An explicit contiguous is then no longer required, since binary_op outputs a standard shape. So, it becomes reshape-->transpose-->binary_op-->reshape.

simplify_reshapes already has a matcher that can do this transformation. This PR adds tests for cases like DepthToSpace + binary op.

solves #905
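
The reason a pointwise operator can be moved across the transpose is that it acts on each element independently, so reordering the elements before or after it gives the same result. The sketch below checks this on a flat, row-major list. It is not MIGraphX code: the helper names are hypothetical, the axis permutation is the one ONNX specifies for DepthToSpace in DCR mode, and the binary operator is simplified to one with a scalar second operand.

```python
from itertools import product


def transpose_contiguous(data, shape, perm):
    """Permute the axes of a row-major flat list and materialize the result."""
    strides = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    new_shape = [shape[p] for p in perm]
    out = []
    for index in product(*(range(dim) for dim in new_shape)):
        # index[k] runs along original axis perm[k].
        out.append(data[sum(i * strides[p] for i, p in zip(index, perm))])
    return out


def depth_to_space(data, n, c, h, w, block):
    # reshape -> transpose -> contiguous -> reshape. On a flat row-major
    # list both reshapes leave the data untouched, so only the transpose
    # (materialized as contiguous) reorders elements.
    shape6 = [n, block, block, c // (block * block), h, w]
    return transpose_contiguous(data, shape6, [0, 3, 4, 1, 5, 2])


def add_three(value):
    # Binary add with a scalar second operand (a simplifying assumption).
    return value + 3


x = list(range(16))  # shape (1, 4, 2, 2), block size 2
op_after = [add_three(v) for v in depth_to_space(x, 1, 4, 2, 2, 2)]
op_before = depth_to_space([add_three(v) for v in x], 1, 4, 2, 2, 2)
print(op_after == op_before)
```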

eb6abd27

28 Oct, 2021 4 commits

NonMaxSuppression op ref implementation (#968) · c98b22d8

Shucai Xiao authored Oct 28, 2021

This PR is the ref implementation of the nonmaxsuppression operator. It always returns the max possible output shape, which is the problem tracked in issue #948.
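
For readers unfamiliar with the operator, a generic greedy non-maximum suppression can be sketched as below. This is the textbook algorithm on `(x1, y1, x2, y2)` boxes, not the MIGraphX ref implementation. `nms_padded` illustrates what a fixed, maximum-size output looks like; the padding value of -1 is purely an assumption for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold, max_output):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if len(kept) == max_output:
            break
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


def nms_padded(boxes, scores, iou_threshold, max_output, pad=-1):
    """Fixed-size result: kept indices followed by `pad` entries (hypothetical)."""
    kept = nms(boxes, scores, iou_threshold, max_output)
    return kept + [pad] * (max_output - len(kept))


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms_padded(boxes, scores, iou_threshold=0.5, max_output=3))
```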

c98b22d8

DepthToSpace and pointwise unary operations fusion (#986) · cf0b6d6d

Umang Yadav authored Oct 28, 2021

In migraphx, DepthToSpace (d2s) is implemented as reshape --> transpose --> contiguous --> reshape.

This PR adds a matcher to find d2s + unary pointwise ops.

Application of the matcher moves the pointwise unary operation before the contiguous and reshape of the d2s.
So it becomes
reshape --> transpose --> unary --> contiguous --> reshape.

The motivation is that a pointwise module would later be created out of unary --> contiguous --> reshape. Codegen for this pointwise module can write the output buffer in such a way that an explicit contiguous and reshape aren't required.

This transformation is not guaranteed to improve performance, since the unary op will operate on a non-standard shape. So, we would need some tuning mechanism to make the decision.

#905 pending PR for binary operations.

cf0b6d6d

Roialign gpu impl (#972) · 912c8d22

Shucai Xiao authored Oct 28, 2021

GPU implementation of the roialign operator, using the jit approach to reduce the lib size.

912c8d22

Change to read the docs theme (#990) · 6df1e02b

kahmed10 authored Oct 27, 2021

Updates the theme of our documentation so that it matches the rest of the ROCm libraries.

6df1e02b

20 Oct, 2021 1 commit

Roialign (#952) · d7653732

Shucai Xiao authored Oct 20, 2021

Implementation of the roialign operator. For now, only the ref implementation exists, so when a model runs on the GPU, execution of this operator falls back to the ref implementation.

d7653732

19 Oct, 2021 2 commits
- Link with pthreads in core migraphx library since we use threads there (#975) · 4d82d761
  Paul Fultz II authored Oct 19, 2021
```
pthread linking errors on SLES. 
```
  4d82d761
- Fusion of pointwise operators (#969) · 351007d4
  Paul Fultz II authored Oct 19, 2021
```
Adds a pass to fuse pointwise operators into one "pointwise" op that has a submodule which does the calculation.
```
  351007d4
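
The idea behind fusing pointwise operators can be illustrated in a few lines. This sketch is a loose analogy and not the MIGraphX pass: instead of building a submodule, it composes unary Python functions so that the data is traversed once rather than once per operator. All names are hypothetical.

```python
from functools import reduce


def fuse_pointwise(*ops):
    """Compose unary elementwise functions into one function of a single value."""
    return lambda value: reduce(lambda acc, op: op(acc), ops, value)


def run_unfused(data, ops):
    # One full pass over the data (and one intermediate buffer) per operator.
    for op in ops:
        data = [op(v) for v in data]
    return data


def run_fused(data, ops):
    # A single pass that applies the composed function to each element.
    fused = fuse_pointwise(*ops)
    return [fused(v) for v in data]


ops = [lambda v: v + 1, lambda v: v * 2, lambda v: max(v, 0)]
print(run_fused([-3, 0, 2], ops) == run_unfused([-3, 0, 2], ops))
```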
18 Oct, 2021 2 commits

Allow constructing an operation with a format string (#976) · 77164f3c

Paul Fultz II authored Oct 18, 2021

Designed to allow a user to format the values needed for the json_string, for example migraphx::operation("reduce_mean", "{axes : [%i, %i, %i, %i]}", axes[0], axes[1], axes[2], axes[3]), instead of needing to use string concatenation or a stringstream.
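
To show what the format string saves, here is a Python analogue of the two approaches. It only demonstrates the string handling with Python's printf-style `%` operator; `format_op_value` is a hypothetical helper and says nothing about how the C++ constructor is implemented.

```python
def format_op_value(fmt, *args):
    """Expand printf-style placeholders such as %i into a value string."""
    return fmt % args


axes = [0, 1, 2, 3]

# With a format string: the structure of the value stays readable.
formatted = format_op_value("{axes : [%i, %i, %i, %i]}", *axes)

# Without it: the same string assembled by concatenation.
concatenated = "{axes : [" + ", ".join(str(a) for a in axes) + "]}"

print(formatted == concatenated)
```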

77164f3c

Remove redundant cast (#982) · a05113aa
Paul Fultz II authored Oct 18, 2021
```
Enable a cppcheck rule to catch these redundant casts in the future
```
a05113aa

15 Oct, 2021 1 commit

Enabling rocTX markers for migraphx-driver via roctx knob (#946) · 4a71ec8c

Cagri authored Oct 14, 2021



Added features:
This enables wrapping each migraphx operator with rocTX markers.
It adds a new knob, trace, to the migraphx-driver binary.

Limitation:

rocTX on its own does not output a file; it needs to be used with rocprof. Example command line:

/opt/rocm/bin/rocprof -i ./in.txt --hip-trace --roctx-trace --flush-rate 10ms --timestamp on -d cagri_out --obj-tracking on /opt/rocm/bin/migraphx-driver trace ./resnet50-v2-7.onnx --onnx --gpu
Co-authored-by: Shucai Xiao <shucai@gmail.com>

4a71ec8c

14 Oct, 2021 1 commit

SpaceToDepth operator (#979) · 6c02cd21

Umang Yadav authored Oct 14, 2021



Inverse of DepthToSpace op
Co-authored-by: Shucai Xiao <shucai@gmail.com>

6c02cd21

13 Oct, 2021 3 commits
- Add rules for RedundantCast · 8829d6ab
  Paul authored Oct 13, 2021
  
  8829d6ab
- Trace eval segfault (#974) · 337c5ba1
  Shucai Xiao authored Oct 13, 2021
```
When running a model on the GPU, migraphx tries to print content from GPU memory, which causes a segfault. The solution is to copy the GPU memory content back to the CPU before printing.
```
  337c5ba1
- Bump version for ABI change (#970) · a14a4e64
  Paul Fultz II authored Oct 13, 2021
  
  a14a4e64