Commits · d562e26515bd6a052fd7d32f3ec1f5e26cd70695 · gaoqiong / MIGraphX

12 Oct, 2022 2 commits
- Format · d562e265
  Paul authored Oct 12, 2022
  
  d562e265
- Some more simplifications · c51b3d29
  Paul authored Oct 12, 2022
  
  c51b3d29
10 Oct, 2022 7 commits
- Add make_const_array · f8e5a547
  Paul authored Oct 10, 2022
  
  f8e5a547
- Format · 746291a8
  Paul authored Oct 10, 2022
  
  746291a8
- Only device time for code objects · 12a82ff9
  Paul authored Oct 10, 2022
  
  12a82ff9
- Format · 25f0a80f
  Paul authored Oct 10, 2022
  
  25f0a80f
- Allow putting inputs in the settings · 18273046
  Paul authored Oct 10, 2022
  
  18273046
- Format · d11ec237
  Paul authored Oct 10, 2022
  
  d11ec237
- Separate host/device time · 6568a69d
  Paul authored Oct 10, 2022
  
  6568a69d
09 Oct, 2022 6 commits
- Fix global size · 9db9f4e0
  Paul authored Oct 09, 2022
  
  9db9f4e0
- Add some warning about missing configs · 398d45a4
  Paul authored Oct 09, 2022
  
  398d45a4
- Format · ff878ce6
  Paul authored Oct 09, 2022
  
  ff878ce6
- Load a tuning json file · 93a5de9f
  Paul authored Oct 09, 2022
  
  93a5de9f
- Format · 7bbb1cde
  Paul authored Oct 08, 2022
  
  7bbb1cde
- Move to cpp file · b57f58e1
  Paul authored Oct 08, 2022
  
  b57f58e1
08 Oct, 2022 2 commits
- Format · 873f6c0c
  Paul authored Oct 08, 2022
  
  873f6c0c
- Handle transposes and data types · 6ad2af4e
  Paul authored Oct 08, 2022
  
  6ad2af4e
07 Oct, 2022 4 commits
- Use get · c343c534
  Paul authored Oct 07, 2022
  
  c343c534
- Disable format · 19a570a0
  Paul authored Oct 07, 2022
  
  19a570a0
- Format · 969be85c
  Paul authored Oct 07, 2022
  
  969be85c
- Add ck_gemm · bfad5a5b
  Paul authored Oct 07, 2022
  
  bfad5a5b
04 Oct, 2022 2 commits
- Stream sync Changeset (#1358) · f7d987ba
  Ted Themistokleous authored Oct 04, 2022
```
Stream sync changes and associated API level changes
```
  f7d987ba
- Fast softmax (#1290) · a9a47402
  Paul Fultz II authored Oct 04, 2022
```
optimize the softmax operator
```
  a9a47402
03 Oct, 2022 1 commit

Add output_alias and runs_on_offload_target flags for the custom ops (#1309) · c9ffb38d

Umang Yadav authored Oct 03, 2022

Adds two methods to the custom_ops virtual class:

bool runs_on_offload_target(): should return true if the custom op runs directly on the GPU. In that case the custom op expects its parameters to reside in GPU memory and writes its output to GPU memory. If it returns false, the custom op expects its parameters to reside on the host and writes the result back to host memory.

output_alias: indicates that the output of the custom op aliases an input buffer, i.e. it interprets the same input buffer with a different shape and strides.

Also updates as_vector() in the C++ API to handle non-standard shapes. This required exposing the element_index to space_index conversion method of the shape class.
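
The element_index to space_index conversion mentioned above can be illustrated with a minimal Python sketch. This is not the MIGraphX implementation: the function names, the `lens`/`strides` parameters and the sample buffer are hypothetical, chosen only to show how a logical row-major index maps to a memory offset when a shape is non-standard (for example transposed or broadcast).

```python
def space_index(element_index, lens, strides):
    """Map a logical row-major element index to an offset in the buffer.

    Hypothetical helper for illustration; not the MIGraphX API.
    """
    offset = 0
    # Peel off one coordinate per dimension, innermost dimension first.
    for length, stride in zip(reversed(lens), reversed(strides)):
        offset += (element_index % length) * stride
        element_index //= length
    return offset


def as_list(buffer, lens, strides):
    """Read a strided buffer in logical row-major order."""
    count = 1
    for length in lens:
        count *= length
    return [buffer[space_index(i, lens, strides)] for i in range(count)]


# A 2x3 tensor stored column-major, so its strides are (1, 2)
# rather than the standard row-major (3, 1).
buffer = [0, 3, 1, 4, 2, 5]
logical = as_list(buffer, lens=(2, 3), strides=(1, 2))
```

Copying the buffer element by element without this conversion would return the values in storage order, which is why a plain copy is only correct for standard shapes.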

c9ffb38d

29 Sep, 2022 1 commit

Use find_2.0 API for the convolution (#1346) · e19f78ae

Umang Yadav authored Sep 29, 2022

Improvements/additions still to be made:

* Changes for quant_convolution
* Changes for deconvolution
* Macros for MIOpen status checks

e19f78ae

28 Sep, 2022 1 commit

Add compute_fp32 flag for quant_gemm tests (#1360) · 70e63960

Umang Yadav authored Sep 28, 2022

test_gpu_pack_int8_args fails on gfx908 machines because it doesn't set the compute_fp32 flag correctly. This PR fixes the test so that it checks the device name and the rocBLAS version and sets the flag accordingly.

70e63960

27 Sep, 2022 1 commit
- Add onnx mod operator gpu cpu (#1306) · 40118191
  Ted Themistokleous authored Sep 26, 2022
```
Implement operator for CPU and GPU implementations
```
  40118191
26 Sep, 2022 1 commit
- Use larger vector size instead of preloading for broadcasted inputs (#1389) · 492c4a6c
  Paul Fultz II authored Sep 26, 2022
  
  492c4a6c
23 Sep, 2022 1 commit
- Remove unused device functions (#1394) · 8ea8473d
  Paul Fultz II authored Sep 23, 2022
```
* Remove device functions
* Update tests
```
  8ea8473d
21 Sep, 2022 1 commit

Parameterize epsilon for layernorm kernel (#1367) · d9578ba6

kahmed10 authored Sep 21, 2022

This PR allows for other values of epsilon to be matched when finding layernorm. Similarly, the calculation now uses the variable for epsilon.
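
As a rough illustration of what parameterizing epsilon means for the calculation, here is a minimal Python sketch of layer normalization over one row. The function name and the default value of `epsilon` are assumptions made for this example; the actual kernel and matcher in the PR are not shown in this log.

```python
import math


def layernorm(values, epsilon=1e-12):
    """Normalize one row to zero mean and (roughly) unit variance.

    `epsilon` is passed in instead of being hard-coded; the default
    here is an illustrative assumption, not a value from the PR.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    scale = 1.0 / math.sqrt(variance + epsilon)
    return [(v - mean) * scale for v in values]


row = [1.0, 2.0, 3.0, 4.0]
small_eps = layernorm(row, epsilon=1e-5)
large_eps = layernorm(row, epsilon=1.0)
```

A larger epsilon damps the normalized values, so a model exported with a different epsilon needs that value carried through to get matching results.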

d9578ba6

19 Sep, 2022 1 commit

Improve layernorm and reductions performance (#1348) · 97a1ed2d

Paul Fultz II authored Sep 19, 2022

* Compute mean and variance in the same reduction
* Set block size to numbers divisible by 32 instead of powers of 2
* Global size is also set exactly instead of being divisible by the block size
* More exact matching of global/local sizes can help get rid of branching/loops
* Reduce vectors first before doing dpp_reduce
* Explicitly vectorize array operators since the compiler doesn't always vectorize them
* Still uses the old for loop when computing at compile time, since neither reinterpret_cast nor all of the vector types are supported there
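
One way to compute mean and variance in the same reduction is to accumulate the sum and the sum of squares together and derive both statistics at the end. The Python sketch below shows that idea only. The E[x^2] - E[x]^2 formulation and the function name are assumptions for illustration, since the log does not say which formulation the kernel uses.

```python
def mean_and_variance(values):
    """Single pass over the data: accumulate sum and sum of squares together."""
    n = len(values)
    total = 0.0
    total_sq = 0.0
    for v in values:
        # Both accumulators are updated in the same loop iteration,
        # so the data is read only once.
        total += v
        total_sq += v * v
    mean = total / n
    # Population variance via E[x^2] - E[x]^2.
    variance = total_sq / n - mean * mean
    return mean, variance


mean, variance = mean_and_variance([1.0, 2.0, 3.0, 4.0])
```

This formulation can lose precision when the mean is large relative to the spread of the values, which is a trade-off against reading the data twice.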

97a1ed2d

16 Sep, 2022 1 commit
- Fix typo for add_sigmoid (#1385) · 10f37f49
  Umang Yadav authored Sep 16, 2022
```
* fix typo for add_sigmoid
```
  10f37f49
15 Sep, 2022 1 commit

[mlir] Replaced `find_library` with `find_package` to locate MLIR static library (#1373) · e1e36cdc

Lixun Zhang authored Sep 15, 2022

* Replaced `find_library` with `find_package` to locate MLIR static library
* Unified the include dir for headers and remove backward compatibility
* Embedded the external/include dir into the exported library

e1e36cdc

14 Sep, 2022 1 commit
- Implement concat using jit compilation (#1356) · 7662d9c0
  Paul Fultz II authored Sep 14, 2022
```
* Implement concat using jit compilation
```
  7662d9c0
13 Sep, 2022 1 commit

Use rocblas_gemm_ex for batched gemms with broadcasted B (#1354) · a10a8ef1

turneram authored Sep 13, 2022

Improves performance for 4 of the 6 GEMMs used by Hugging Face BERT models with batch_size > 1 by using a non-batched rocBLAS call for GEMMs where the B input has a broadcasted batch dimension.
The four verify tests added reflect the actual configurations used by bert-base-cased, with varied batch sizes.
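
The reason a non-batched call can cover this case is that when every batch multiplies by the same B, the batch dimension of A can be folded into its rows. The pure-Python sketch below checks that equivalence on nested lists. It is an assumption about the general idea, not the rocBLAS call sequence from the PR, and all function names are hypothetical.

```python
def matmul(a, b):
    """Plain 2-D matrix product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def batched_gemm(a_batches, b):
    """Reference: one GEMM per batch, reusing the same B every time."""
    return [matmul(a, b) for a in a_batches]


def folded_gemm(a_batches, b):
    """Fold the batch dimension into the rows of A and run a single GEMM."""
    rows_per_batch = len(a_batches[0])
    stacked = [row for a in a_batches for row in a]  # (batch * M) x K
    flat = matmul(stacked, b)                        # one call, no batching
    # Split the (batch * M) x N result back into per-batch M x N blocks.
    return [flat[i:i + rows_per_batch]
            for i in range(0, len(flat), rows_per_batch)]


a_batches = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # batch=2, M=2, K=2
b = [[1, 0, 2], [0, 1, 3]]                        # K=2, N=3, shared by all batches
reference = batched_gemm(a_batches, b)
folded = folded_gemm(a_batches, b)
```

The folding only works as a free reinterpretation when the batches of A are laid out contiguously in memory; otherwise a copy would be needed first.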

Also adds a matcher to simplify_reshapes to move multibroadcasts after concats.

a10a8ef1

08 Sep, 2022 1 commit
- Remove unused headers (#1363) · ed2c73ac
  Paul Fultz II authored Sep 08, 2022
```
* Remove unused headers
```
  ed2c73ac
07 Sep, 2022 1 commit
- Fix accuracy bug when vectorizing slices (#1364) · 60aa0e48
  Paul Fultz II authored Sep 06, 2022
```
* Fix accuracy bug when vectorizing slices
```
  60aa0e48
06 Sep, 2022 1 commit
- Enable cppcheck rule for 'not', 'or' keywords (#1361) · d37a4df9
  Paul Fultz II authored Sep 06, 2022
```
Using not and or improves readability. The cppcheck rule will help ensure we are doing it consistently.
```
  d37a4df9
31 Aug, 2022 1 commit

Add pass to rewrite gelu as fast gelu (#1299) · 794a4335

turneram authored Aug 31, 2022

The rewrite_gelu pass replaces the gelu formula x * (1/2) * (1 + erf(x / sqrt(2))) with the sigmoid approximation x * sigmoid(1.702 * x).
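
To get a feel for the accuracy cost of this rewrite, the two formulas can be compared directly. The Python sketch below evaluates both over a sample range; the function names and the sampled interval are choices made for this example and are not taken from the pass itself.

```python
import math


def gelu(x):
    """Exact gelu: x * (1/2) * (1 + erf(x / sqrt(2)))."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def fast_gelu(x):
    """Sigmoid approximation: x * sigmoid(1.702 * x)."""
    return x / (1.0 + math.exp(-1.702 * x))


# Largest absolute difference on a grid over [-6, 6] with step 0.01.
samples = [i / 100.0 for i in range(-600, 601)]
max_error = max(abs(gelu(x) - fast_gelu(x)) for x in samples)
```

The approximation trades one erf evaluation for one exp, at the price of a small absolute error that is largest for inputs a couple of units away from zero.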

794a4335

27 Aug, 2022 1 commit
- Show kernel time when using gpu-driver (#1289) · 349635ce
  Paul Fultz II authored Aug 27, 2022
```
* Track kernel time
```
  349635ce