Commits · d00fdf6ead711bc03e9f588772ef62ef8c384380 · gaoqiong / MIGraphX

06 Dec, 2023 1 commit
- Add fp8 test · d00fdf6e
  Paul authored Dec 05, 2023
  
  d00fdf6e
01 Dec, 2023 1 commit
- FP8 GPU implementation (#2455) · eafd55de
  Umang Yadav authored Dec 01, 2023
  
  eafd55de
11 Nov, 2023 1 commit
- Fix swizzle for navi · 9ab38607
  Paul authored Nov 11, 2023
  
  9ab38607
22 Jun, 2022 1 commit
- Update license files (#1248) · e44cecbc
  Ted Themistokleous authored Jun 22, 2022
```
Updated each source file in the repo with the existing license.
```
  e44cecbc
17 Apr, 2022 1 commit

Reduce with runtime compilation (#1150) · f9a5b81e

Paul Fultz II authored Apr 17, 2022

There is significant improvement on larger tensors with half almost 50% faster:

lens: [1024, 384, 768]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
gpu::reduce_sum[axes={2}]: 1.73126ms
Also for non-trivial layouts this can sometimes be over 2x faster:

lens: [64, 1024, 768, 4]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
gpu::reduce_sum[axes={1}]: 2.63375ms
Of course if the stride becomes larger this speed improvement diminishes due to poor memory access patterns. A lane_reduce instead of a block_reduce is needed for such type of kernels. I plan to address that in a future PR.

Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable which will print out the assembly when the kernel compiles.

f9a5b81e

09 Dec, 2020 1 commit
- Reduce operators.hpp includes · b10c76e6
  Paul authored Dec 09, 2020
  
  b10c76e6
11 Nov, 2020 1 commit

Refactor program to module (#684) · 2466dd6f

Shucai Xiao authored Nov 11, 2020



* code backup

* clang format

* change corresponding tool files

* clang format
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

2466dd6f

04 Nov, 2020 1 commit

Split cpu and reference implementation (#671) · 500d9441

Paul Fultz II authored Nov 04, 2020



* Add all_targets cmake target

* Rename target

* Add ref target

* Rename tests

* Refactor compiler target

* Formatting

* Verify for every target

* Formatting

* Add verify test suite

* Formatting

* Add initial test programs

* Formatting

* Add rnn tests

* Formatting

* Validate gpu

* Formatting

* Remove old gpu tests

* Fix gpu tests

* Fix ref error

* Fix tidy issues

* Formatting

* Tidy fixes

* Fix header in python api

* Rename to ref

* Use ref in verify_onnx

* Fix tidy issue

* Build with verbose on

* Fix typo

* Remove verbose

* rename some cpu prefix to ref
Co-authored-by: Shucai Xiao <Shucai.Xiao@amd.com>

500d9441