"src/targets/gpu/vscode:/vscode.git/clone" did not exist on "4a10535cbfbda2ad394f16d23ba31d6e535c3dc5"
  1. 06 Dec, 2023 1 commit
  2. 01 Dec, 2023 1 commit
  3. 22 Jun, 2022 1 commit
  4. 17 Apr, 2022 1 commit
    • Paul Fultz II's avatar
      Reduce with runtime compilation (#1150) · f9a5b81e
      Paul Fultz II authored
      There is significant improvement on larger tensors with half almost 50% faster:
      
      lens: [1024, 384, 768]
      gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
      gpu::reduce_sum[axes={2}]: 1.73126ms
      Also for non-trivial layouts this can sometimes be over 2x faster:
      
      lens: [64, 1024, 768, 4]
      gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
      gpu::reduce_sum[axes={1}]: 2.63375ms
      Of course if the stride becomes larger this speed improvement diminishes due to poor memory access patterns. A lane_reduce instead of a block_reduce is needed for such type of kernels. I plan to address that in a future PR.
      
      Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable which will print out the assembly when the kernel compiles.
      f9a5b81e
  5. 09 Dec, 2020 1 commit
  6. 11 Nov, 2020 1 commit
  7. 04 Nov, 2020 1 commit
    • Paul Fultz II's avatar
      Split cpu and reference implementation (#671) · 500d9441
      Paul Fultz II authored
      
      
      * Add all_targets cmake target
      
      * Rename target
      
      * Add ref target
      
      * Rename tests
      
      * Refactor compiler target
      
      * Formatting
      
      * Verify for every target
      
      * Formatting
      
      * Add verify test suite
      
      * Formatting
      
      * Add initial test programs
      
      * Formatting
      
      * Add rnn tests
      
      * Formatting
      
      * Validate gpu
      
      * Formatting
      
      * Remove old gpu tests
      
      * Fix gpu tests
      
      * Fix ref error
      
      * Fix tidy issues
      
      * Formatting
      
      * Tidy fixes
      
      * Fix header in python api
      
      * Rename to ref
      
      * Use ref in verify_onnx
      
      * Fix tidy issue
      
      * Build with verbose on
      
      * Fix typo
      
      * Remove verbose
      
      * rename some cpu prefix to ref
      Co-authored-by: default avatarShucai Xiao <Shucai.Xiao@amd.com>
      500d9441