1. 28 Sep, 2023 1 commit
    • Umang Yadav
      Add options to set tolerances inside MIGraphX driver (#2213) · 69d8d789
      Umang Yadav authored
      MIGraphX verification by default uses normalized RMS error as the basis for verification. This change adds logic to let MIGraphX also do "np.allclose"-style elementwise verification using atol and rtol (see the sketch below).
      
      The commit also changes "verify_range()" calls to consistently pass the "gold" (expected) results as the second argument. The default RMS tolerance inside the driver is set to 0.001, which IMO is high for FP32 compared to what we had earlier; we need better defaults.
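      A minimal sketch of the two checks described above; the function names and the RMS normalization are illustrative assumptions, not the actual MIGraphX verify API:
      
      #include <cmath>
      #include <cstddef>
      #include <vector>
      
      // np.allclose-style elementwise check: |a - b| <= atol + rtol * |b|
      bool allclose(const std::vector<float>& a,
                    const std::vector<float>& b,
                    double rtol = 1e-3,
                    double atol = 1e-3)
      {
          if(a.size() != b.size())
              return false;
          for(std::size_t i = 0; i < a.size(); i++)
          {
              if(std::fabs(a[i] - b[i]) > atol + rtol * std::fabs(b[i]))
                  return false;
          }
          return true;
      }
      
      // RMS error against the gold values; the normalization here (mean over
      // elements) is one plausible choice, and MIGraphX's exact normalization
      // may differ. Verification passes when this falls below the tolerance
      // (0.001 by default in the driver).
      double rms_error(const std::vector<float>& result, const std::vector<float>& gold)
      {
          double sum = 0;
          for(std::size_t i = 0; i < result.size(); i++)
          {
              double d = static_cast<double>(result[i]) - gold[i];
              sum += d * d;
          }
          return std::sqrt(sum / result.size());
      }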
  2. 27 Sep, 2023 1 commit
  3. 13 Sep, 2023 1 commit
  4. 18 Aug, 2023 1 commit
  5. 10 Aug, 2023 1 commit
  6. 06 Aug, 2023 1 commit
  7. 30 Jul, 2023 1 commit
    • Paul Fultz II
      Enable tuning for MLIR (#1965) · be6ecff6
      Paul Fultz II authored
      * Add initial tuning support
      
      * Format
      
      * Add extra param
      
      * Format
      
      * Use exhaustive flag
      
      * Format
      
      * Set expected shapes
      
      * Format
      
      * Format
      
      * Fix missing symbol
      
      * Format
      
      * Add missing license header
      
      * Format
      
      * Update src/targets/gpu/include/migraphx/gpu/mlir.hpp
  8. 16 Jul, 2023 1 commit
  9. 05 Jul, 2023 1 commit
  10. 29 Jun, 2023 1 commit
  11. 22 Jun, 2023 1 commit
  12. 19 May, 2023 1 commit
  13. 17 May, 2023 1 commit
  14. 04 May, 2023 1 commit
    • Zhuoran Yin
      [mlir] Adding quant convolution fusion as anchor op (#1683) · 7f105952
      Zhuoran Yin authored
      Exposed the mlir_enabled() call to decide whether the lowering pipeline is enabled
      Disabled the rewrite quantization pipeline in mlir compilation
      Added quant convolution as an anchor op
      Fixed the return type expectations
      Added the fallback HIP implementation for quantizelinear and dequantizelinear (sketched below)
      Will need advice to improve the quantizelinear implementation
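      For reference, a scalar sketch of the ONNX quantizelinear/dequantizelinear semantics (int8 case) that the fallback implements; this is illustrative, not the MIGraphX kernel code:
      
      #include <algorithm>
      #include <cmath>
      #include <cstdint>
      
      // quantizelinear: y = saturate(round(x / scale) + zero_point)
      std::int8_t quantizelinear(float x, float scale, std::int8_t zero_point)
      {
          int q = static_cast<int>(std::nearbyint(x / scale)) + zero_point;
          return static_cast<std::int8_t>(std::clamp(q, -128, 127)); // saturate to int8
      }
      
      // dequantizelinear: y = (q - zero_point) * scale
      float dequantizelinear(std::int8_t q, float scale, std::int8_t zero_point)
      {
          return static_cast<float>(static_cast<int>(q) - zero_point) * scale;
      }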
  15. 13 Apr, 2023 1 commit
  16. 06 Apr, 2023 1 commit
    • Charlie Lin
      Driver dynamic batch update (#1652) · adccec52
      Charlie Lin authored
      Examples:
      
      bin/driver verify /codes/onnx_models/resnet50-v1-7/resnet50-v1-7.onnx --split-single-dyn-dim --batch 3 --dyn-input-dim @data "[{min:1, max:4}, 3, 224, 224]"
      
      bin/driver compile /codes/onnx_models/resnet50-v1-7/resnet50-v1-7.onnx --split-single-dyn-dim --default-dyn-dim "{min:1, max:10}" --output resnet50_batch1-10.mxr
      
      bin/driver perf resnet50_batch1-10.mxr --batch 4
  17. 27 Mar, 2023 1 commit
  18. 18 Mar, 2023 1 commit
  19. 31 Jan, 2023 1 commit
    • Umang Yadav
      hipRTC fixes (#1531) · 91cc7242
      Umang Yadav authored
      Added a CMake flag, MIGRAPHX_USE_HIPRTC, to enable building with hipRTC.
      Added Jenkins stages for hipRTC.
      Fixes for some of the pending hipRTC issues.
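      For example, assuming a standard CMake workflow, a build can opt in at configure time with: cmake -DMIGRAPHX_USE_HIPRTC=On ..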
  20. 06 Dec, 2022 2 commits
  21. 27 Oct, 2022 1 commit
  22. 18 Oct, 2022 1 commit
  23. 13 Oct, 2022 1 commit
    • Charlie Lin
      Refactor dynamic padding mode (#1387) · 32f6388c
      Charlie Lin authored
      Removes use_dynamic_same_auto_pad
      Changes padding_mode to be used for dynamic padding
      Moves compute_padded_shape to pad_calc.cpp, as it will be used in other dynamic padding cases
      Fixes a same_lower compute_padded_shape bug and adds a test (the padding arithmetic is sketched below).
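      A sketch of the "same" auto-padding arithmetic involved (dilation omitted); the helper names are illustrative, not the actual pad_calc.cpp interface:
      
      #include <cstddef>
      #include <utility>
      
      // Total padding so that output length == ceil(input / stride)
      std::size_t same_total_pad(std::size_t input, std::size_t kernel, std::size_t stride)
      {
          std::size_t out    = (input + stride - 1) / stride;
          std::size_t needed = (out - 1) * stride + kernel;
          return needed > input ? needed - input : 0;
      }
      
      // same_upper puts the odd extra element at the end of the axis;
      // same_lower (the case fixed here) puts it at the beginning.
      std::pair<std::size_t, std::size_t> split_pad(std::size_t total, bool same_lower)
      {
          std::size_t lo = total / 2;
          std::size_t hi = total - lo;
          return same_lower ? std::make_pair(hi, lo) : std::make_pair(lo, hi);
      }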
  24. 04 Oct, 2022 1 commit
  25. 29 Sep, 2022 1 commit
  26. 28 Sep, 2022 1 commit
    • Umang Yadav
      Add compute_fp32 flag for quant_gemm tests (#1360) · 70e63960
      Umang Yadav authored
      test_gpu_pack_int8_args fails on gfx908 machines because it doesn't set the compute_fp32 flag correctly. This PR fixes the test so that it checks the device name and rocBLAS version and sets the flag accordingly (a sketch follows).
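      A hypothetical sketch of that check; the helper signature is invented for illustration and is not the actual test utility:
      
      #include <string>
      
      // Enable compute_fp32 only on gfx908 devices whose rocBLAS build supports it
      bool use_compute_fp32(const std::string& device_name, bool rocblas_supports_fp32)
      {
          return device_name.find("gfx908") != std::string::npos and rocblas_supports_fp32;
      }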
  27. 27 Sep, 2022 1 commit
  28. 23 Sep, 2022 1 commit
  29. 16 Sep, 2022 1 commit
  30. 15 Sep, 2022 1 commit
  31. 04 Aug, 2022 1 commit
    • Charlie Lin
      Dynamic ref convolution op (#1224) · 67f77ac1
      Charlie Lin authored
      
      
      * Dynamic shape handling in shape object (the core idea is sketched after this change list)
      
      * rewrite empty lens multibroadcast test
      
      * Shape class changes to handle dynamic
      * More throw errors for functions that don't make sense for dynamic shape
      * Print output changes
      * Serialization changes
      
      * Fixing serialization errors
      
      * Remove const on dyn_dim copy getters
      
      * Dynamic shape tests
      
      * Fix serialize errors
      
      * Add dyn_data struct to avoid ambiguous constructor
      
      * Tidy fix: emplace_back() over for loop
      
      * Tidy fix: use move
      
      * Use std::initializer_list in constructor
      Reverts the dyn_data struct change
      Should get around the ambiguous braced initialization list error
      
      * avoid typedef
      
      * element_space and min/max/opt _lens changes
      
      * formatting
      
      * Comments fix
      
      * dynamic bytes() test
      
      * Serialize and reflect changes
      
      * formatting
      
      * Test the dynamic lens functions
      
      * progress
      
      * Formatting
      
      * Dynamic conv draft progress
      
      * Add operator<< tests for coverage
      
      * Coverage update
      
      * Add to conv dynamic batch test
      
      * Dynamic image size test
      
      * Dynamic weight handling
      
      * Dyn image shape test change, fix dyn weight cond
      
      * Comment update
      
      * Dynamic weights shape test and fix
      
      * Use ternary operator
      
      * Tidy fixes
      
      * Handle dynamic graph input shapes in ONNX parser
      
      * Formatting
      
      * Handle dynamic shape for convolution
      
      * formatting
      
      * cppcheck fixes
      
      * Add onnx test files
      
      * Fix typo
      
      * Disable auto_pad for dynamic input shape
      
      * check_shapes object checks for allowing dynamic shapes
      
      * Fix any_of
      
      * Change to maintain const objectness
      
      * Formatting
      
      * Check shapes allow dynamic
      
      * Refactor compute_shape() call into op.compute()
      Allows for per operator differences with handling dynamic shape
      Fix operation.hpp change to use the generator
      
      * Comment fix
      
      * Refactor normalize_attributes() calls to use max_lens()
      
      * Comment addition
      
      * Update other normalize_attributes() calls
      
      * Change to using constructor and add tests
      
      * Use const member function
      
      * Add more dynamic shape support
      
      * Add tests for error code coverage
      
      * Fix opt shape bug and add shape tests
      
      * capture all by ref
      
      * Fix typo with img shape calculation
      
      * Add more tests
      
      * dynamic auto pad attempt
      Linker error with pad_calc.cpp
      
      * Fix parse dyn auto_pad
      Should only need to use dynamic auto pad when the image shape or kernel
      shape is dynamic. For a dynamic batch size, the auto pad calculation is
      the same.
      
      * Fix linking error
      
      * Fix auto_pad bug
      Fixed input tensor with auto_pad setting on
      
      * auto_pad onnx tests
      
      * Fix auto_pad calculation, evaluate in ref_conv
      add ref_ops tests
      
      * Add shape tests, fix bugs
      
      * Refactor first two output dynamic len calculation
      
      * Conv MLIR test update
      
      * i64 MLIR test fix
      
      * Fix MLIR test typo
      Co-authored-by: Chris Austen <causten@users.noreply.github.com>
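      A minimal sketch of the dynamic-dimension idea from this commit: each dimension carries a {min, max, optimals} range, and an operator like convolution computes its output range per bound. The names are illustrative, not the exact MIGraphX API:
      
      #include <cstddef>
      #include <set>
      
      struct dyn_dim
      {
          std::size_t min;
          std::size_t max;
          std::set<std::size_t> optimals; // preferred sizes to compile/tune for
          bool is_fixed() const { return min == max; }
      };
      
      // Standard conv output length applied to both bounds of a dynamic input
      // (dilation omitted; assumes the output size is non-negative)
      dyn_dim conv_out_dim(dyn_dim in, std::size_t kernel, std::size_t pad, std::size_t stride)
      {
          auto out = [&](std::size_t x) { return (x + 2 * pad - kernel) / stride + 1; };
          return {out(in.min), out(in.max), {}};
      }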
  32. 29 Jul, 2022 1 commit
    • Umang Yadav
      Avoid registering host buffer ptr multiple times during hip copies (#1245) · 7596f3f1
      Umang Yadav authored
      Currently, while copying a host buffer to the device, MIGraphX first registers/maps the host buffer pointer into the device's address space.
      
      If the host buffer was allocated with hipHostMalloc, it is already implicitly registered in the device's address space, so there is no need to register it again. This PR adds a check for that (sketched below).
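      A minimal sketch of such a check, assuming hipPointerGetAttributes is used to ask the runtime whether it already tracks the buffer; this is illustrative, not the exact MIGraphX code:
      
      #include <cstddef>
      #include <hip/hip_runtime.h>
      
      bool already_registered(const void* p)
      {
          hipPointerAttribute_t attrs{};
          // For a plain-malloc pointer that was never registered, the query fails;
          // depending on the HIP version it may instead succeed and report an
          // unregistered memory type.
          return hipPointerGetAttributes(&attrs, p) == hipSuccess;
      }
      
      void register_if_needed(void* p, std::size_t bytes)
      {
          // hipHostMalloc'd buffers are already tracked, so skip re-registering them
          if(not already_registered(p))
              hipHostRegister(p, bytes, hipHostRegisterDefault);
      }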
  33. 03 Jul, 2022 1 commit
    • Paul Fultz II
      Add mlir fusion (#1251) · ca8a54fe
      Paul Fultz II authored
      * Add mlir c api
      
      * Formatting
      
      * Create a type attribute
      
      * Formatting
      
      * Parse module
      
      * Formatting
      
      * Add mlir dump function
      
      * Add test case
      
      * Formatting
      
      * Fix tidy issues
      
      * Update mlir version
      
      * Update to newer mlir
      
      * Format
      
      * Move mlir to the gpu and update the test
      
      * Formatting
      
      * Fix bug when appending module
      
      * Format
      
      * Remove old cmake flag
      
      * Update message
      
      * Add return
      
      * Format
      
      * Add mlir_compile
      
      * Format
      
      * Register dialect
      
      * Handle unsigned integers
      
      * Don't provide output for return instruction
      
      * Format
      
      * Add code to insert memrefs
      
      * Format
      
      * Add mlir verification
      
      * Formatting
      
      * Enable pointwise_fusion
      
      * Disable eliminate_data_type
      
      * Set kernel name
      
      * Format
      
      * Fix device name
      
      * Formatting
      
      * Fix output arg
      
      * Format
      
      * Updates
      
      * Update hash
      
      * Add fuse_mlir pass
      
      * Format
      
      * Add fuse mlir
      
      * Format
      
      * Update mlir
      
      * Sort parameter names
      
      * Format
      
      * Reenable disabled passes
      
      * Remove old mlir conv
      
      * Remove asym default padding
      
      * Add more verbose tracing
      
      * Format
      
      * Fix compilation errors
      
      * Format
      
      * Whitelist operators
      
      * Format
      
      * Add namespace
      
      * Format
      
      * Update triple
      
      * Format
      
      * Use func dialect
      
      * Format
      
      * Use func.return
      
      * Format
      
      * Upgrade mlir version
      
      * Add comment
      
      * Handle symmetrical padding
      
      * Format
      
      * Cleanup debug output
      
      * Format
      
      * List failed tests
      
      * Move mlir compile to jit pipeline
      
      * Format
      
      * Update version
      
      * Add source locations
      
      * Format
      
      * Correctly add module
      
      * Format
      
      * Update failed tests
      
      * Fix failures when mlir is disabled
      
      * Format
      
      * Update mlir version
      
      * Check type for fp32
      
      * Format
      
      * Remove failed test
      
      * Update mlir in driver
      
      * Tidy fixes
      
      * Format
      
      * Tidy fixes
      
      * Format
      
      * Fix const
      
      * Remove from requirements
      
      * Fix cmake version
      
      * Fix tidy warning
      
      * Use another ifdef
      
      * Fix tidy
      
      * Other tidy fix
      
      * Format
      
      * Update hash
      
      * Add missing license files
      
      * Format
      
      * Format
      
      * Fix function name
  34. 22 Jun, 2022 1 commit
  35. 17 Jun, 2022 1 commit
    • kahmed10
      Create allocate op and replace_allocate pass (#1183) · add6fb3b
      kahmed10 authored
      
      
      * add allocate op header
      
      * formatting
      
      * add replace_allocate pass
      
      * formatting
      
      * move output param to remove_allocate pass
      
      * formatting
      
      * fix bugs in replace_allocate pass
      
      * formatting
      
      * fix verify if tests
      
      * formatting
      
      * move if op logic
      
      * formatting
      
      * cleanup lowering
      
      * cleanup lowering
      
      * formatting
      
      * fix tidy
      
      * formatting
      
      * fix tidy
      
      * add cpu allocate check
      
      * formatting
      
      * change cpu allocate in pass
      
      * formatting
      
      * add some tests for replace_allocate pass
      
      * formatting
      
      * pass by ref
      
      * fix run_pass
      
      * formatting
      
      * update variable name for module
      
      * update dce to use contains() and fix tidy
      
      * formatting
      
      * update cppcheck
      
      * add if test
      
      * formatting
      
      * add if test
      
      * rename var to mod_output_names
      
      * formatting
      
      * remove conditional
      
      * update allocate op and tests
      
      * formatting
      
      * update replace_allocate tests
      
      * update create_output_names() and conditional in replace_allocate
      
      * formatting
      
      * remove extra variable in replace_allocate
      
      * update tools script for allocation_model
      Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
      Co-authored-by: Chris Austen <causten@users.noreply.github.com>
      Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
  36. 06 May, 2022 1 commit
  37. 29 Mar, 2022 1 commit
    • Paul Fultz II
      Refactor runtime compiled kernels to use the same compile_ops pipeline (#1125) · 661046c6
      Paul Fultz II authored
      This adds the infrastructure to compile everything in parallel, whereas before only pointwise kernels were compiled in parallel (a generic sketch of the idea follows). It will also integrate directly with lowering and the gpu-driver. The pointwise and roialign kernels use this infrastructure; scatternd does not, since it requires a standard shape.
      
      This also makes it easier to add new runtime compiled kernels in the future.
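      A generic sketch of the parallel-compilation idea (not MIGraphX's actual compile_ops pass): launch each kernel compilation asynchronously and wait for all of them:
      
      #include <future>
      #include <vector>
      
      // Each Job is a callable that compiles one kernel when invoked
      template <class Job>
      void compile_all(const std::vector<Job>& jobs)
      {
          std::vector<std::future<void>> results;
          results.reserve(jobs.size());
          for(const auto& job : jobs)
              results.push_back(std::async(std::launch::async, job));
          for(auto& r : results)
              r.get(); // rethrows any compilation error from the worker
      }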
  38. 25 Feb, 2022 1 commit
  39. 09 Feb, 2022 1 commit