1. 17 Jun, 2022 2 commits
    • Remove failed test · 555d07a2
      Paul authored
    • Create allocate op and replace_allocate pass (#1183) · add6fb3b
      kahmed10 authored

      * add allocate op header
      
      * formatting
      
      * add replace_allocate pass
      
      * formatting
      
      * move output param to remove_allocate pass
      
      * formatting
      
      * fix bugs in replace_allocate pass
      
      * formatting
      
      * fix verify if tests
      
      * formatting
      
      * move if op logic
      
      * formatting
      
      * cleanup lowering
      
      * cleanup lowering
      
      * formatting
      
      * fix tidy
      
      * formatting
      
      * fix tidy
      
      * add cpu allocate check
      
      * formatting
      
      * change cpu allocate in pass
      
      * formatting
      
      * add some tests for replace_allocate pass
      
      * formatting
      
      * pass by ref
      
      * fix run_pass
      
      * formatting
      
      * update variable name for module
      
      * update dce to use contains() and fix tidy
      
      * formatting
      
      * update cppcheck
      
      * add if test
      
      * formatting
      
      * add if test
      
      * rename var to mod_output_names
      
      * formatting
      
      * remove conditional
      
      * update allocate op and tests
      
      * formatting
      
      * update replace_allocate tests
      
      * update create_output_names() and conditional in replace_allocate
      
      * formatting
      
      * remove extra variable in replace_allocate
      
      * update tools script for allocation_model
      Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
      Co-authored-by: Chris Austen <causten@users.noreply.github.com>
      Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
  2. 13 Jun, 2022 1 commit
  3. 09 Jun, 2022 1 commit
  4. 25 May, 2022 3 commits
  5. 24 May, 2022 2 commits
  6. 18 May, 2022 3 commits
  7. 06 May, 2022 2 commits
  8. 29 Mar, 2022 1 commit
    • Refactor runtime compiled kernels to use the same compile_ops pipeline (#1125) · 661046c6
      Paul Fultz II authored
      This adds the infrastructure to compile all kernels in parallel; previously only pointwise kernels were compiled in parallel. It also integrates directly with lowering and the gpu-driver. The pointwise and roialign kernels use this infrastructure. Scatternd does not, since it does require standard shape.
      
      This also makes it easier to add new runtime compiled kernels in the future.
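
The idea of compiling every kernel in parallel can be sketched roughly as below. This is only an illustration in Python, not the project's code: `compile_kernel`, `compile_all`, and the kernel sources are hypothetical stand-ins, and a thread pool is assumed as the parallel mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_kernel(name, source):
    # Hypothetical stand-in for an expensive compiler invocation.
    return f"binary({name}:{len(source)})"


def compile_all(kernels, max_workers=4):
    """Compile every (name, source) pair concurrently; return {name: binary}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all jobs first so they can run in parallel.
        futures = {name: pool.submit(compile_kernel, name, src)
                   for name, src in kernels.items()}
        # Collect results; result() blocks until that job finishes.
        return {name: fut.result() for name, fut in futures.items()}


binaries = compile_all({"pointwise": "abc", "roialign": "abcdef"})
```

Submitting all jobs before collecting any result is what lets every kernel type share one pipeline, rather than special-casing a single kind of kernel.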
  9. 25 Feb, 2022 1 commit
  10. 09 Feb, 2022 1 commit
  11. 26 Jan, 2022 1 commit
    • Updates · 1cc6c88c
      Paul authored
  12. 10 Jan, 2022 1 commit
  13. 11 Dec, 2021 3 commits
  14. 01 Dec, 2021 3 commits
  15. 24 Nov, 2021 2 commits
  16. 17 Nov, 2021 1 commit
    • Handle removing contiguous on operators that use modules (#1005) · 785307c3
      Paul Fultz II authored
      Currently, eliminate_contiguous never removes contiguous for operators that use module inputs, because it doesn't pass the module inputs to compute_shape.
      
      - Update to pass the module inputs correctly to compute_shape
      - Fix the overloads of compute_shape so that when passed an empty vector of module inputs it will call the overload without module inputs
      - Add tests with contiguous and pointwise module function.
      - Move add_pointwise function to a separate header to reuse across different tests
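
The overload fallback described in the second bullet can be illustrated with a small sketch. It is written in Python for brevity and does not reproduce the project's C++ API: the helper names and the dictionary-based "module" are assumptions made for illustration only.

```python
def compute_shape_plain(inputs):
    # Hypothetical operator without module inputs:
    # the output shape is the shape of the first input.
    return tuple(inputs[0])


def compute_shape_with_modules(inputs, mod_args):
    # Hypothetical operator with module inputs:
    # the output shape is taken from the first module.
    return tuple(mod_args[0]["output_shape"])


def compute_shape(inputs, mod_args=()):
    """Dispatch to the plain version when no module inputs are given."""
    if not mod_args:
        return compute_shape_plain(inputs)
    return compute_shape_with_modules(inputs, mod_args)
```

With this dispatch a caller can always pass the module inputs through, and an empty list behaves exactly like the call without module inputs.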
  17. 16 Nov, 2021 2 commits
  18. 09 Nov, 2021 2 commits
  19. 08 Oct, 2021 1 commit
    • Remove alpha and beta from `dot` and `quant_dot` (#961) · 21193e87
      Umang Yadav authored
      Previously the dot operator was defined as C = alpha * A . B + beta * C, where * is scalar multiplication and . is the dot product or matrix multiplication, depending on the dimensions of the inputs.
      
      The aim is to define the dot operator as C = A . B, without alpha or beta.
      
      To achieve the same effect as alpha and beta: (1) one of the inputs to the dot operator is multiplied by alpha; (2) if beta is present, C is multiplied by beta and added to the output from step 1.
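
A small numeric check of this equivalence, as a sketch in plain Python with nested lists as matrices. The function names `dot_legacy` and `dot_decomposed` are made up for this illustration and are not part of the library.

```python
def matmul(a, b):
    # Plain matrix product: C = A . B
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def scale(s, m):
    # Scalar multiplication of a matrix
    return [[s * v for v in row] for row in m]


def add(m, n):
    # Element-wise sum of two matrices of equal shape
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m, n)]


def dot_legacy(a, b, c, alpha, beta):
    # Old definition: alpha * (A . B) + beta * C
    return add(scale(alpha, matmul(a, b)), scale(beta, c))


def dot_decomposed(a, b, c, alpha, beta):
    # Step 1: fold alpha into one input, then use the plain dot.
    out = matmul(scale(alpha, a), b)
    # Step 2: scale C by beta and add it to the result of step 1.
    return add(out, scale(beta, c))


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[1, 0], [0, 1]]
assert dot_legacy(A, B, C, 2, 3) == dot_decomposed(A, B, C, 2, 3)
```

Integer inputs are used so the two forms compare exactly; with floating-point data the results could differ by rounding.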
  20. 07 Sep, 2021 1 commit
    • qdq for quantization and include subgraph (#891) · b45f7239
      Shucai Xiao authored

      Add operators, refactor parsers, add rewrite passes, add tests
      Add ref implementations
      Move broadcasting of scales and zero points to onnx parser
      Allow for x and zero_point to have different types in quantizelinear; fix zero_point default type
      fp16 and fp8 quantization to include subgraph and parameters
      fix unit test to use qdq operators for int8 quantization
      Co-authored-by: turneram <alturner@amd.com>
  21. 31 Aug, 2021 1 commit
    • Fix debug assert (#930) · bd85a76c
      Shucai Xiao authored
      * fix two asserts for debug build
      
      * add unit test for copy parameters
      
      * clang format
      
      * add a unit test for reorder_dims
      
      * change transpose to always require perm not be empty
      
      * clang format
      
      * remove an unnecessary line
      
      * fix tidy error
      
      * fix review comments
  22. 24 Aug, 2021 1 commit
    • Change attributes names to be more consistent and reflect better meaning (#916) · 0d2606bb
      Umang Yadav authored
      * rename broadcast and multibroadcast output_lens attribute to out_lens attribute, and change tests and source code to reflect the same
      
      * change the reshape attribute from dims to out_lens
      
      * change transpose attribute's name from dims to perm to reflect better meaning
      
      * use permutation instead of perm for transpose
      
      clang formatting
      
      * use dims instead of out_lens for reshape
      
      clang formatting
  23. 19 Aug, 2021 1 commit
  24. 10 Aug, 2021 1 commit
    • Add option to compile with hiprtc (#892) · 91c9ebbc
      Paul Fultz II authored
      * Add hiprtc compile option
      * Add cross compile test
      * Update error reporting
      * Add tests for errors and warnings
      * Fix tidy warning
      * Add comment to ifdefs
      * Skip null character at end of log
      * Assert there is null at the end
  25. 05 Aug, 2021 1 commit
    • Add gpu driver and improvements to pointwise codegen (#851) · 29fa2666
      Paul Fultz II authored

      * Add method to compile pointwise
      
      * Formatting
      
      * Add lambda
      
      * Add semicolon
      
      * Rename variable
      
      * Add driver to run jit kernels
      
      * Formatting
      
      * Add context
      
      * Formatting
      
      * Make separate driver folder
      
      * Add more general gpu driver
      
      * Formatting
      
      * Print out wall time
      
      * Formatting
      
      * Run multiple times and skip first run
      
      * Formatting
      
      * Separate time_op
      
      * Run an op for comparison
      
      * Formatting
      
      * Add debug asserts
      
      * Formatting
      
      * Change parameter name
      
      * Formatting
      
      * Fix argument order
      
      * Formatting
      
      * Add preloading
      
      * Formatting
      
      * Allow a different data type
      
      * Formatting
      
      * Pipeline transformations
      
      * Formatting
      
      * Add vectorization
      
      * Formatting
      
      * Reduce dims
      
      * Formatting
      
      * Compile with launch params as constant
      
      * Formatting
      
      * Make sure buffer can be vectorized
      
      * Formatting
      
      * Enable vectorization and preloading
      
      * Formatting
      
      * Add print header
      
      * Formatting
      
      * Avoid allocating too large of LDS
      
      * Formatting
      
      * Add some vec functions to a separate header
      
      * Formatting
      
      * Add stride loops
      
      * Formatting
      
      * Improve the transform pipeline
      
      * Formatting
      
      * Add const
      
      * Fix shape check
      
      * Formatting
      
      * Just check stride axis is zero
      
      * Remove extra finc_vector_axis overload
      
      * Simplify some more functions
      
      * Formatting
      
      * Remove some more extra functions
      
      * Formatting
      
      * Simplify more decltypes
      
      * Add another const
      
      * Fix test
      
      * Get buffer pointer different for older compilers
      Co-authored-by: Shucai Xiao <shucai@gmail.com>
      Co-authored-by: Chris Austen <causten@users.noreply.github.com>
  26. 14 Jul, 2021 1 commit