1. 17 Nov, 2021 1 commit
    • Handle removing contiguous on operators that use modules (#1005) · 785307c3
      Paul Fultz II authored
      Currently, eliminate_contiguous never removes contiguous for operators that use module inputs, because it does not pass the module inputs to compute_shape.
      
      - Update to pass the module inputs correctly to compute_shape
      - Fix the compute_shape overloads so that, when passed an empty vector of module inputs, they call the overload without module inputs
      - Add tests with contiguous and a pointwise module function
      - Move the add_pointwise function to a separate header so it can be reused across different tests
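      Here is a minimal sketch of the overload fix, assuming C++17. `shape`, `submodule`, the detection traits, the free `compute_shape` helper and the two example operators are hypothetical stand-ins, not MIGraphX's real types. Throwing when an operator without a module overload receives modules is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a shape is a list of lengths, a submodule just has a name.
using shape = std::vector<std::size_t>;
struct submodule
{
    std::string name;
};

// Detect whether Op has compute_shape(inputs).
template <class Op, class = void>
struct has_plain_compute_shape : std::false_type {};
template <class Op>
struct has_plain_compute_shape<Op, std::void_t<decltype(std::declval<const Op&>().compute_shape(
                                       std::declval<const std::vector<shape>&>()))>> : std::true_type {};

// Detect whether Op has compute_shape(inputs, mods).
template <class Op, class = void>
struct has_module_compute_shape : std::false_type {};
template <class Op>
struct has_module_compute_shape<Op, std::void_t<decltype(std::declval<const Op&>().compute_shape(
                                        std::declval<const std::vector<shape>&>(),
                                        std::declval<const std::vector<submodule*>&>()))>> : std::true_type {};

// Always forward the module inputs, but use the module-free overload when the list is empty.
template <class Op>
shape compute_shape(const Op& op, const std::vector<shape>& inputs, const std::vector<submodule*>& mods)
{
    if constexpr(has_plain_compute_shape<Op>::value && has_module_compute_shape<Op>::value)
    {
        return mods.empty() ? op.compute_shape(inputs) : op.compute_shape(inputs, mods);
    }
    else if constexpr(has_plain_compute_shape<Op>::value)
    {
        if(!mods.empty())
            throw std::invalid_argument("operator does not accept module inputs");
        return op.compute_shape(inputs);
    }
    else
    {
        return op.compute_shape(inputs, mods);
    }
}

// Example operator with only the module-free overload.
struct plain_op
{
    shape compute_shape(const std::vector<shape>& inputs) const { return inputs.front(); }
};

// Example operator with both overloads; the results just show which one ran.
struct both_op
{
    shape compute_shape(const std::vector<shape>&) const { return {1}; }
    shape compute_shape(const std::vector<shape>&, const std::vector<submodule*>&) const { return {2}; }
};
```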
  2. 08 Oct, 2021 1 commit
    • Remove alpha and beta from `dot` and `quant_dot` (#961) · 21193e87
      Umang Yadav authored
      Previously, the dot operator was defined as C = alpha * A . B + beta * C, where * is scalar multiplication and . is the dot product or matrix multiplication, depending on the dimensions of the inputs.
      
      The aim is to define the dot operator as C = A . B, without alpha or beta.
      
      To achieve the same effect as alpha and beta: (1) one of the inputs to the dot operator is multiplied by alpha; (2) if beta is present, C is multiplied by beta and added to the output of step 1.
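      A naive sketch on nested vectors can check the alpha/beta decomposition numerically. `matrix`, `dot`, `scale`, `add`, `dot_alpha_beta` and `dot_decomposed` are illustrative names, not MIGraphX functions. Treating beta == 0 as "beta not present" is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using matrix = std::vector<std::vector<float>>;

// C = A . B for an (m x k) A and a non-empty (k x n) B; no alpha or beta.
matrix dot(const matrix& a, const matrix& b)
{
    const std::size_t m = a.size(), k = b.size(), n = b.front().size();
    matrix c(m, std::vector<float>(n, 0.0f));
    for(std::size_t i = 0; i < m; ++i)
        for(std::size_t j = 0; j < n; ++j)
            for(std::size_t p = 0; p < k; ++p)
                c[i][j] += a[i][p] * b[p][j];
    return c;
}

// Multiply every element by a scalar.
matrix scale(matrix x, float s)
{
    for(auto& row : x)
        for(auto& v : row)
            v *= s;
    return x;
}

// Elementwise sum of two matrices of the same shape.
matrix add(matrix x, const matrix& y)
{
    for(std::size_t i = 0; i < x.size(); ++i)
        for(std::size_t j = 0; j < x[i].size(); ++j)
            x[i][j] += y[i][j];
    return x;
}

// Old fused semantics, kept only for comparison: alpha * (A . B) + beta * C.
matrix dot_alpha_beta(const matrix& a, const matrix& b, const matrix& c, float alpha, float beta)
{
    return add(scale(dot(a, b), alpha), scale(c, beta));
}

// Decomposed form: (1) scale one dot input by alpha, (2) if beta is used, add beta * C.
matrix dot_decomposed(const matrix& a, const matrix& b, const matrix& c, float alpha, float beta)
{
    matrix out = dot(scale(a, alpha), b);
    if(beta != 0.0f)
        out = add(out, scale(c, beta));
    return out;
}
```

      With integer-valued inputs, both forms agree exactly. In general, scaling A before the product rather than after it can change floating-point rounding slightly.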
  3. 07 Sep, 2021 1 commit
    • qdq for quantization and include subgraph (#891) · b45f7239
      Shucai Xiao authored
      
      
      Add operators, refactor parsers, add rewrite passes, add tests
      Add ref implementations
      Move broadcasting of scales and zero points to onnx parser
      Allow for x and zero_point to have different types in quantizelinear; fix zero_point default type
      fp16 and fp8 quantization to include subgraph and parameters
      fix unit test to use qdq operators for int8 quantization
      Co-authored-by: turneram <alturner@amd.com>
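      For reference, here is a per-tensor sketch of the quantizelinear and dequantizelinear formulas as the ONNX operators define them: int8 output, round half to even, and saturation. The function names are illustrative. Scales and zero points are scalars here (no broadcasting), and NaN inputs are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Per-tensor int8 QuantizeLinear: y = saturate(round(x / scale) + zero_point).
std::int8_t quantize_linear(float x, float scale, std::int8_t zero_point)
{
    // std::nearbyint uses the current rounding mode, which defaults to round-half-to-even.
    float q = std::nearbyint(x / scale) + static_cast<float>(zero_point);
    // Saturate to the int8 range before converting.
    q = std::clamp(q, -128.0f, 127.0f);
    return static_cast<std::int8_t>(q);
}

// DequantizeLinear: x = (y - zero_point) * scale.
float dequantize_linear(std::int8_t y, float scale, std::int8_t zero_point)
{
    return static_cast<float>(static_cast<int>(y) - static_cast<int>(zero_point)) * scale;
}
```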
  4. 31 Aug, 2021 1 commit
    • Fix debug assert (#930) · bd85a76c
      Shucai Xiao authored
      * fix two asserts for debug build
      
      * add unit test for copy parameters
      
      * clang format
      
      * add a unit test for reorder_dims
      
      * change transpose to always require perm to be non-empty
      
      * clang format
      
      * remove an unnecessary line
      
      * fix tidy error
      
      * fix review comments
  5. 24 Aug, 2021 1 commit
    • Change attribute names to be more consistent and better reflect their meaning (#916) · 0d2606bb
      Umang Yadav authored
      * rename the output_lens attribute of broadcast and multibroadcast to out_lens, and update the tests and source code accordingly
      
      * change the reshape attribute from dims to out_lens
      
      * rename the transpose attribute from dims to perm to better reflect its meaning
      
      * use permutation instead of perm for transpose
      
      clang formatting
      
      * use dims instead of out_lens for reshape
      
      clang formatting
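      After the rename, transpose takes its axis order from `permutation`. This sketch shows how the output lengths follow from it. It assumes the permutation must list every input axis exactly once; `transpose_lens` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Output lengths of a transpose: out_lens[i] = in_lens[permutation[i]].
std::vector<std::size_t> transpose_lens(const std::vector<std::size_t>& in_lens,
                                        const std::vector<std::int64_t>& permutation)
{
    if(permutation.size() != in_lens.size())
        throw std::invalid_argument("permutation must have one entry per dimension");
    // A valid permutation, once sorted, is exactly 0, 1, ..., n - 1.
    std::vector<std::int64_t> sorted = permutation;
    std::sort(sorted.begin(), sorted.end());
    for(std::size_t i = 0; i < sorted.size(); ++i)
    {
        if(sorted[i] != static_cast<std::int64_t>(i))
            throw std::invalid_argument("permutation must contain each axis exactly once");
    }
    std::vector<std::size_t> out_lens(in_lens.size());
    for(std::size_t i = 0; i < permutation.size(); ++i)
        out_lens[i] = in_lens[static_cast<std::size_t>(permutation[i])];
    return out_lens;
}
```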
  6. 19 Aug, 2021 1 commit
  7. 10 Aug, 2021 1 commit
    • Add option to compile with hiprtc (#892) · 91c9ebbc
      Paul Fultz II authored
      * Add hiprtc compile option
      * Add cross compile test
      * Update error reporting
      * Add tests for errors and warnings
      * Fix tidy warning
      * Add comment to ifdefs
      * Skip null character at end of log
      * Assert there is null at the end
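      "Skip null character at end of log" and "Assert there is null at the end" suggest a log buffer whose reported size includes the terminating null. NVRTC documents this behavior for `nvrtcGetProgramLogSize`, and this sketch assumes hiprtc behaves the same way. Here is the conversion, with `log_to_string` as a hypothetical helper:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Convert a raw compiler log buffer, whose size includes the trailing '\0',
// into a std::string without that terminator.
std::string log_to_string(const std::vector<char>& buffer)
{
    if(buffer.empty())
        return {};
    // The last byte is expected to be the null terminator.
    assert(buffer.back() == '\0');
    return std::string(buffer.begin(), buffer.end() - 1);
}
```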
  8. 05 Aug, 2021 1 commit
    • Add gpu driver and improvements to pointwise codegen (#851) · 29fa2666
      Paul Fultz II authored
      
      
      * Add method to compile pointwise
      
      * Formatting
      
      * Add lambda
      
      * Add semicolon
      
      * Rename variable
      
      * Add driver to run jit kernels
      
      * Formatting
      
      * Add context
      
      * Formatting
      
      * Make separate driver folder
      
      * Add more general gpu driver
      
      * Formatting
      
      * Print out wall time
      
      * Formatting
      
      * Run multiple times and skip first run
      
      * Formatting
      
      * Separate time_op
      
      * Run an op for comparison
      
      * Formatting
      
      * Add debug asserts
      
      * Formatting
      
      * Change parameter name
      
      * Formatting
      
      * Fix argument order
      
      * Formatting
      
      * Add preloading
      
      * Formatting
      
      * Allow a different data type
      
      * Formatting
      
      * Pipeline transformations
      
      * Formatting
      
      * Add vectorization
      
      * Formatting
      
      * Reduce dims
      
      * Formatting
      
      * Compile with launch params as constant
      
      * Formatting
      
      * Make sure buffer can be vectorized
      
      * Formatting
      
      * Enable vectorization and preloading
      
      * Formatting
      
      * Add print header
      
      * Formatting
      
      * Avoid allocating too large an LDS
      
      * Formatting
      
      * Add some vec functions to a separate header
      
      * Formatting
      
      * Add stride loops
      
      * Formatting
      
      * Improve the transform pipeline
      
      * Formatting
      
      * Add const
      
      * Fix shape check
      
      * Formatting
      
      * Just check stride axis is zero
      
      * Remove extra find_vector_axis overload
      
      * Simplify some more functions
      
      * Formatting
      
      * Remove some more extra functions
      
      * Formatting
      
      * Simplify more decltypes
      
      * Add another const
      
      * Fix test
      
      * Get buffer pointer differently for older compilers
      Co-authored-by: Shucai Xiao <shucai@gmail.com>
      Co-authored-by: Chris Austen <causten@users.noreply.github.com>
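      The "run multiple times and skip first run" timing approach can be sketched with `std::chrono`. The `time_op` signature here is hypothetical. For GPU work, the callable must also synchronize the device; otherwise the measured time covers only the launch.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <stdexcept>

// Run `f` once untimed as a warm-up, then return the mean wall time of `n` timed runs in ms.
double time_op(const std::function<void()>& f, std::size_t n)
{
    if(n == 0)
        throw std::invalid_argument("n must be positive");
    f(); // the first run absorbs one-time costs, e.g. loading a kernel
    const auto start = std::chrono::steady_clock::now();
    for(std::size_t i = 0; i < n; ++i)
        f();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / static_cast<double>(n);
}
```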
  9. 14 Jul, 2021 1 commit
  10. 15 Jun, 2021 1 commit
    • Int8 gemm support (#811) · 39bc6161
      Shucai Xiao authored
      
      
      * add a flag to indicate int8x4 input format
      
      * clang format
      
      * code backup
      
      * clang format
      
      * code backup
      
      * clang format
      
      * code backup
      
      * clang format
      
      * code backup
      
      * clang format
      
      * code backup
      
      * clang format
      
      * remove log info
      
      * remove unnecessary changes
      
      * fix cppcheck error
      
      * add unit tests to have more code coverage
      
      * clang format
      
      * add debug info
      
      * remove log info
      
      * fix cppcheck error
      
      * clang format
      
      * clang format
      
      * add one more unit test for more scenarios
      
      * fix cppcheck error
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * rename p to m
      
      * fix review comments
      
      * refine unit tests
      
      * clang format
      
      * refine unit tests and fixed a bug
      
      * clang format
      
      * fix build error related to rocm4.2
      
      * fix a bug related to alpha and beta
      
      * refine two unit tests related to int8_gemm
      
      * fix cppcheck error
      
      * refine unit test to pass on mi100
      
      * add unit test for packing int8 args
      
      * clang format
      
      * change unit tests back
      
      * disable some unit tests for gpu
      
      * clang format
      
      * refine unit tests to run on mi100
      
      * clang format
      
      * refine unit tests
      
      * refine unit tests
      
      * clang format
      
      * change back a unit test
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
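      This sketch illustrates the general idea behind an int8x4 input format: each group of 4 int8 values along K is stored contiguously, so one 32-bit load feeds a 4-way dot product. The exact layout rocBLAS expects depends on its flags and on matrix orientation, and it is not taken from this log. `pack_int8x4` is hypothetical and assumes a row-major K x N input with K a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Pack a row-major K x N int8 matrix so that, for every column, each run of 4
// consecutive values along K is contiguous:
//   out[((kb * n) + col) * 4 + i] = in[(kb * 4 + i) * n + col]
std::vector<std::int8_t> pack_int8x4(const std::vector<std::int8_t>& in, std::size_t k, std::size_t n)
{
    if(k % 4 != 0)
        throw std::invalid_argument("K must be a multiple of 4");
    if(in.size() != k * n)
        throw std::invalid_argument("input size does not match K x N");
    std::vector<std::int8_t> out(in.size());
    for(std::size_t kb = 0; kb < k / 4; ++kb)
        for(std::size_t col = 0; col < n; ++col)
            for(std::size_t i = 0; i < 4; ++i)
                out[((kb * n) + col) * 4 + i] = in[(kb * 4 + i) * n + col];
    return out;
}
```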
  11. 03 May, 2021 1 commit
  12. 22 Apr, 2021 1 commit
    • Cpu fusions using post_ops (#781) · f7befe50
      Paul Fultz II authored
      
      
      * Add eliminate_data_type pass
      
      * Formatting
      
      * Auto convert quant ops
      
      * Formatting
      
      * Flip the order of decompose
      
      * Compute max size differently
      
      * Formatting
      
      * Clamp values in convert
      
      * Formatting
      
      * Fix loss of precision in reduce
      
      * Formatting
      
      * Fix bugs in reduction
      
      * Fix accumulator type in reference softmax implementation
      
      * Formatting
      
      * Update convert test
      
      * Remove unused variables
      
      * Remove unnecessary quant_dot check
      
      * Formatting
      
      * Add tests
      
      * Formatting
      
      * Remove unused code
      
      * Remove duplicate ops
      
      * Remove blaze dependency
      
      * Use set since shape::type_t is not hashable on gcc 5
      
      * Formatting
      
      * Add dnnl binary op
      
      * Formatting
      
      * Add binary and eltwise
      
      * Formatting
      
      * Add softmax
      
      * Formatting
      
      * Remove unused operators
      
      * Add missing files
      
      * Formatting
      
      * Add lrn
      
      * Formatting
      
      * Add deconvolution
      
      * Formatting
      
      * Change allocate default
      
      * Add reorder
      
      * Formatting
      
      * Add reductions
      
      * Formatting
      
      * Sort lines
      
      * Change literals in another loop
      
      * Add pow operator
      
      * Formatting
      
      * Add pow operator
      
      * Formatting
      
      * Make sure shapes are packed
      
      * Allow broadcasted inputs
      
      * Remove unused operators
      
      * Simplify functions
      
      * Remove softmax
      
      * Add sub and erf functions
      
      * Formatting
      
      * Fix bug
      
      * Formatting
      
      * Improve parallelism
      
      * Formatting
      
      * Allow multiple batch dimensions
      
      * Formatting
      
      * Move literal transforms out of lowering
      
      * Formatting
      
      * Add gather operator
      
      * Sort lines
      
      * Add early exit for carry
      
      * Formatting
      
      * Add missing concat
      
      * Rename macro
      
      * Fix deep nesting
      
      * Formatting
      
      * Fix cppcheck issues
      
      * Remove else
      
      * Move attribute to typedef
      
      * Formatting
      
      * Disable maybe-uninitialized warning since it's broken on gcc
      
      * Add constexpr default constructor
      
      * Formatting
      
      * Fix compiler warnings
      
      * Fix adjust_allocation test
      
      * Add layernorm matcher
      
      * Add gelu_erf matcher
      
      * Formatting
      
      * Add gelu_tanh matcher
      
      * Formatting
      
      * Remove match namespace
      
      * Formatting
      
      * Use matcher instead of string
      
      * Formatting
      
      * Add fusions
      
      * Formatting
      
      * Add post op field
      
      * Formatting
      
      * Make post_ops serializable
      
      * Formatting
      
      * Add eltwise fusions
      
      * Formatting
      
      * Fix null conversions
      
      * Formatting
      
      * Add fuse_ops source files
      
      * Formatting
      
      * Set binary post op index correctly
      
      * Formatting
      
      * Fix serialization bugs
      
      * Check if used once
      
      * Formatting
      
      * Fix error in get_primitive_attr
      
      * Formatting
      
      * Add compile function
      
      * Formatting
      
      * Limit fusions
      
      * Formatting
      
      * Disable with env variable instead of using compile arg
      
      * Formatting
      
      * Fix implicit conversion to bool
      
      * Declare on separate lines
      
      * Formatting
      
      * Fix cppcheck issues
      
      * Fix ICE in pack_join
      
      * Formatting
      
      * Use const ref
      
      * Make enum hashable
      
      * Formatting
      
      * Add explicit this
      
      * Fix merge issues
      
      * Fix dangling ref
      
      * Formatting
      
      * Add test for compile
      
      * Formatting
      
      * Add more value tests
      
      * Formatting
      Co-authored-by: Shucai Xiao <shucai@gmail.com>
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
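      The post_ops idea is to append elementwise or binary steps to a primitive so that they run on its output without a separate kernel. This sketch shows it in plain C++. `post_op` and `apply_post_ops` are hypothetical and do not mirror the oneDNN API. The main computation is omitted, and its output is passed in directly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical post-op description: an elementwise step (relu) or a binary step (add).
struct post_op
{
    enum class kind
    {
        relu,
        add
    };
    kind k;
    std::vector<float> operand; // used only by binary post ops such as add
};

// Apply every post op to each output element in a single pass over the data,
// instead of writing the result out and running one extra kernel per step.
std::vector<float> apply_post_ops(std::vector<float> out, const std::vector<post_op>& ops)
{
    for(const auto& op : ops)
    {
        if(op.k == post_op::kind::add && op.operand.size() != out.size())
            throw std::invalid_argument("binary post op operand has the wrong size");
    }
    for(std::size_t i = 0; i < out.size(); ++i)
    {
        for(const auto& op : ops)
        {
            switch(op.k)
            {
            case post_op::kind::relu: out[i] = std::max(out[i], 0.0f); break;
            case post_op::kind::add: out[i] += op.operand[i]; break;
            }
        }
    }
    return out;
}
```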
  13. 19 Apr, 2021 1 commit
    • Add code generation for pointwise operators (#780) · 35d1bcc2
      Paul Fultz II authored
      * Add definitions for all pointwise operators
      
      * Formatting
      
      * Add cpp generator class
      
      * Formatting
      
      * Move compilation to core
      
      * Formatting
      
      * Add clock to tmp name
      
      * Add dynamic loader
      
      * Formatting
      
      * Add tests for code gen
      
      * Formatting
      
      * Add test for literals
      
      * Formatting
      
      * Use with_char
      
      * Add missing header
      
      * Fix mismerge
      
      * Ignore tidy warning
      
      * Fix gcc 5 errors
      
      * Apply fixits
      
      * Skip signed bitwise of status
      
      * Remove unused parameters
      
      * Explicitly add c++14 flag
      
      * Fix tidy warning
      
      * Remove .o files
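      Here is a rough sketch of what a pointwise code generator emits: a templated function built from parameter names and a single expression. `generate_pointwise` is a hypothetical free function, not the generator class this commit added.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Emit the C++ source of a templated pointwise function from its name,
// parameter names and a single return expression.
std::string generate_pointwise(const std::string& name,
                               const std::vector<std::string>& params,
                               const std::string& expr)
{
    std::ostringstream ss;
    ss << "template <class T>\n";
    ss << "T " << name << "(";
    for(std::size_t i = 0; i < params.size(); ++i)
    {
        if(i > 0)
            ss << ", ";
        ss << "T " << params[i];
    }
    ss << ")\n{\n    return " << expr << ";\n}\n";
    return ss.str();
}
```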
  14. 26 Mar, 2021 1 commit
    • Add initial code generation (#762) · 581d31b0
      Paul Fultz II authored
      
      
      * Add code object op
      
      * Formatting
      
      * Add more value tests
      
      * Formatting
      
      * Fix from_value conversion from binary
      
      * Formatting
      
      * Don't use offload copy
      
      * Remove iostream header
      
      * Fix compilation errors
      
      * Formatting
      
      * Rename var
      
      * Add missing files
      
      * Formatting
      
      * Remove duplicate variable
      
      * Remove comment
      
      * Template the function so sfinae will work
      
      * Formatting
      
      * Use template specialization since ADL is broken on hcc
      
      * Formatting
      
      * Annotate the constructor with HD for hcc
      
      * Make variable const
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
  15. 26 Feb, 2021 1 commit
    • Add more supported operators and optimizations for the cpu backend (#746) · a0b570b2
      Paul Fultz II authored
      
      
      * Add eliminate_data_type pass
      
      * Formatting
      
      * Auto convert quant ops
      
      * Formatting
      
      * Flip the order of decompose
      
      * Compute max size differently
      
      * Formatting
      
      * Clamp values in convert
      
      * Formatting
      
      * Fix loss of precision in reduce
      
      * Formatting
      
      * Fix bugs in reduction
      
      * Fix accumulator type in reference softmax implementation
      
      * Formatting
      
      * Update convert test
      
      * Remove unused variables
      
      * Remove unnecessary quant_dot check
      
      * Formatting
      
      * Add tests
      
      * Formatting
      
      * Remove unused code
      
      * Remove duplicate ops
      
      * Remove blaze dependency
      
      * Use set since shape::type_t is not hashable on gcc 5
      
      * Formatting
      
      * Add dnnl binary op
      
      * Formatting
      
      * Add binary and eltwise
      
      * Formatting
      
      * Add softmax
      
      * Formatting
      
      * Remove unused operators
      
      * Add missing files
      
      * Formatting
      
      * Add lrn
      
      * Formatting
      
      * Add deconvolution
      
      * Formatting
      
      * Change allocate default
      
      * Add reorder
      
      * Formatting
      
      * Add reductions
      
      * Formatting
      
      * Sort lines
      
      * Change literals in another loop
      
      * Add pow operator
      
      * Formatting
      
      * Add pow operator
      
      * Formatting
      
      * Make sure shapes are packed
      
      * Allow broadcasted inputs
      
      * Remove unused operators
      
      * Simplify functions
      
      * Remove softmax
      
      * Add sub and erf functions
      
      * Formatting
      
      * Fix bug
      
      * Formatting
      
      * Improve parallelism
      
      * Formatting
      
      * Allow multiple batch dimensions
      
      * Formatting
      
      * Move literal transforms out of lowering
      
      * Formatting
      
      * Add gather operator
      
      * Sort lines
      
      * Add early exit for carry
      
      * Formatting
      
      * Add missing concat
      
      * Rename macro
      
      * Fix deep nesting
      
      * Formatting
      
      * Fix cppcheck issues
      
      * Remove else
      
      * Move attribute to typedef
      
      * Formatting
      
      * Disable maybe-uninitialized warning since it's broken on gcc
      
      * Add constexpr default constructor
      
      * Formatting
      
      * Fix compiler warnings
      
      * Fix adjust_allocation test
      Co-authored-by: Shucai Xiao <shucai@gmail.com>
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
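      "Clamp values in convert" guards against a real C++ pitfall: converting a float outside the target integer range is undefined behavior. Here is a sketch of a saturating conversion. `clamp_convert` is a hypothetical name, and mapping NaN to zero is an assumption. The `static_assert` restricts it to types whose bounds are exactly representable as float.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <type_traits>

// Convert a float to a narrow integer type, clamping to the target range first
// so out-of-range values saturate instead of triggering undefined behavior.
template <class To>
To clamp_convert(float x)
{
    static_assert(std::is_integral<To>::value && sizeof(To) <= 2,
                  "limits of wider integer types are not exactly representable as float");
    if(std::isnan(x))
        return To{0}; // assumption: NaN maps to zero
    constexpr float lo = static_cast<float>(std::numeric_limits<To>::lowest());
    constexpr float hi = static_cast<float>(std::numeric_limits<To>::max());
    // After clamping, static_cast truncates toward zero.
    return static_cast<To>(std::clamp(x, lo, hi));
}
```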
  16. 25 Feb, 2021 1 commit
  17. 06 Jan, 2021 1 commit
    • Module impl (#678) · c9b86f1c
      Shucai Xiao authored
      
      
      * add an api get_main_module
      
      * clang format
      
      * modify onnx unit test for module
      
      * clang format
      
      * refactor ops unit test with the get_main_module
      
      * clang format
      
      * code backup
      
      * clang format
      
      * refine module c api
      
      * add python api for module
      
      * clang format
      
      * fix a python api issue
      
      * clang format
      
      * fix cppcheck error
      
      * clang format
      
      * refine unit tests changes
      
      * clang format
      
      * code backup
      
      * code backup
      
      * clang format
      
      * defer some changes to later PRs
      
      * change return of get_main_module from ref to pointer
      
      * clang format
      
      * add unit tests for the get_main_module_api
      
      * clang format
      
      * fix cppcheck error
      
      * clang format
      
      * fix cppcheck error
      
      * clang format
      
      * add more unit tests for more code change coverage
      
      * clang format
      
      * fixed a unit test error
      
      * clang format
      
      * fix unit test
      
      * clang format
      
      * code backup
      
      * code change for more code coverage
      
      * change program to module in various passes and matcher
      
      * clang format
      
      * modify the pass API
      
      * code backup
      
      * code backup
      
      * clang format
      
      * code backup
      
      * clang format
      
      * Add option to not generate a destroy method
      
      * Formatting
      
      * fix some review comments
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * clang format
      
      * code backup
      
      * code backup
      
      * clang format
      
      * fix cppcheck errors
      
      * clang format
      
      * clang format
      
      * fix build errors
      
      * clang format
      
      * modify gpu unit tests to use module
      
      * clang format
      
      * fix cppcheck error
      
      * clang format
      
      * Add flag to enable cpu backend
      
      * Make buffers shared
      
      * Enable optimizations
      
      * Formatting
      
      * fix review comments
      
      * code backup
      
      * clang format
      
      * code backup
      
      * clang format
      
      * fix a bug related to a unit test
      
      * clang format
      
      * clang format
      
      * fix a build error
      
      * remove unnecessary code
      
      * remove unnecessary files
      
      * code backup
      
      * clang format
      
      * remove the compile function from the module class
      
      * clang format
      
      * clang format
      
      * remove the context parameter from the from_value method of the module class
      
      * code refinement
      
      * clang format
      
      * merge changes from develop branch
      
      * clang format
      
      * fix cppcheck error
      
      * clang format
      
      * fix a build error
      
      * fixed a merge error
      
      * fix cppcheck error
      
      * fixed review comments
      
      * clang format
      
      * fix cppcheck error
      
      * fix a cppcheck error
      
      * fix cppcheck error
      
      * fix build error caused by merge
      
      * Add missing has_op function
      
      * Formatting
      
      * merge changes from develop branch
      
      * fix a cppcheck error
      
      * fixed some review comments
      
      * clang format
      
      * remove the begin/end function of the program class
      
      * clang format
      
      * refine code and fix cppcheck error
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * add unit tests for more code coverage
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * fix a build error in debug mode
      
      * clang format
      Co-authored-by: Paul <pfultz2@yahoo.com>
  18. 14 Dec, 2020 1 commit
    • Use dnnl for cpu backend (#688) · 406afeb8
      Paul Fultz II authored
      
      
      * Add flag to enable cpu backend
      
      * Make buffers shared
      
      * Enable optimizations
      
      * Add onednn
      
      * Formatting
      
      * Formatting
      
      * Add dnnl header
      
      * Formatting
      
      * Rewrite rnn first
      
      * Formatting
      
      * Call reference implementation
      
      * Formatting
      
      * Make literal data shared
      
      * Formatting
      
      * Add convolution
      
      * Formatting
      
      * Compensate for dilation
      
      * Formatting
      
      * Use name/make_op instead
      
      * Formatting
      
      * Rename gemm header
      
      * Formatting
      
      * Add dnnl convolution/gemm operators
      
      * Formatting
      
      * Add eliminate_contiguous
      
      * Add faster pointwise operators
      
      * Formatting
      
      * Formatting
      
      * Formatting
      
      * Add dnnl op class
      
      * Formatting
      
      * Add add op
      
      * Formatting
      
      * Add concat operator
      
      * Formatting
      
      * Add more ops
      
      * Create descriptor during finalization
      
      * Formatting
      
      * Don't rewrite pooling
      
      * Enable memory coloring
      
      * Formatting
      
      * Add output aliases
      
      * Formatting
      
      * Fix errors
      
      * Formatting
      
      * Convert literals
      
      * Add missing file
      
      * Remove batch_norm
      
      * Formatting
      
      * Use strides
      
      * Formatting
      
      * Add some debug checks
      
      * Formatting
      
      * Fix bug in adjusting shape for gemm
      
      * Formatting
      
      * Fix fallback dot operator
      
      * Zero initialize buffers
      
      * Add support for group convolutions
      
      * Formatting
      
      * Make adjust allocation target independent
      
      * Formatting
      
      * Enable adjust_allocation for gpu/cpu
      
      * Formatting
      
      * Add copy to allocation model
      
      * Formatting
      
      * Add copy operator
      
      * Formatting
      
      * Better handling of output parameters in adjust_allocation
      
      * Formatting
      
      * Build with dnnl
      
      * Make dnnl required
      
      * Fix compile error
      
      * Tidy fixes
      
      * Formatting
      
      * Tidy fixes
      
      * Formatting
      
      * Fix more tidy issues
      
      * Formatting
      
      * Add mul op
      
      * Add mul op
      
      * Set c compiler to clang as well
      
      * Compensate for normalized compute shape
      
      * Formatting
      
      * Fix cppcheck errors
      
      * Formatting
      
      * Add onednn library to hcc
      
      * Guard clang pragmas
      
      * Disable cpu mode for gcc for now
      
      * Leave it enabled for gcc 7
      
      * Fix cppcheck suppression
      
      * Fix compile error on gcc 5
      
      * Remove unused code
      Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
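      "Compensate for dilation" most likely refers to a convention difference. In MIGraphX, as in ONNX, a dilation of 1 means no dilation, while oneDNN convolution descriptors use zero-based dilation, where 0 means no dilation. Here is a sketch of the conversion, with `to_dnnl_dilation` as a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Convert 1-based dilation (1 = no dilation) to zero-based dilation (0 = no dilation).
std::vector<std::int64_t> to_dnnl_dilation(const std::vector<std::size_t>& dilation)
{
    std::vector<std::int64_t> out;
    out.reserve(dilation.size());
    for(const std::size_t d : dilation)
    {
        if(d == 0)
            throw std::invalid_argument("dilation must be at least 1");
        out.push_back(static_cast<std::int64_t>(d) - 1);
    }
    return out;
}
```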
  19. 11 Nov, 2020 1 commit
  20. 09 Nov, 2020 1 commit
    • Add hip compilation (#664) · f71af72a
      Paul Fultz II authored
      
      
      * Add compiler flags
      
      * Add missing include
      
      * Add filesystem header
      
      * Formatting
      
      * Add tmp_dir to run
      
      * Formatting
      
      * Kernel compilation and launching
      
      * Formatting
      
      * Separate pack_args
      
      * Formatting
      
      * Add alignment tests
      
      * Formatting
      
      * Add compile test
      
      * Formatting
      
      * Complete compile test
      
      * Formatting
      
      * Use is_regular_file free function
      
      * Fix is_regular_file call
      
      * Fix tidy issues
      
      * Fix tidy
      
      * Fix tidy issue
      
      * Print size in read_buffer to debug issue on jenkins
      
      * Add hip flags before src file
      
      * Fix reading output files
      
      * Fix unused variable warning
      
      * Formatting
      
      * Formatting
      
      * Disable tidy check
      Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
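      This sketch packs kernel arguments into one byte buffer with per-argument alignment, in the spirit of the pack_args and alignment tests listed above. `kernel_arg`, `align_up` and this `pack_args` are hypothetical. Each argument lands at the offset an equivalent C struct member would get, but no trailing padding is added to the whole buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical description of one kernel argument.
struct kernel_arg
{
    const void* data;  // pointer to the argument's bytes
    std::size_t size;  // number of bytes
    std::size_t align; // required alignment, a power of two
};

// Round n up to the next multiple of the power-of-two alignment a.
std::size_t align_up(std::size_t n, std::size_t a) { return (n + a - 1) & ~(a - 1); }

// Append each argument at its next aligned offset.
std::vector<char> pack_args(const std::vector<kernel_arg>& args)
{
    std::vector<char> buffer;
    for(const auto& arg : args)
    {
        const std::size_t offset = align_up(buffer.size(), arg.align);
        buffer.resize(offset + arg.size); // new bytes, including padding, are zeroed
        std::memcpy(buffer.data() + offset, arg.data, arg.size);
    }
    return buffer;
}
```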
  21. 04 Nov, 2020 1 commit
    • Split cpu and reference implementation (#671) · 500d9441
      Paul Fultz II authored
      
      
      * Add all_targets cmake target
      
      * Rename target
      
      * Add ref target
      
      * Rename tests
      
      * Refactor compiler target
      
      * Formatting
      
      * Verify for every target
      
      * Formatting
      
      * Add verify test suite
      
      * Formatting
      
      * Add initial test programs
      
      * Formatting
      
      * Add rnn tests
      
      * Formatting
      
      * Validate gpu
      
      * Formatting
      
      * Remove old gpu tests
      
      * Fix gpu tests
      
      * Fix ref error
      
      * Fix tidy issues
      
      * Formatting
      
      * Tidy fixes
      
      * Fix header in python api
      
      * Rename to ref
      
      * Use ref in verify_onnx
      
      * Fix tidy issue
      
      * Build with verbose on
      
      * Fix typo
      
      * Remove verbose
      
      * rename some cpu prefix to ref
      Co-authored-by: Shucai Xiao <Shucai.Xiao@amd.com>
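      Verifying every target means comparing each target's output against the ref target's output. This sketch performs such a check with an elementwise absolute-plus-relative tolerance. `verify_close` and its default tolerances are hypothetical, and the metric MIGraphX itself uses may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Return true if every target element is within atol + rtol * |ref| of the reference.
bool verify_close(const std::vector<float>& ref, const std::vector<float>& target,
                  float rtol = 1e-3f, float atol = 1e-5f)
{
    if(ref.size() != target.size())
        return false;
    for(std::size_t i = 0; i < ref.size(); ++i)
    {
        const float diff = std::fabs(ref[i] - target[i]);
        // Written as !(diff <= tol) so that a NaN difference also counts as a mismatch.
        if(!(diff <= atol + rtol * std::fabs(ref[i])))
            return false;
    }
    return true;
}
```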
  22. 15 Oct, 2020 1 commit
    • Added greater and less operators (#660) · 48ffbfa5
      turneram authored
      
      
      * Added greater and less operators
      
      * Fixed ops_test.cpp
      
      * Set commutative to false for less, greater
      
      * Refactored parse_equal/less/greater into parse_compare_op
      
      * Removed unnecessary function attributes() from greater.hpp/less.hpp
      
      * Added op_name arguments
      
      * Removed local settings
      
      * Formatting
      
      * Missing comma
      
      * Formatting
      
      * Formatting
      
      * Formatting
      
      * Formatting
      
      * Formatting
      
      * Missing space
      Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
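      Commutative is set to false because greater(a, b) equals less(b, a), and neither is symmetric in its arguments, unlike equal. This sketch selects a single comparison routine by op name, echoing parse_compare_op. `compare` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Elementwise comparison selected by op name.
std::vector<bool> compare(const std::string& op_name, const std::vector<float>& a, const std::vector<float>& b)
{
    if(a.size() != b.size())
        throw std::invalid_argument("inputs must have the same size");
    std::function<bool(float, float)> f;
    if(op_name == "equal")
        f = std::equal_to<float>{};
    else if(op_name == "less")
        f = std::less<float>{};
    else if(op_name == "greater")
        f = std::greater<float>{};
    else
        throw std::invalid_argument("unknown comparison: " + op_name);
    std::vector<bool> out(a.size());
    for(std::size_t i = 0; i < a.size(); ++i)
        out[i] = f(a[i], b[i]);
    return out;
}
```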
  23. 09 Oct, 2020 1 commit
    • Add parallel stream analysis (#629) · 1d98fbb4
      Paul Fultz II authored
      * Add initial multi stream analysis
      
      * Formatting
      
      * Add more tests
      
      * Formatting
      
      * Remove comment
      
      * Analyze streams on the gpu
      
      * Formatting
      
      * Fix nstream
      
      * Formatting
      
      * Add test for return
      
      * Formatting
      
      * Make sure return has a stream assignment
      
      * Formatting
      
      * Fix asserts and checks
      
      * Improve error message for out-of-order sequence
      
      * Formatting
  24. 08 Oct, 2020 1 commit
    • Add build flag for fast math (#639) · a5065265
      kahmed10 authored
      
      
      * add flag
      
      * formatting
      
      * remove env variable
      
      * fix api expression
      
      * add api test
      
      * add api test
      
      * add op test
      
      * formatting
      
      * fix function name
      
      * fix syntax
      
      * formatting
      
      * modify test
      
      * remove test and update doc
      
      * move test to new file
      
      * formatting
      
      * revert test files
      
      * rewrite check
      
      * New
      Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
  25. 30 Sep, 2020 1 commit
    • Add hip clang builds to jenkins (#651) · f28a62ea
      Paul Fultz II authored
      * Make global variables const
      
      * Tidy fixes
      
      * Disable some lints
      
      * Formatting
      
      * Fix tidy const
      
      * Formatting
      
      * Add missing const keywords
      
      * Formatting
      
      * More fixes
      
      * Fix remaining tidy issues
      
      * Formatting
      
      * Fix rocblas function call
      
      * Formatting
      
      * Fix nodiscard warnings
      
      * Formatting
      
      * Use named parameters
      
      * Remove overload
      
      * Add overload
      
      * Remove noncps
      
      * Use named param for node
      
      * Add auto register header
      
      * Use named parameters
      
      * Refactor jenkinsfile
      
      * Fix shadow
      
      * Add missing body variable
      
      * Add more const methods
      
      * Add hip-clang docker builds
      
      * Remove comments
      
      * Add clang-format
      
      * Add more const
      
      * Formatting
      
      * Rename stage
      
      * Disable check
      
      * Add another const
      
      * Add python 2 dev packages
      
      * Add sphinx to dockerfile
  26. 31 Aug, 2020 1 commit
    • Pooling ceil mode (#615) · 9dabe26b
      Shucai Xiao authored
      
      
      * support pooling ceil_mode
      
      * clang format
      
      * add unit test for pooling ceil mode
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * add more unit tests and fixed a bug in cpu pooling implementation
      
      * clang format
      
      * add one more unit test
      
      * clang format
      
      * fix cppcheck error
      
      * fix cppcheck error
      
      * fix cppcheck error
      
      * fix review comments
      
      * clang format
      
      * remove the padding_mode attribute in pooling
      
      * clang format
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * fix a cppcheck error
      
      * fix review comments
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
  27. 27 Aug, 2020 2 commits
    • Context serialization (#607) · 6e1f9f20
      Shucai Xiao authored
      
      
      * Add initial serialization
      
      * Formatting
      
      * Add unit tests
      
      * Formatting
      
      * Add tests for serialization
      
      * Formatting
      
      * Use or not and
      
      * Add value test
      
      * Formatting
      
      * Add more tests
      
      * Add shape serialization
      
      * Formatting
      
      * Add serialization for literal and argument
      
      * Formatting
      
      * Add from and to value to operation
      
      * Formatting
      
      * Serialize empty types
      
      * Formatting
      
      * Tidy fixes
      
      * Formatting
      
      * Fix tidy issues
      
      * Formatting
      
      * Reformat value type macro
      
      * Formatting
      
      * Handle enum types
      
      * Formatting
      
      * Use const ref
      
      * Update
      
      * Add tests for to_value/from_value
      
      * Formatting
      
      * code backup
      
      * clang format
      
      * code backup
      
      * clang format
      
      * code backup
      
      * clang format
      
      * remove the from/to_value method for the generate context struct
      
      * clang format
      
      * code backup
      
      * Don't print literal data in hip_copy_literal
      
      * clang format
      
      * add unit test to have better coverage
      
      * remove unnecessary code
      
      * remove unnecessary code
      
      * fix review comments
      
      * clang format
      
      * fix review comments
      Co-authored-by: Paul <pfultz2@yahoo.com>
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
    • Bool type and equal operator (#603) · 59b80d4e
      Shucai Xiao authored
      
      
      * add bool type
      
      * code backup
      
      * code backup
      
      * clang format
      
      * fix build warnings
      
      * clang format
      
      * add the equal operator
      
      * add the equal operator
      
      * clang format
      
      * remove unnecessary code
      
      * refine unit tests
      
      * clang format
      
      * fix review comments and a bug
      
      * clang format
      
      * additional changes
      
      * clang format
      
      * fix cppcheck error
      
      * add bool type in c api
      
      * fix cppcheck error
      
      * fix review comments
      
      * fix cppcheck error
      
      * fix a build error related to gcc
      
      * fix cppcheck error
      
      * fix cppcheck error
      
      * added the equal operator to register list
      
      * add parsing boolean type
      
      * clang format
      
      * fix bool type issue for python output
      
      * clang format
      
      * add support for automatic multibroadcast of the equal operator
      
      * additional unit tests for more code coverage
      
      * clang format
      
      * missing an onnx file
      Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
  28. 25 Aug, 2020 1 commit
    • Improve layernorm performance (#613) · 56b3bf58
      Paul Fultz II authored
      * Use increment instead of division to compute register offset
      
      * Formatting
      
      * Limit layernorm to 1024 elements
      
      * Formatting
      
      * Add verification to driver
      
      * Formatting
      
      * Remove early return
      
      * Use block_size 256
      
      * Vectorize the kernel
      
      * Formatting
      
      * Convert to vector type
      
      * Add layernorm tests
      
      * Formatting
      
      * Formatting
      
      * Refactor layernorm to run both algos
      
      * Formatting
      
      * Fix compile error
      
      * Fix tidy warnings
      
      * Formatting
      
      * Add layernorm function
      
      * Formatting
  29. 14 Aug, 2020 1 commit
    • Layernorm onnx support (#599) · 2c5d5fee
      kahmed10 authored
      
      
      * fix pad calc
      
      * bert tf passes correctness
      
      * formatting
      
      * add test
      
      * formatting
      
      * remove comment
      
      * add inline
      
      * formatting
      
      * fix order for literal
      
      * formatting
      
      * test no mul_add
      
      * formatting
      
      * debug layernorm
      
      * debug layernorm
      
      * manual merge
      
      * more progress
      
      * formatting
      
      * remove miopen batchnorm
      
      * remove headers
      
      * Fix compile error with no dpp reductions
      
      * fix indices
      
      * formatting
      
      * change matcher
      
      * formatting
      
      * remove binds
      
      * formatting
      
      * disable tf matcher
      
      * formatting
      
      * use fast div
      
      * formatting
      
      * fix matcher
      
      * formatting
      
      * remove comment
      
      * move find_matches
      
      * add assert
      
      * formatting
      
      * fix deepcode issue
      Co-authored-by: Paul <pfultz2@yahoo.com>
      Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
  30. 21 Jul, 2020 1 commit
  31. 16 Jul, 2020 1 commit
    • Nd deconv cpu (#565) · 98ade977
      kahmed10 authored
      
      
      * initial progress
      
      * formatting
      
      * check existing tests
      
      * formatting
      
      * change for loop to transform
      
      * formatting
      
      * add tests
      
      * formatting
      
      * remove comment
      
      * add more tests
      
      * update gpu miopen calls
      
      * formatting
      
      * initial progress
      
      * add cpu impl and tests
      
      * formatting
      
      * add NOLINT
      
      * add 3d test
      
      * formatting
      
      * add more op_shape tests
      
      * fix error msg
      
      * fix bounds
      
      * formatting
      
      * fix algorithm
      
      * formatting
      
      * pin numpy version
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
  32. 10 Jul, 2020 1 commit
    • Gpu batchnorm (#564) · 70ba8213
      Shucai Xiao authored
      
      
      * Initial cpu conv-nd
      
      * Formatting
      
      * Make index signed
      
      * Formatting
      
      * Assert the indices are greater than 0
      
      * Use equal instead of lexicographical_compare
      
      * Formatting
      
      * change the batchnorm cpu implementation to support multiple input dimensions
      
      * clang format
      
      * add unit tests for cpu batch_norm nd implementation
      
      * clang format
      
      * support nd batchnormalization
      
      * clang format
      
      * add rewrite batch_norm unit tests
      
      * clang format
      
      * remove a unit test
      
      * Fix tidy errors
      
      * Formatting
      
      * Handle different types
      
      * Formatting
      
      * Fix nested visits
      
      * Formatting
      
      * Add 3d conv test
      
      * Formatting
      
      * revert unnecessary changes
      
      * remove a print line
      
      * Fix ICE
      
      * Formatting
      
      * fix the per_activation mode of 2d
      
      * clang format
      
      * code clean up
      
      * clang format
      
      * add 1d and 3d gpu unit test
      
      * clang format
      
      * add unit test for rewrite_batchnorm
      
      * clang format
      
      * additional refinement
      
      * fix review comments
      
      * added a unit test to have more code coverage
      Co-authored-by: Paul <pfultz2@yahoo.com>
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
  33. 09 Jul, 2020 1 commit
  34. 08 Jul, 2020 1 commit
    • Nd pooling gpu (#551) · d1258e80
      kahmed10 authored
      
      
      * initial progress
      
      * formatting
      
      * add pooling changes
      
      * formatting
      
      * change eliminate_pad
      
      * formatting
      
      * rename var
      
      * formatting
      
      * update op shape test and compute
      
      * formatting
      
      * revert conv constructor
      
      * formatting
      
      * change initializer
      
      * formatting
      
      * fix tidy
      
      * change quant conv and shape check
      
      * add tests and fixes
      
      * formatting
      
      * fix type
      
      * fix conv test
      
      * formatting
      
      * add pooling and bn tests
      
      * formatting
      
      * add inconsistent attr tests
      
      * fix padding issue
      
      * formatting
      
      * progress on 1d to 2d
      
      * formatting
      
      * change compute and compile functions
      
      * formatting
      
      * fix duplicate
      
      * fix conflict
      
      * fix issue with 1d conv
      
      * formatting
      
      * add check for 3d limit
      
      * rename function
      
      * formatting
      
      * update to MIOPen 2.3
      
      * add support for nd pooling
      
      * formatting
      
      * test miopen 2.4
      
      * change function name
      
      * rename functions
      
      * formatting
      
      * add op_shape test
      
      * add gpu ops tests
      
      * formatting
      
      * add pkg-config
      
      * change functions
      
      * formatting
      
      * change to copy_backward
      
      * formatting
      
      * test diff miopen version
      
      * add pooling shape tests
      
      * temp disable test
      
      * revert to miopen 2.4
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
  35. 23 Jun, 2020 1 commit
    • Neg operator (#557) · 866cca5b
      Shucai Xiao authored
      * add the neg operator
      
      * clang format
      
      * add missing operator
      
      * fixed a cppcheck error
      
      * change to use the neg operator
      
      * clang format
  36. 28 May, 2020 1 commit
    • Separate gpu unittests to multiple files (#541) · 218e20fc
      Shucai Xiao authored
      
      
      * code backup
      
      * clang format
      
      * fix compiling errors
      
      * clang format
      
      * rename a few files
      
      * rename a few files
      
      * fix variable bugs
      
      * clang format
      
      * add an operator to shift input sequences
      
      * clang format
      
      * fixed a bug
      
      * clang format
      
      * fixed a bug
      
      * clang format
      
      * code backup
      
      * clang format
      
      * code backup
      
      * clang format
      
      * code backup
      
      * clang format
      
      * refine code related lstm operator optimization
      
      * clang format
      
      * fix various bugs
      
      * clang format
      
      * fixed a bug in rewrite_lstm
      
      * clang format
      
      * fixed another bug
      
      * refine two operator names
      
      * clang format
      
      * refine file names
      
      * fix cppcheck error
      
      * clang format
      
      * fix cppcheck error
      
      * clang format
      
      * fix cppcheck error
      
      * fixed review comments
      
      * clang format
      
      * add unit tests
      
      * clang format
      
      * add unit tests
      
      * clang format
      
      * refine unit tests for better coverage
      
      * clang format
      
      * fixed a bug
      
      * fix cppcheck error
      
      * fix review comments
      
      * clang format
      
      * rename two operators according to review comments
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * fix review comments
      
      * fix a cppcheck error
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * add an operator to simplify code
      
      * clang format
      
      * clang format
      
      * fixed a bug and add unit tests
      
      * clang format
      
      * add more unit tests
      
      * clang format
      
      * add more unit tests
      
      * clang format
      
      * add more unit tests
      
      * clang format
      
      * refine a unit test
      
      * clang format
      
      * refine a unit test
      
      * add more unit tests and refine some existing tests for the rnn operator improvements
      
      * clang format
      
      * additional changes to simplify code further
      
      * clang format
      
      * refine a test case to refine cppcheck error
      
      * clang format
      
      * fix cppcheck error
      
      * clang format
      
      * separate rnn tests out to reduce file size
      
      * clang format
      
      * code cleanup
      
      * refine unit tests
      
      * fix clang tidy error
      
      * clang format
      Co-authored-by: Shucai Xiao <scxiao@prj47-rack-99.local.lan>
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
  37. 26 May, 2020 1 commit
    • Add support variable seq lens for the RNN and GRU operators (#535) · d7b8164c
      Shucai Xiao authored
      
      
      * code backup
      
      * clang format
      
      * fix compiling errors
      
      * clang format
      
      * rename a few files
      
      * rename a few files
      
      * fix variable bugs
      
      * clang format
      
      * add an operator to shift input sequences
      
      * clang format
      
      * fixed a bug
      
      * clang format
      
      * fixed a bug
      
      * clang format
      
      * code backup
      
      * clang format
      
      * code backup
      
      * clang format
      
      * code backup
      
      * clang format
      
      * refine code related lstm operator optimization
      
      * clang format
      
      * fix various bugs
      
      * clang format
      
      * fixed a bug in rewrite_lstm
      
      * clang format
      
      * fixed another bug
      
      * refine two operator names
      
      * clang format
      
      * refine file names
      
      * fix cppcheck error
      
      * clang format
      
      * fix cppcheck error
      
      * clang format
      
      * fix cppcheck error
      
      * fixed review comments
      
      * clang format
      
      * add unit tests
      
      * clang format
      
      * add unit tests
      
      * clang format
      
      * refine unit tests for better coverage
      
      * clang format
      
      * fixed a bug
      
      * fix cppcheck error
      
      * fix review comments
      
      * clang format
      
      * rename two operators according to review comments
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * fix review comments
      
      * fix a cppcheck error
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * add an operator to simplify code
      
      * clang format
      
      * clang format
      
      * fixed a bug and add unit tests
      
      * clang format
      
      * add more unit tests
      
      * clang format
      
      * add more unit tests
      
      * clang format
      
      * add more unit tests
      
      * clang format
      
      * refine a unit test
      
      * clang format
      
      * refine a unit test
      
      * add more unit tests and refine some existing tests for the rnn operator improvements
      
      * clang format
      
      * additional changes to simplify code further
      
      * clang format
      
      * refine a test case to refine cppcheck error
      
      * clang format
      
      * fix cppcheck error
      
      * clang format
      
      * add more unit tests
      
      * clang format
      Co-authored-by: Shucai Xiao <scxiao@prj47-rack-99.local.lan>
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
  38. 20 May, 2020 1 commit
    • Rnn variable seq lengths (#517) · 90200619
      Shucai Xiao authored
      
      
      * code backup
      
      * clang format
      
      * fix compiling errors
      
      * clang format
      
      * rename a few files
      
      * rename a few files
      
      * fix variable bugs
      
      * clang format
      
      * add an operator to shift input sequences
      
      * clang format
      
      * fixed a bug
      
      * clang format
      
      * fixed a bug
      
      * clang format
      
      * code backup
      
      * clang format
      
      * code backup
      
      * clang format
      
      * code backup
      
      * clang format
      
      * refine code related lstm operator optimization
      
      * clang format
      
      * fix various bugs
      
      * clang format
      
      * fixed a bug in rewrite_lstm
      
      * clang format
      
      * fixed another bug
      
      * refine two operator names
      
      * clang format
      
      * refine file names
      
      * fix cppcheck error
      
      * clang format
      
      * fix cppcheck error
      
      * clang format
      
      * fix cppcheck error
      
      * fixed review comments
      
      * clang format
      
      * add unit tests
      
      * clang format
      
      * add unit tests
      
      * clang format
      
      * refine unit tests for better coverage
      
      * clang format
      
      * fixed a bug
      
      * fix cppcheck error
      
      * fix review comments
      
      * clang format
      
      * rename two operators according to review comments
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * fix review comments
      
      * clang format
      
      * fix review comments
      
      * fix a cppcheck error
      
      * clang format
      
      * fix review comments
      
      * clang format
      Co-authored-by: Shucai Xiao <scxiao@prj47-rack-99.local.lan>
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
  39. 15 May, 2020 1 commit
    • Add gelu optimization (#521) · 0079028a
      kahmed10 authored
      
      
      * fix pad calc
      
      * bert tf passes correctness
      
      * formatting
      
      * add test
      
      * formatting
      
      * remove comment
      
      * add inline
      
      * formatting
      
      * fix order for literal
      
      * formatting
      
      * add test for gelu
      
      * formatting
      
      * added add_gelu fusion
      
      * add files
      
      * formatting
      
      * remove layernorm opt
      
      * revert reduce file
      
      * add gelu_fn and tests
      
      * formatting
      
      * fix matcher, remove extra tests
      
      * formatting
      
      * fix matcher
      
      * add used_once
      
      * formatting
      
      * start on new gelu
      
      * formatting
      
      * add matchers in fuse_ops
      
      * formatting
      
      * add dce to fix add_gelu
      
      * add simplify_rsqrt and test
      
      * formatting
      
      * debugging value for matcher
      
      * formatting
      
      * add more to matchers
      
      * formatting
      
      * fix errors
      
      * remove onnx gen
      
      * add any_arg, change matchers to use either_arg
      
      * formatting
      
      * formatting
      
      * add used_once
      
      * formatting
      Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>