1. 31 May, 2023 1 commit
  2. 18 Jan, 2023 1 commit
  3. 11 Nov, 2022 1 commit
    • Po Yen Chen's avatar
      Rangify constructor of HostTensorDescriptor & Tensor<> (#445) · 4a2a56c2
      Po Yen Chen authored
      * Rangify STL algorithms
      
      This commit adapts rangified std::copy(), std::fill() & std::transform()
      
      * Rangify check_err()
      
      By rangifying check_err(), we can not only compare values between
      std::vector<>s, but also compare any ranges which have same value
      type.
      
      * Allow constructing Tensor<> like a HostTensorDescriptor
      
      * Simplify Tensor<> object construction logics
      
      * Remove more unnecessary 'HostTensorDescriptor' objects
      
      * Re-format example code
      
      * Re-write more HostTensorDescriptor ctor call
      4a2a56c2
  4. 13 Oct, 2022 1 commit
    • Adam Osewski's avatar
      Refactor device op implementations into `impl` subdirectory. (#420) · 30480288
      Adam Osewski authored
      
      
      * Move kernel implementation files under impl directory.
      
      * Update examples paths.
      
      * Update device kernel impl include paths.
      
      * Update tensor operation instances include paths.
      
      * Update profiler and tests include paths.
      
      * Clang-format
      
      * Update include paths for batched gemm reduce
      
      * Refactor UnitTest ConvNDBwdWeight.
      
      * Refactor fwd and bwd data convND UT.
      
      * Fix used test macro.
      
      * Fix include path.
      
      * Fix include paths.
      
      * Fix include paths in profiler and tests.
      
      * Fix include paths.
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      30480288
  5. 15 Aug, 2022 1 commit
    • Qianfeng's avatar
      Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320) · 53ea4713
      Qianfeng authored
      * Implement multiple-reduction in one kernel (kernels, device ops, examples)
      
      * Add generic elementwise kernel and device interface
      
      * Add generator for normal-distributed data initialization
      
      * Add host refer implementation of batchnorm-forward and batchnorm-infer
      
      * Add examples for implementing batchnorm-forward and batchnorm-infer using generic kernels
      
      * Remove un-needed including in batchnorm example
      
      * Renaming generic_elementwise to elementiwise in kernel and device classes/functions
      
      * Change in gemm_layernorm examples to use DeviceElementwise instead of Device5AryElementwise
      
      * Change in exampe 19_binary_elementwise to use DeviceElementwise instead of DeviceBinaryElementwise
      
      * Change in device_cgemm_4gemm_xdl_cshuffle.hpp to use kernel_elementwise instead of kernel_binary_elementwise
      
      * Add DeviceElementwiseBase and use it in device_normalize_instance.cpp
      
      * Removing and renaming files
      
      * Update to synchronize gemm_layernorm client example to the generic element-wise device op API
      
      * Update to synchronize with the latest headers directory and HostTensorDescriptor interface renaming
      
      * Merge two static member functions in device_elementwise.hpp
      
      * Remove unary_elementwise_1d kernel and device
      53ea4713