    Update to the Reduction API and instances (#476) · dda3a0a1
    Qianfeng authored
    * Simplify the macros for declaring and defining the add_device_reduce_instance_xxxx() instances
    
    * Change the types of lengths and strides from std::vector to std::array for the reduction device interfaces
    
    * Remove DeviceSoftmaxImpl's dependency on DeviceReduceMultiblock
    
    * Split the cpp and hpp files for reduction instances to allow more parallel compilation
    
    * Remove the use of macros for declaring reduction instances and instance references
    
    * Update to add_device_reduce_instance_xxxx templated functions
    
    * Use ReduceOperation+InElementwiseOp+AccElementwiseOp to replace the ReduceOpId in defining add_reduce_instance_xxxx() templates
    
    * Change return format
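
As an illustration of the std::vector to std::array change for lengths and strides listed above, here is a minimal sketch. It assumes the tensor rank is known at compile time; `packed_strides` is a hypothetical helper written for this example and is not taken from the reduction device interfaces.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not part of the library): computes row-major (packed)
// strides for a tensor whose rank is a compile-time constant. Using std::array
// puts the rank in the type and avoids the heap allocation of std::vector.
template <std::size_t Rank>
std::array<std::size_t, Rank> packed_strides(const std::array<std::size_t, Rank>& lengths)
{
    std::array<std::size_t, Rank> strides{};
    std::size_t running = 1;

    // Walk from the innermost (last) dimension to the outermost one.
    for(std::size_t i = Rank; i-- > 0;)
    {
        strides[i] = running;
        running *= lengths[i];
    }
    return strides;
}
```

With `std::array`, passing lengths and strides of different ranks to the same interface becomes a compile-time error rather than a runtime check, which is one plausible motivation for such a change.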
profile_reduce_impl.hpp 22 KB