    Reduction for int8 and bfloat16 (#125) · 9a8ee8a3
    Qianfeng authored
    
    
    * Use thread cluster descriptor and explicit M_K 2d descriptor to simplify Blockwise Reduction
    
    * Replace ReduceDims with NumReduceDims as the Device Reduce interface template parameter
    
    * Rename the folder name for the pool2d and reduce examples
    
    * Update to reduction test scripts
    
    * Add Readme for pool2d_fwd and reduce_blockwise examples
    
    * Add support for int8_t reduction (ADD/AVG, MIN/MAX/AMAX)
    
    * Tiny fix in reduce profiler and tiny update in reduce testing scripts
    
    * Tiny fix in testing script profile_reduce_no_index.sh
    
    * Tiny fix in testing script profile_reduce_no_index.sh
    
    * Add support for bfp16 reduction (using bhalf_t = ushort)
    
    * Tiny fix in amd_buffer_addressing.hpp
    
    * Tiny change in script/profile_reduce_with_index.sh
    
    * Use AccDataType for Beta value and use element_wise::PassThrough
    
    * Use type_convert for type converting in host layer reduction
    
    * Renaming and refining in Reduction profiler/device layer/examples
    
    * Renaming and refining in Reduction profiler/device layer/examples
    
    * Renaming all NumReduceDims to NumReduceDim
    
    * Fix the leaked type_convert in ThreadwiseTensorSliceTransfer_v2
    
    * Update to testing scripts to add bf16 support
    
    * added more static_assert
    
    * Remove buggy tunable configurations defined in device_reduce_instance_xxx.hpp
    
    * Add static_assert to give compile-time warning for incorrect thread slice-size/vector-size configurations
    
    * minor change
    
    * Refine and fix (in GetWorkspaceSizeInBytes of MultiBlockPartialReduce) to make int8 completely pass
    
    * Tiny renaming in gridwise_2d_reduction_multiblock_partial_reduce.hpp
    
    * Tiny fix in script/profile_reduce_no_index.sh
    
    * Refine in DeviceReduce layer with regard to using NumInvariantDim/NumReduceDim or InvariantDims/ReduceDims
    
    * Generic renaming in host reduction and DeviceReduce layer
    
    * Add support for 4-d all dimension reduction in the profiler and add_device_reduce_xxx instances
    
    * Use multi-thread and simplification for host Reduction implementation
    
    * Add ctest for reduction
    
    * Update to clarify the use of the data init method in produce_reduce/example_reduce/test_reduce/
    
    * Update the reduce CTest executables to enable default testing behavior when no command-line argument is given
    
    * Renaming
    Co-authored-by: Jianfeng yan <jfyan008@gmail.com>
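
The commit message mentions bf16 reduction "using bhalf_t = ushort" and using an accumulation data type. As a rough host-side illustration of that idea only (not the code from this commit), the sketch below stores bf16 values in a 16-bit unsigned integer and accumulates an ADD reduction in float. The helper names `bf16_to_float`, `float_to_bf16`, and `reduce_add_bf16` are hypothetical, and narrowing by truncation is an assumption; the library may round differently.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Assumption: mirrors the commit's "bhalf_t = ushort" storage choice.
using bhalf_t = std::uint16_t;

// Widen bf16 to float: the 16 stored bits are the upper half of an IEEE-754 float.
inline float bf16_to_float(bhalf_t x)
{
    std::uint32_t bits = static_cast<std::uint32_t>(x) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Narrow float to bf16 by truncation (drops the low 16 mantissa bits).
// Hypothetical helper; a real implementation may use round-to-nearest-even.
inline bhalf_t float_to_bf16(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return static_cast<bhalf_t>(bits >> 16);
}

// ADD reduction over bf16 inputs, accumulating in float to limit precision loss.
inline bhalf_t reduce_add_bf16(const std::vector<bhalf_t>& in)
{
    float acc = 0.0f;
    for(bhalf_t v : in)
        acc += bf16_to_float(v);
    return float_to_bf16(acc);
}
```

Accumulating in a wider type and converting only at the end keeps the 8-bit bf16 mantissa from discarding small addends at every step.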
README.md 1.94 KB