"git@developer.sourcefind.cn:OpenDAS/megatron-lm.git" did not exist on "6e9d5cb0512951d7ef01788f55b84dd20fb70963"
  1. 22 Mar, 2022 1 commit
    • Qianfeng's avatar
      Reduction for int8 and bfloat16 (#125) · 9a8ee8a3
      Qianfeng authored
      
      
      * Use thread cluster descriptor and explicit M_K 2d descriptor to simply Blockwise Reduction
      
      * Change by replacing ReduceDims by NumReduceDims as Device Reduce interface template parameter
      
      * Rename the folder name for the pool2d and reduce examples
      
      * Update to reduction test scripts
      
      * Add Readme for pool2d_fwd and reduce_blockwise examples
      
      * Add support for int8_t reduction (ADD/AVG, MIN/MAX/AMAX)
      
      * Tiny fix in reduce profiler and tiny update in reduce testing scripts
      
      * Tiny fix in testing script profile_reduce_no_index.sh
      
      * Tiny fix in testing script profile_reduce_no_index.sh
      
      * Add support for bfp16 reduction (using bhalf_t = ushort)
      
      * Tiny fix in amd_buffer_addressing.hpp
      
      * Tiny change in script/profile_reduce_with_index.sh
      
      * Use AccDataType for Beta value and use element_wise::PassThrough
      
      * Use type_convert for type converting in host layer reduction
      
      * Renaming and refining in Reduction profiler/device layer/examples
      
      * Renaming and refining in Reduction profiler/device layer/examples
      
      * Renaming all NumReduceDims to NumReduceDim
      
      * Fix the leaked type_convert in ThreadwiseTensorSliceTransfer_v2
      
      * Update to testing scripts to add bf16 support
      
      * added more static_assert
      
      * Remove buggy tunable configurations defined in device_reduce_instance_xxx.hpp
      
      * Add static_assert to give compile-time warning for incorrect thread slice-size/vector-size configurations
      
      * minor change
      
      * Refine and fix (in GetWorkspaceSizeInBytes of MultiBlockPartialReduce) to make int8 completely pass
      
      * Tiny renaming in gridwise_2d_reduction_multiblock_partial_reduce.hpp
      
      * Tiny fix in script/profile_reduce_no_index.sh
      
      * Refine in DeviceReduce layer with regard to using NumInvariantDim/NumReduceDim or InvariantDims/ReduceDims
      
      * Generic renaming in host reduction and DeviceReduce layer
      
      * Add support for 4-d all dimension reduction in the profiler and add_device_reduce_xxx instances
      
      * Use multi-thread and simplification for host Reduction implementation
      
      * Add ctest for reduction
      
      * Update to clarify the using of data init method in produce_reduce/example_reduce/test_reduce/
      
      * Update to the reduce CTest executables to enable default testing behavior when no command argument
      
      * Renaming
      Co-authored-by: default avatarJianfeng yan <jfyan008@gmail.com>
      9a8ee8a3
  2. 21 Mar, 2022 2 commits
  3. 11 Mar, 2022 1 commit
  4. 10 Mar, 2022 1 commit
    • Qianfeng's avatar
      Pr82 followup (#115) · 827301d9
      Qianfeng authored
      * Use thread cluster descriptor and explicit M_K 2d descriptor to simply Blockwise Reduction
      
      * Change by replacing ReduceDims by NumReduceDims as Device Reduce interface template parameter
      
      * Rename the folder name for the pool2d and reduce examples
      
      * Update to reduction test scripts
      
      * Add Readme for pool2d_fwd and reduce_blockwise examples
      
      * Tiny fix in reduce profiler and tiny update in reduce testing scripts
      
      * Tiny fix in testing script profile_reduce_no_index.sh
      
      * Tiny change in script/profile_reduce_with_index.sh
      
      * Renaming and refining in Reduction profiler/device layer/examples
      
      * Renaming and refining in Reduction profiler/device layer/examples
      
      * Renaming all NumReduceDims to NumReduceDim
      827301d9
  5. 09 Mar, 2022 1 commit
    • Chao Liu's avatar
      Reorganize files, Part 1 (#119) · 5d37d7bf
      Chao Liu authored
      * delete obselete files
      
      * move files
      
      * build
      
      * update cmake
      
      * update cmake
      
      * fix build
      
      * reorg examples
      
      * update cmake for example and test
      5d37d7bf
  6. 13 Jun, 2019 1 commit
  7. 12 Jun, 2019 3 commits