"vscode:/vscode.git/clone" did not exist on "97c2068bd9b21ac2b30177db6531554f4695bc51"
  1. 16 Mar, 2022 3 commits
  2. 14 Mar, 2022 10 commits
  3. 12 Mar, 2022 3 commits
  4. 11 Mar, 2022 4 commits
  5. 10 Mar, 2022 3 commits
    • Qianfeng's avatar
      Pr82 followup (#115) · 827301d9
      Qianfeng authored
      * Use thread cluster descriptor and explicit M_K 2d descriptor to simply Blockwise Reduction
      
      * Change by replacing ReduceDims by NumReduceDims as Device Reduce interface template parameter
      
      * Rename the folder name for the pool2d and reduce examples
      
      * Update to reduction test scripts
      
      * Add Readme for pool2d_fwd and reduce_blockwise examples
      
      * Tiny fix in reduce profiler and tiny update in reduce testing scripts
      
      * Tiny fix in testing script profile_reduce_no_index.sh
      
      * Tiny change in script/profile_reduce_with_index.sh
      
      * Renaming and refining in Reduction profiler/device layer/examples
      
      * Renaming and refining in Reduction profiler/device layer/examples
      
      * Renaming all NumReduceDims to NumReduceDim
      827301d9
    • Jing Zhang's avatar
      test cast static_arr to pointer · f9b740b5
      Jing Zhang authored
      f9b740b5
    • Jing Zhang's avatar
      merge develop · 9dce6851
      Jing Zhang authored
      9dce6851
  6. 09 Mar, 2022 1 commit
    • Chao Liu's avatar
      Reorganize files, Part 1 (#119) · 5d37d7bf
      Chao Liu authored
      * delete obselete files
      
      * move files
      
      * build
      
      * update cmake
      
      * update cmake
      
      * fix build
      
      * reorg examples
      
      * update cmake for example and test
      5d37d7bf
  7. 07 Mar, 2022 1 commit
  8. 06 Mar, 2022 1 commit
  9. 05 Mar, 2022 9 commits
    • Qianfeng's avatar
      Reduction in Composable Kernel (#82) · e17c0d80
      Qianfeng authored
      
      
      * Initial adding of generic reduction
      
      * Initial adding of generic reduction ...
      
      * Updates to make compiling done
      
      * clang-format all files
      
      * clang-format some files again
      
      * Renaming in profiler/include/profile_reduce.hpp
      
      * Updates and make BlockWise cases passed
      
      * Updates and make ThreadWise and MultiBlockTwoCall cases passed
      
      * Remove the support for MUL and NORM1 reduceOp from the profiler and the device instances
      
      * Change to replace the dim0_max_vector_size/dim1_max_vector_size template argument in the device reduce classes
      
      * format
      
      * adding pooling
      
      * added max and average pooling
      
      * comment out cout and kernel timing
      
      * Tiny simplification in profiler/reduce_profiler.cpp
      
      * Add example for reduce_blockwise
      
      * Tiny updates
      
      * Change to pass the ElementWiseOp from device layer to kernel
      
      * Fix the vectorDim and vectorSize in Device layer
      
      * Enable vector load on both dim0 and dim1 for Threadwise method
      
      * Tiny updates
      
      * Change to let the user to pass the preUnaryOp and posUnaryOp
      
      * Make pooling example work
      
      * split device_reduce_instance into two libraries
      
      * Tiny update
      
      * Replace nanPropaOpt enum by boolean propagate_nan
      
      * Simplification in DeviceReduce layer codes
      
      * update build
      
      * Change to clarify the difference between ck::half_t and half_float::half
      
      * Renaming in all the reduction codes
      
      * Add VectorSize as template parameter for device layer
      
      * Add BetaIsZero as kernel template and as AccDataType for alpha
      
      * print
      
      * Small updates for pooling
      
      * Updates for host_generic_reduction for reference
      
      * Update to make AVG pooling pass
      
      * Update to make MAX pooling with indices output pass
      
      * fix
      
      * add OutDst vector store to threadwise reduction and pooling
      
      * tweak
      
      * turn off check_indices that caused build issue
      
      * refactor pooling
      
      * clean up
      
      * turn off check_indices for building issue for php-compiler
      
      * add more tile size for odd C
      
      * tweak conv for odd C
      
      * update script
      
      * clean up elementwise op
      
      * add hack in reduction_operator.hpp to avoid compile error. To fix it, need to use element_wise_op in reduction op
      
      * Add OutVectorSize as device and kernel tunable, also update to Elementwise Operations
      
      * Move reduce operator mapping to host layer file reduction_operator_mapping.hpp from reduction_operator.hpp
      
      * Change to the unary operators
      
      * Move the definitions of unary operations to element_wise_operation.hpp
      
      * re-org files
      
      * Refine in device interfaces and multiblock kernels
      
      * Split the reduction configurations into instances for specific methods
      
      * Update in getTypeString() of device pool2d
      
      * Renaming in host and kernel
      
      * Tiny update in profiler/src/profiler.cpp
      
      * Uncomment in device_operation/CMakeLists.txt to enable the building of all operations
      
      * Make check_indices a templated function to remove some linking issue
      
      * Renaming in the profiler reduce module
      
      * Add support for double Reduction (but disable MultiblockAtomicAdd for double)
      
      * Tiny correction of literal string
      
      * Rename DevicePoolFwd to DevicePool2dFwd
      
      * Split device_reduce_instance_xxx.cpp files according to the data types to speed up compiling
      
      * Add comments for lists of configurations, lists of instances and references of add_reduce_instances_xxx
      
      * Remove un-used header file gridwise_generic_reduction_wrapper_common.hpp
      
      * Renaming and refining in the Reduction codes
      
      * Tiny change in the unary operators
      
      * Renaming symbols and files
      
      * Renaming symbols in the kernels
      
      * Move kernel kernel_set_buffer_value to separate file
      
      * Add IndexDataType template parameter for kernels and use int32_t as index data type in device layer
      
      * Tiny update in the kernels
      
      * Remove definition of sqrtf()/isnan()/abs() for half_t due to some ADL issue
      
      * Simplify a helper function in device layer
      
      * Tiny adjustment in testing data initialization
      
      * Renaming in kernel/device/host
      
      * Add two testing scripts for reduction
      
      * Refine the Unary operators in element_wise_operation.hpp
      
      * Update in the reduce profiler module
      
      * Update to the reduction testing scripts
      
      * reduce compile parallelism
      
      * change CI docker to rocm5.0
      
      * remove unused variables
      
      * fix build
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      e17c0d80
    • Chao Liu's avatar
      revert changes in threadwise copy due to PR #101 (space filling curve used in... · 12dfba3d
      Chao Liu authored
      revert changes in threadwise copy due to PR #101 (space filling curve used in threadwise copy) (#111)
      
      12dfba3d
    • Jing Zhang's avatar
      clean · 698573a9
      Jing Zhang authored
      698573a9
    • Jing Zhang's avatar
      perf test · 55ab4687
      Jing Zhang authored
      55ab4687
    • rocking5566's avatar
      Int8 qunatization gemm xdl (#108) · ad41aa0e
      rocking5566 authored
      
      
      * Add int8 of mk_nk_mn to the ckProfiler
      
      * Add example of int8 gemm
      
      * Fix typo, use ushort instead of half_t for bfloat16
      
      * replace ushortXXX_t to bhalfXXX_t
      
      * rename ushort to bhalf_t
      
      * Add bf16 example
      
      * Add bf16 gemm to ckProfiler
      
      * Fix alignment
      
      * Fix typo
      
      * Add unit test for gemm_xdl int8
      
      * Add gemm_xdl fp32 unit test
      
      * Add gemm_xdl bf16 unit test
      
      * fix build
      
      * fix build issue due to merge conflict
      
      * Fix build
      
      * Fix build error
      
      * [What] gemm + relu inference
      [How] gemm + requant + relu + requant + clamp
      
      * clean
      Co-authored-by: default avatarrocking <chunylai@amd.com>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      ad41aa0e
    • Chao Liu's avatar
      Fix Tests build (#109) · 5b178874
      Chao Liu authored
      * fix tests
      
      * remove useless file
      
      * fix test build
      
      * reduce parallelism when compiling
      
      * fix test
      5b178874
    • Jing Zhang's avatar
      2 gemm test · bbe5c0c7
      Jing Zhang authored
      bbe5c0c7
    • ltqin's avatar
      Example for conv2d backward weight fp16 (#106) · 7a9b93f4
      ltqin authored
      
      
      * add wrw reference
      
      * start device
      
      * raw not split version
      
      * run simple example
      
      * start to use atomic add
      
      * simple transform result correct
      
      * first version that can run
      
      * fix atomic and set operator choice
      
      * add check split-k
      
      * format
      
      * change input parameter
      
      * add pad for t total
      
      * rename example index
      Co-authored-by: default avatarltqin <letaoqin@amd.com>
      7a9b93f4
    • Jing Zhang's avatar
      init of grouped_gemm · 6cbb0a13
      Jing Zhang authored
      6cbb0a13
  10. 04 Mar, 2022 4 commits
    • rocking5566's avatar
      [Bf16 & int8] [example & ckprofiler] (#100) · 7e9a9d32
      rocking5566 authored
      
      
      * Add int8 of mk_nk_mn to the ckProfiler
      
      * Add example of int8 gemm
      
      * Fix typo, use ushort instead of half_t for bfloat16
      
      * replace ushortXXX_t to bhalfXXX_t
      
      * rename ushort to bhalf_t
      
      * Add bf16 example
      
      * Add bf16 gemm to ckProfiler
      
      * Fix alignment
      
      * Fix typo
      
      * Add unit test for gemm_xdl int8
      
      * Add gemm_xdl fp32 unit test
      
      * Add gemm_xdl bf16 unit test
      
      * fix build
      
      * fix build issue due to merge conflict
      
      * Fix build
      
      * Fix build error
      Co-authored-by: default avatarrocking <chunylai@amd.com>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      7e9a9d32
    • Chao Liu's avatar
      fix type in PR #101 (#107) · 0c79af12
      Chao Liu authored
      0c79af12
    • Jianfeng Yan's avatar
      Refactor threadwise copy using sfcurve (#101) · 0619ebf7
      Jianfeng Yan authored
      
      
      * add space_filling_curve
      
      * cleanup and move space_filling_curve into test
      
      * WIP: start refactoring threadwise_transfer_v1r3
      
      * threadwise_copy works but needs further refactoring
      
      * add some comments
      
      * add SpaceFillingCurve::GetIndices()
      
      * minor changes
      
      * removed GetIndices; refactored GetDstCoordinateResetStep
      
      * add DynamicBuffer::Transfer, but Add is not tested
      
      * rebased agaist develop
      
      * threadwise_copy_v6r1/v6r2/v6r3 using space-filling curve start to work
      
      * minor changes
      
      * refactored threadcopy v3r1, v2; removed old implementations
      
      * clang-format
      
      * cleanup
      
      * fix a typo in v6r3
      
      * format
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      0619ebf7
    • ltqin's avatar
      NHWC conv 2d: bwd fp32/fp16/bfp16/int8, Device level tuning and host API (#92) · c254e5ab
      ltqin authored
      
      
      * start conv2d bwd api
      
      * kernel running
      
      * add bwd reference
      
      * change to no shuffle
      
      * fix bwd reference
      
      * pass verification
      
      * add Filter1x1Stride1Pad0 and start testing
      
      * change some tuning parameter
      
      * fix test error
      
      * add fp16 tuning parameter
      
      * add bf16 tuning parameter
      
      * add int8 tuning parameters
      
      * change fp32 tuning parameter
      
      * add bwd to profiler
      
      * fix bug for bwd profiler
      
      * fix ckProfiler bug
      
      * change conv2d_bwd_xdl to fp16
      
      * fix bug in comments
      
      * fix precompile id
      
      * fix enum conv name
      
      * chage _bwd_ to _bwd_data_
      
      * change conv2d_bwd example id
      
      * bwd to bwd data
      
      * fix prehead
      
      * fix MakeDefaultBlock2CTileMap ,import form merge develop
      
      * format bwd instance
      
      * bwd to bwd data
      
      * change name bwd to bwd data
      
      * change name bwd to bwd data in example
      
      * formate code
      
      * change conv2d bwd data id in example
      
      * rewrite readme for example
      
      * fix CalculateMagicNumbers about div zero
      
      * add workaround CK_WORKAROUND_SWDEV_325164
      
      * change test_conf2d_bwd_data show info
      
      * format
      
      * fix bug for workaround:CK_WORKAROUND_SWDEV_325164
      
      * formate tuning parameters
      
      * formate tuning parameters again
      
      * formate tuning parameters 3
      
      * formate tuning parameters 4
      
      * remove add function template
      
      * format
      
      * update comment
      Co-authored-by: default avatarltqin <letaoqin@amd.com>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      c254e5ab
  11. 03 Mar, 2022 1 commit
    • JD's avatar
      Update test CMakeLists to add new tests automatically and add Jenkins stage for tests (#88) · 992f71e3
      JD authored
      
      
      * add docker file and make default target buildable
      
      * add Jenkinsfile
      
      * remove empty env block
      
      * fix package stage
      
      * remove render group from docker run
      
      * clean up Jenkins file
      
      * add cppcheck as dev dependency
      
      * update cmake file
      
      * Add profiler build stage
      
      * add hip_version config file for reduction operator
      
      * correct jenkins var name
      
      * Build release instead of debug
      
      * Update test CMakeLists.txt
      reorg test dir
      add test stage
      
      * reduce compile threads to prevent compiler crash
      
      * add optional debug stage, update second test
      
      * remove old test target
      
      * fix tests to return proper results and self review
      
      * Fix package name and make test run without args
      
      * change Dockerfile to ues rocm4.3.1
      
      * remove parallelism from build
      
      * Lower paralellism
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      992f71e3