1. 02 Mar, 2023 1 commit
    • Illia Silin's avatar
      Change the CI workflow. (#611) · e6cda9f8
      Illia Silin authored
      * add new parallel stage on navi node
      
      * dont run performance tests on navi, get rid of 9110 compiler
      
      * only run navi build when not doing QA
      
      * fix syntax
      
      * use navi21 label
      
      * dont stash profiler on navi nodes, scp deb package to ginger
      
      * disable tests on navi nodes
      
      * test posting a binary to ginger
      
      * add sshpass and use it to copy deb package
      
      * fix the scp example
      
      * fix syntax
      
      * debug the scp issues
      
      * add jenkins user to docker
      
      * dont try whoami
      
      * change jenkins uid and add user with uid=1002
      
      * try scp from the last stage on micimaster
      
      * rename and stash the package, scp from micimaster
      e6cda9f8
  2. 01 Mar, 2023 2 commits
  3. 27 Feb, 2023 1 commit
  4. 24 Feb, 2023 1 commit
  5. 22 Feb, 2023 2 commits
    • Rostyslav Geyyer's avatar
      Add Grouped Conv Backward Weight on Navi21 for ResNet50. (#505) · 246ceee4
      Rostyslav Geyyer authored
      
      
      * Add DeviceOp and examples
      
      * Format DeviceOp template arguments
      
      * Remove bf16 example
      
      * Format
      
      * Format
      
      * Update MakeABCGridDescriptor_A_K0_M_K1_B_K0_N_K1_C_M_N
      
      * Refactor argument preparation
      
      * Update conv_bwd_weight_dl to grouped_conv_bwd_weight_dl
      
      * Rename device op file
      
      * Update include directive in the example file
      
      * Update descriptor preparation for grouped op
      
      * Update the argument
      
      * Update batch handling
      
      * Add gridwise gemm supporting batched input
      
      * Update blockwise indexing, working version
      
      * Update copyright year
      
      * Update check if argument is supported
      
      * Refactor and make consistent with xdl examples
      
      * Update check if argument is supported
      
      * Add changelog entry
      
      * Added comments on Dl op split_k>1 support
      
      ---------
      Co-authored-by: default avatarRosty Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      246ceee4
    • ltqin's avatar
      Grouped conv1d client example (#589) · 830d37a7
      ltqin authored
      
      
      * add conv1d fwd client example
      
      * change 07_grouped_conv2d_fwd to 07_grouped_convnd_fwd
      
      * add conv1d bwd weight
      
      ---------
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      830d37a7
  6. 16 Feb, 2023 2 commits
  7. 15 Feb, 2023 7 commits
    • pmaybank's avatar
      Sphinx doc (#581) · cb3fac4d
      pmaybank authored
      
      
      * New docs directory with minimal config
      
      * Based on docs directory of rocBLAS
      
      * Config for running Doxygen then Sphinx to generate HTML
      
      * Add minimal content - intro to doc
      
      * Add some boilerplate sections to doc
      
      * content still needs to be done,
      * e.g., need to generate API documentation using Doxygen
      * need to write contributor guide
      
      * Start Softmax section of Support Primitives doc
      
      * Written as a test bed for typesetting math content
      
      * Need to decide how much detail to go into
      
      * add doc directories to git ignore file.
      
      * Minor edits - new line at EOF, change year in copyright notices
      
      * Port Markdown files to ReStructuredText
      
      * Copy Markdown files from pre-existing doc directory to docs directory
      
      * Convert to reStructured Text (rst) - section headings, links, tables
        have a different syntax in rst
      
      * New rst files added to index - can generate HTML with same style as
        HTML generated from rst files in previous commits
      
      * Intention is to make all the content in doc redundant and use rst
        throughout rather than mix of md and rst
      
      * Extend Softmax section of Primitives Guide
      
      * rename l to z
      
      * add material on applying softmax row-wise to matrix
      
      * define macro for diag operator (represents diagonal matrix)
      
      ---------
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      cb3fac4d
    • Illia Silin's avatar
      Clean up kernel launch output (#569) · 19490ac4
      Illia Silin authored
      
      
      * clean up output from kernel_launch
      
      * set RUN_WARMUP to 0 by default
      
      * split the warm-up into a separate issue
      
      ---------
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      19490ac4
    • zjing14's avatar
      Add contraction_fp64 example (#570) · 24c9ee1d
      zjing14 authored
      
      
      * add contraction_bilinear
      
      * add contraction_scale_xdl_fp64
      
      * reduce tile size to avoid register spill
      
      ---------
      Co-authored-by: default avatarroot <root@ctr-ubbsmc16.amd.com>
      24c9ee1d
    • rocking5566's avatar
      Improve normalization (#580) · 6a6163a3
      rocking5566 authored
      * Sync the order of type string with template parameter
      
      * Add more instances
      
      * Check the vector size and remove redundant var
      
      * Extract var to static, prepare to separate sweep once kernel
      
      * Separate sweeponce flow and optimize the flow
      
      * 1. Rename AccDatatype in normalization to computeData
      2. Rename AccElementwiseOperation to YElementwiseOperation in normalization
      
      * Remove useless code
      
      * Update naive variance kernel
      
      * Refine string
      
      * Fix typo
      
      * Support naive variance for device_normalization
      
      * Check the blocksize
      
      * Share the VGPR of x and y
      
      * Share the VGPR of gamma and beta
      
      * Add more instances
      
      * Support fp16 sqrt for experiment
      
      * Add CHANGELOG
      
      * Fix typo
      
      * clang-format
      6a6163a3
    • Haocong WANG's avatar
      [Navi3x] Add Device Operations (#567) · 0cfda84d
      Haocong WANG authored
      * wmma_op + unit test
      
      * add arch limitation to wmma test
      
      * change arch limitation
      
      * Refactor + Add all type unit test(int4 compile failed)
      
      * Add f32_16x16x16_bf16 unit test
      
      * tempsave
      
      * tempsave
      
      * tempsave
      
      * runtime bug, cannot find symbol
      
      * workaround for incorrect HIP warpSize return value
      
      * debugging
      
      * tempsave
      
      * Correctness OK, waiting for optimization
      
      * Tidy up + format
      
      * temp save
      
      * temp save, reproduce the v_bfi_b32 issue
      
      * add inline asm for wmmaop test
      
      * tidy up
      
      * clean some debug purpose code
      
      * discard some codes
      
      * clang format
      
      * clang format
      
      * compiler issue fixed + increase tile size
      
      * navi3x_multipleD+example
      
      * temp save
      
      * workable
      
      * batchedgemm[OK], groupconv[debug]
      
      * groupconv: Sanity check[OK], Performance[Bad]
      
      * navi3x_groupconv_need_optimization
      
      * format
      
      * Add arch limitation to all wmma examples
      
      * fix bug: example30 input conv args
      0cfda84d
    • Adam Osewski's avatar
      Conv3D FWD BWD WRW fp16 fp32 client examples (#559) · e9fd1228
      Adam Osewski authored
      
      
      * Conv3d bwd weight client example.
      
      * Update year in license
      
      * Convolution bwd data 3D fp16/fp32 client example.
      
      * Client example for convnd fwd fp16 fp32
      
      * clang-format
      
      * Review remarks.
      
      * Fix compiler err.
      
      * Update data layout to standard one.
      
      * Add conv 3d fwd NDHWGC instances
      
      * clang-format
      
      * Conv3d fwd NDHWGC instances.
      
      ---------
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      e9fd1228
    • Illia Silin's avatar
      Remove the workaround for bf16 attention tests. (#586) · 06f1fc86
      Illia Silin authored
      * remove workanround in bf16 attention test
      
      * clean up another workaround
      06f1fc86
  8. 13 Feb, 2023 1 commit
  9. 10 Feb, 2023 1 commit
  10. 09 Feb, 2023 2 commits
    • rocking5566's avatar
      Gemm+layernorm instance, ckProfiler, client example (#568) · f7d28f3e
      rocking5566 authored
      * Add gemm + layernorm instance
      
      * Add ckProfiler
      
      * Add test
      
      * Add client example
      
      * Detect if user forger to set the workrspace
      
      * Use literal in the example
      
      * [What] use builtin function for sqrt
      [Why] compiler will not use v_sqrt_f64_e64 if we use ::sqrt()
      
      * check gemm vaildity in IsSupportedArgument
      
      * Add more testcases
      
      * Merge duplicated folder in client example
      
      * Print more infomation
      
      * Use better kernel parameter for MS problem size
      
      * clang format
      
      * Add constexpr for if condition and remove redundant include
      
      * Remove cstdlib and add constexpr
      f7d28f3e
    • guangzlu's avatar
      Add instance for elementwise normlization (#573) · 76d144fa
      guangzlu authored
      * added instances for large N
      
      * add instance for elementwise normlization
      
      * added supported restrict in device_elementwise_normalization_impl.hpp
      76d144fa
  11. 08 Feb, 2023 3 commits
    • Illia Silin's avatar
      adding the first draft of changelog (#571) · b63accee
      Illia Silin authored
      * adding the first draft of changelog
      
      * second draft of changelog
      b63accee
    • ltqin's avatar
      Add GemmAddSoftmaxGemm support for MSFT ORT (instances and client API) (#576) · 332ccc33
      ltqin authored
      * add instance for gemm bias softmax gemm
      
      * add client example
      
      * change CGridDesc_G_M_N to CGridDesc_G_M_O
      
      * add gridwise
      
      * change c grid name
      
      * device add d0s data
      
      * fix 08 client_example
      
      * add example 47_fused_attention
      
      * example output correct
      
      * add d0 to example
      
      * add d0 element op
      
      * rechange instance code
      
      * change Acc0ElementwiseOperation to C0DEElementwiseOperation
      
      * change example name
      
      * update instance for cdeelementwiseop
      
      * add bhalf_t ScaleAdd
      
      * add test
      
      * not surport geem1 bias
      
      * remove some ignore
      
      * fix test bug
      332ccc33
    • Illia Silin's avatar
      Fix a couple more CI issues. (#578) · bb3d9546
      Illia Silin authored
      * test the QA cron parameter for compiler commit
      
      * create separate dockers for latest and fixed amd-stg-open compiler versions
      
      * change groovy syntax
      
      * apply cron timers back to develop branch
      bb3d9546
  12. 06 Feb, 2023 1 commit
    • Illia Silin's avatar
      Fix CI issues. (#572) · f73574ff
      Illia Silin authored
      * switch to recent staging compiler as default for CI
      
      * fix the baseline query
      
      * roll back sqlalchemy to version 1.4.46
      f73574ff
  13. 01 Feb, 2023 1 commit
  14. 31 Jan, 2023 1 commit
  15. 30 Jan, 2023 1 commit
  16. 26 Jan, 2023 1 commit
  17. 25 Jan, 2023 1 commit
    • Qianfeng's avatar
      Batchnorm inference instances, external API, client examples and gtests (#531) · a1b2441f
      Qianfeng authored
      * File renaming and class renaming for device element-wise operation
      
      * Add batchnorm-infer instances, external API and client example
      
      * Add batchnorm-infer profiler module and gtests
      
      * Remove file device_elementwise_extension.hpp and move NormalizeInInfer operation to element_wise_operation.hpp
      
      * Remove the using of class aliasing for DeviceElementwiseForBatchNormInfer
      
      * Rename class and file due to conflict from device_elementwise_2d.hpp
      
      * Fix namespace in batcnnorm_infer_nhwc client example
      a1b2441f
  18. 18 Jan, 2023 6 commits
    • Qianfeng's avatar
      Use double for all scaling values and float-point constant values at the Device Op API (#557) · 52abc2f3
      Qianfeng authored
      * Use double as alpha/beta values type in reduce device op api
      
      * Use double as alpha/beta values type in softmax device op api
      
      * Use double as alpha/beta values type in multiple-reduce device op api
      
      * Use double as epsilon value type in normalization/elementwise-normalization device op api
      52abc2f3
    • Raman R jana's avatar
      Wavelet (inter-wave consumer-producer) GEMM (#310) · 1cfa8760
      Raman R jana authored
      
      
      * wavelet gemm programming model support for CK
      
      * GEMM pipeline update for wavelet progrmmaing model
      
      * Updated wavelet programming pipeline
      
      * fixes for global-write for math-wave
      
      * fixed bug in global writes
      
      * Updated comments for better readability
      
      * fixed clang format errors
      
      * added block_lds without barrier sync
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      
      * refactor
      
      * prototype
      
      4 layouts
      
      fix default stride
      
      all problem sizes
      
      tidy
      
      move file
      
      update build script
      
      restore old file
      
      fix build
      
      * refactor standalone test to use gemm test harness
      
      * simplify gemm test
      
      * update build script
      
      * remove redundant
      
      * early return when cmd arg doesn't match
      
      * tidy
      
      * report failure when result not validated
      
      * tidy
      
      * Add comment depicting B2C mapping pattern.
      
      * Formatting & comments.
      
      * Comparison with custom B2C mapping pattern.
      
      * Example for wavelet gemm.
      
      * Add wavelet to Gemm standalone test.
      
      * Remove debug code.
      
      * Remove dangling #endif directive.
      
      Co-authored-by: root <Raman Jana>
      Co-authored-by: default avatarChao Liu <chao.liu2@amd.com>
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      Co-authored-by: default avatarAnthony Chang <ac.chang@outlook.com>
      Co-authored-by: default avatarAdam Osewski <19374865+aosewski@users.noreply.github.com>
      1cfa8760
    • ltqin's avatar
      Add multiD Gemm client APIs (#534) · d66421fe
      ltqin authored
      
      
      * start add example
      
      * fix config
      
      * fix showinfo bug
      
      * add an elementop
      
      * change to padding
      
      * add xdl example
      
      * change elementwiseop
      
      * add instance
      
      * add instance to profiler
      
      * change file name
      
      * fix deive not support issue
      
      * add client example
      
      * fix client gemm_add_multiply name
      
      * change AddMultiply elementwiseop
      
      * fix elementwiseop
      
      * fix client example
      
      * fix addmultiply op
      
      * fix comments and fun name
      Co-authored-by: default avatarletaoqin <letaoqin@amd.com>
      d66421fe
    • Illia Silin's avatar
      fix a bug for 6-dim kernels (#555) · 00ff30af
      Illia Silin authored
      00ff30af
    • who who who's avatar
      add multi embeddings support (#542) · 147b7db5
      who who who authored
      * add multi embeddings support
      
      * fix format
      
      * optimize sqrt
      
      * add reduce operation
      
      * change to elementwise op
      
      * fix name
      
      * rename
      
      * run ci cd
      
      * format example
      
      * format code
      
      * format code
      147b7db5
    • ltqin's avatar
      Add client API/examples for 3xGemm+Bias+Add+Permute{0, 2, 3, 1} (#550) · 55236709
      ltqin authored
      * add example
      
      * fix example
      
      * add instance for gemm permute
      
      * add to client example
      
      * change configs
      
      * change instance file name
      
      * formate
      
      * change client example file name and remove example
      55236709
  19. 17 Jan, 2023 3 commits
    • Qianfeng's avatar
      Reduction external API and client examples (#493) · 80e05267
      Qianfeng authored
      
      
      * Change to the DeviceReduce base class template to include all problem description information
      
      * Add external api for reduction
      
      * Add client example to test the reduction external api
      
      * Spelling correction
      
      * Re-implement the host_reduction to follow the DeviceReduce base API format
      
      * Change the reduce profiler to call the external API for collecting device instances
      
      * Rename reduce client example directory from 08_reduce to 12_reduce
      
      * Remove (void) before the functional call
      
      * Tiny update in reduce client example
      
      * Tiny update in profile_reduce_impl.hpp
      
      * Rename the reduce client example directory
      Co-authored-by: default avatarPo Yen Chen <PoYen.Chen@amd.com>
      80e05267
    • rocking5566's avatar
      Gemm layernorm welford (#413) · 7829d729
      rocking5566 authored
      
      
      * Add device op of gemm layernorm
      
      * [What] Rename F to H
      [Why] F and G prepare for welford tensor
      
      * Add gridwise gemm + welford
      
      * Extract template parameter
      
      * Rename kernel. Prepare to add second half kernel
      
      * Extract var
      
      * Add second kernel for gemm+layernorm
      
      * Move to the gemm_layernorm folder
      
      * Rename F and G to mean and var
      
      * Do not use snakeCurved, it makes determination of padding  for welford difficult
      
      * Rewrite the device interface and rename some var
      
      * Add welford count
      
      * Update interface
      
      * Sync code, prepare to test on MI200
      
      * Clean the code
      
      * Implement layernorm
      
      * Add comment to mension hipFree
      
      * Wrtie out the e for debug.
      This could be remove and use h for instead
      
      * 1. Allocate mean, var and count into by SetWorkSpacePointer.
      2. Add GetWorkSpaceSize to calculate the space size
      
      * Add gemm layernorm host code
      
      * use reference layernorm
      
      * Fix bug of blockwise welford for first kernel
      
      * Fix bug of mean var padding for layernorm
      
      * Use sgpr for shuffleM_index
      
      * padding for GemmMeanVarCountGridDescriptor_M_NBlock
      
      * Add layout parameter
      
      * Check argument for gemm
      
      * calculate max count for tail block
      
      * Share E and H memory in device op
      
      * Hard code the vector dim
      
      * Refine the MakeDescriptor
      
      * 1. Remove E parameter, because E is inside of device op
      2. Check vector size
      
      * [What] Rename MakeMeanVarDescriptor_M_N
      [Why] Prepare to add count version of make descriptor
      
      * Use 1D global memory for count
      
      * Prevent redundant IO
      
      * Update parameter
      
      * Add pipeline v1/v2 selector
      
      * Rename the example name
      
      * Add base class for gemm layernorm
      
      * Refine naming to distinguish naive and welford
      
      * Add comment to explan in detail
      
      * We don't need to pad in N dimension in gemm for mean/var/count. Set NPerTile 1
      
      * Rewrite the 2st kernel, use multiple block along N dimension in layernorm kernel
      
      * Share the vector size
      
      * Refine var name
      
      * [What] Force LayernormThreadSliceSize_N = vector size.
      [Why] Memory coalesce
      
      * Add comment
      
      * Extract divisor out of the loop in reference layernorm
      
      * Pad different size for E and H in layernorm kernel according to different block tile
      
      * Refine naming
      
      * Refine naming
      
      * Prevent implicit cast
      
      * [What] use ck::math::sqrt instead of __builtin_amdgcn_sqrtf
      [Why] __builtin_amdgcn_sqrtf is only support float, double will cause casting
      
      * Cast only constant
      
      * Change of post shuffle thread descriptor
      
      * Add EMeanVarDataType parameter.
      
      * Merge the mean and var threadwise copy
      
      * Add missing index
      
      * Fix Typo
      
      * Sync the variable with previous if
      
      * 1. Declare e inside the host_gemm_layernorm()
      2. Prevent implicit cast in reference code
      Co-authored-by: default avatarPo Yen Chen <PoYen.Chen@amd.com>
      7829d729
    • Haocong WANG's avatar
      [Navi3x-LWPCK-545] Block-wise GEMM + Real GEMM_WMMA_FP16 (#541) · 919aeb1f
      Haocong WANG authored
      * wmma_op + unit test
      
      * add arch limitation to wmma test
      
      * change arch limitation
      
      * Refactor + Add all type unit test(int4 compile failed)
      
      * Add f32_16x16x16_bf16 unit test
      
      * tempsave
      
      * tempsave
      
      * tempsave
      
      * runtime bug, cannot find symbol
      
      * workaround for incorrect HIP warpSize return value
      
      * debugging
      
      * tempsave
      
      * Correctness OK, waiting for optimization
      
      * Tidy up + format
      
      * temp save
      
      * temp save, reproduce the v_bfi_b32 issue
      
      * add inline asm for wmmaop test
      
      * tidy up
      
      * clean some debug purpose code
      
      * discard some codes
      
      * clang format
      
      * clang format
      
      * compiler issue fixed + increase tile size
      919aeb1f
  20. 12 Jan, 2023 2 commits
    • Illia Silin's avatar
      Add a flag to enable/disable debug output in many kernels. (#549) · 715e8dd2
      Illia Silin authored
      * add DEBUG_LOG macro to enable/disable debug output
      
      * fix syntax
      
      * fix syntax again
      
      * fix syntax one more time
      
      * remove balnk spaces
      
      * use ifdefs
      
      * add the Print argument
      
      * move the definition of DEBUG_LOG to ck.hpp
      
      * add the missign argument to Print()
      715e8dd2
    • Qianfeng's avatar
      Remove including of cmath (#551) · a17b0414
      Qianfeng authored
      * Let cmath included when compiling host codes in math_v2.hpp
      
      * Remove including of cmath in device_base.hpp and device_permute.hpp
      a17b0414