1. 16 Feb, 2023 4 commits
  2. 15 Feb, 2023 7 commits
    • Sphinx doc (#581) · cb3fac4d
      pmaybank authored
      
      
      * New docs directory with minimal config
      
      * Based on docs directory of rocBLAS
      
      * Config for running Doxygen then Sphinx to generate HTML
      
      * Add minimal content - intro to doc
      
      * Add some boilerplate sections to doc
      
      * content still needs to be done,
      * e.g., need to generate API documentation using Doxygen
      * need to write contributor guide
      
      * Start Softmax section of Support Primitives doc
      
      * Written as a test bed for typesetting math content
      
      * Need to decide how much detail to go into
      
      * add doc directories to git ignore file.
      
      * Minor edits - new line at EOF, change year in copyright notices
      
      * Port Markdown files to reStructuredText
      
      * Copy Markdown files from pre-existing doc directory to docs directory
      
      * Convert to reStructuredText (rst) - section headings, links, tables
        have a different syntax in rst
      
      * New rst files added to index - can generate HTML with same style as
        HTML generated from rst files in previous commits
      
      * Intention is to make all the content in doc redundant and use rst
        throughout rather than mix of md and rst
      
      * Extend Softmax section of Primitives Guide
      
      * rename l to z
      
      * add material on applying softmax row-wise to matrix
      
      * define macro for diag operator (represents diagonal matrix)
      
      ---------
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
    • Clean up kernel launch output (#569) · 19490ac4
      Illia Silin authored
      
      
      * clean up output from kernel_launch
      
      * set RUN_WARMUP to 0 by default
      
      * split the warm-up into a separate issue
      
      ---------
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
    • Add contraction_fp64 example (#570) · 24c9ee1d
      zjing14 authored
      
      
      * add contraction_bilinear
      
      * add contraction_scale_xdl_fp64
      
      * reduce tile size to avoid register spill
      
      ---------
      Co-authored-by: root <root@ctr-ubbsmc16.amd.com>
    • Improve normalization (#580) · 6a6163a3
      rocking5566 authored
      * Sync the order of type string with template parameter
      
      * Add more instances
      
      * Check the vector size and remove redundant var
      
      * Extract var to static, prepare to separate sweep once kernel
      
      * Separate sweeponce flow and optimize the flow
      
      * 1. Rename AccDatatype in normalization to computeData
      2. Rename AccElementwiseOperation to YElementwiseOperation in normalization
      
      * Remove useless code
      
      * Update naive variance kernel
      
      * Refine string
      
      * Fix typo
      
      * Support naive variance for device_normalization
      
      * Check the blocksize
      
      * Share the VGPR of x and y
      
      * Share the VGPR of gamma and beta
      
      * Add more instances
      
      * Support fp16 sqrt for experiment
      
      * Add CHANGELOG
      
      * Fix typo
      
      * clang-format
    • [Navi3x] Add Device Operations (#567) · 0cfda84d
      Haocong WANG authored
      * wmma_op + unit test
      
      * add arch limitation to wmma test
      
      * change arch limitation
      
      * Refactor + add unit tests for all types (int4 compile failed)
      
      * Add f32_16x16x16_bf16 unit test
      
      * tempsave
      
      * tempsave
      
      * tempsave
      
      * runtime bug, cannot find symbol
      
      * workaround for incorrect HIP warpSize return value
      
      * debugging
      
      * tempsave
      
      * Correctness OK, waiting for optimization
      
      * Tidy up + format
      
      * temp save
      
      * temp save, reproduce the v_bfi_b32 issue
      
      * add inline asm for wmmaop test
      
      * tidy up
      
      * clean some debug purpose code
      
      * discard some code
      
      * clang format
      
      * clang format
      
      * compiler issue fixed + increase tile size
      
      * navi3x_multipleD+example
      
      * temp save
      
      * workable
      
      * batchedgemm[OK], groupconv[debug]
      
      * groupconv: Sanity check[OK], Performance[Bad]
      
      * navi3x_groupconv_need_optimization
      
      * format
      
      * Add arch limitation to all wmma examples
      
      * fix bug: example30 input conv args
    • Conv3D FWD BWD WRW fp16 fp32 client examples (#559) · e9fd1228
      Adam Osewski authored
      
      
      * Conv3d bwd weight client example.
      
      * Update year in license
      
      * Convolution bwd data 3D fp16/fp32 client example.
      
      * Client example for convnd fwd fp16 fp32
      
      * clang-format
      
      * Review remarks.
      
      * Fix compiler err.
      
      * Update data layout to standard one.
      
      * Add conv 3d fwd NDHWGC instances
      
      * clang-format
      
      * Conv3d fwd NDHWGC instances.
      
      ---------
      Co-authored-by: Adam Osewski <aosewski@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
    • Remove the workaround for bf16 attention tests. (#586) · 06f1fc86
      Illia Silin authored
      * remove workaround in bf16 attention test
      
      * clean up another workaround
  3. 14 Feb, 2023 1 commit
  4. 13 Feb, 2023 1 commit
  5. 11 Feb, 2023 1 commit
  6. 10 Feb, 2023 1 commit
  7. 09 Feb, 2023 3 commits
    • Gemm+layernorm instance, ckProfiler, client example (#568) · f7d28f3e
      rocking5566 authored
      * Add gemm + layernorm instance
      
      * Add ckProfiler
      
      * Add test
      
      * Add client example
      
      * Detect if user forgot to set the workspace
      
      * Use literal in the example
      
      * [What] use builtin function for sqrt
      [Why] compiler will not use v_sqrt_f64_e64 if we use ::sqrt()
      
      * check gemm validity in IsSupportedArgument
      
      * Add more testcases
      
      * Merge duplicated folder in client example
      
      * Print more information
      
      * Use better kernel parameter for MS problem size
      
      * clang format
      
      * Add constexpr for if condition and remove redundant include
      
      * Remove cstdlib and add constexpr
    • Add instance for elementwise normalization (#573) · 76d144fa
      guangzlu authored
      * added instances for large N
      
      * add instance for elementwise normalization
      
      * added supported restrict in device_elementwise_normalization_impl.hpp
    • Add Inter-Row thread transfer · a6b2f1c1
      aska-0096 authored
  8. 08 Feb, 2023 3 commits
    • adding the first draft of changelog (#571) · b63accee
      Illia Silin authored
      * adding the first draft of changelog
      
      * second draft of changelog
    • Add GemmAddSoftmaxGemm support for MSFT ORT (instances and client API) (#576) · 332ccc33
      ltqin authored
      * add instance for gemm bias softmax gemm
      
      * add client example
      
      * change CGridDesc_G_M_N to CGridDesc_G_M_O
      
      * add gridwise
      
      * change c grid name
      
      * device add d0s data
      
      * fix 08 client_example
      
      * add example 47_fused_attention
      
      * example output correct
      
      * add d0 to example
      
      * add d0 element op
      
      * rechange instance code
      
      * change Acc0ElementwiseOperation to C0DEElementwiseOperation
      
      * change example name
      
      * update instance for cdeelementwiseop
      
      * add bhalf_t ScaleAdd
      
      * add test
      
      * do not support gemm1 bias
      
      * remove some ignore
      
      * fix test bug
    • Fix a couple more CI issues. (#578) · bb3d9546
      Illia Silin authored
      * test the QA cron parameter for compiler commit
      
      * create separate dockers for latest and fixed amd-stg-open compiler versions
      
      * change groovy syntax
      
      * apply cron timers back to develop branch
  9. 06 Feb, 2023 1 commit
    • Fix CI issues. (#572) · f73574ff
      Illia Silin authored
      * switch to recent staging compiler as default for CI
      
      * fix the baseline query
      
      * roll back sqlalchemy to version 1.4.46
  10. 03 Feb, 2023 1 commit
  11. 01 Feb, 2023 1 commit
  12. 31 Jan, 2023 1 commit
  13. 30 Jan, 2023 2 commits
  14. 26 Jan, 2023 1 commit
  15. 25 Jan, 2023 1 commit
    • Batchnorm inference instances, external API, client examples and gtests (#531) · a1b2441f
      Qianfeng authored
      * File renaming and class renaming for device element-wise operation
      
      * Add batchnorm-infer instances, external API and client example
      
      * Add batchnorm-infer profiler module and gtests
      
      * Remove file device_elementwise_extension.hpp and move NormalizeInInfer operation to element_wise_operation.hpp
      
      * Remove the use of class aliasing for DeviceElementwiseForBatchNormInfer
      
      * Rename class and file due to conflict from device_elementwise_2d.hpp
      
      * Fix namespace in batchnorm_infer_nhwc client example
  16. 19 Jan, 2023 1 commit
  17. 18 Jan, 2023 9 commits
  18. 17 Jan, 2023 1 commit
    • Reduction external API and client examples (#493) · 80e05267
      Qianfeng authored
      
      
      * Change to the DeviceReduce base class template to include all problem description information
      
      * Add external api for reduction
      
      * Add client example to test the reduction external api
      
      * Spelling correction
      
      * Re-implement the host_reduction to follow the DeviceReduce base API format
      
      * Change the reduce profiler to call the external API for collecting device instances
      
      * Rename reduce client example directory from 08_reduce to 12_reduce
      
      * Remove (void) before the function call
      
      * Tiny update in reduce client example
      
      * Tiny update in profile_reduce_impl.hpp
      
      * Rename the reduce client example directory
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>