"docs/source/dev/engine/async_llm_engine.rst" did not exist on "50376faa7b7397f82f9b67d7b6e0770ab189b6c1"
  1. 26 Jul, 2023 2 commits
    • carlushuang's avatar
      initial stream-k implementation with example (#699) · e7dca79d
      carlushuang authored
      
      
      * initial stream-k implementation with example
      
      * fix unexpected change in err
      
      * improve a little bit performance by reorganize pipeline.
      
      * improve perf a little bit by swizzle block idx
      
      * add profiler
      
      * update example
      
      * fix spelling
      
      * shrink karg for streamk
      
      * support dynamic buffer using memory coherence glc_slc bit from template
      
      * control memory coherence while construct dynamic buffer
      
      * update reduction for streamk(not ready yet)
      
      * Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting
      
      * fix build issue
      
      * fix several bug
      
      * now result is correct, everything works (but has scratch)
      
      * remove scratch by manually reset coordinate
      
      * update device code
      
      * fix a bug in final reduce
      
      * fix something in example
      
      * update async memset
      
      * fix enum as camel case
      
      * modify coherence enum name
      
      * clean code and use atomic streamk by default
      
      * remove unused var
      
      * throw exception if have empty pointer
      
      * fix format
      
      * fix CI warning
      
      * fix type in init
      
      * modify CI error
      
      * filter out on gfx10+
      
      * restore changed example code
      
      ---------
      Co-authored-by: default avatarQianfeng Zhang <Qianfeng.Zhang@amd.com>
      e7dca79d
    • Illia Silin's avatar
      Disable DL kernels by default. (#816) · 9195435c
      Illia Silin authored
      9195435c
  2. 18 Jul, 2023 1 commit
  3. 21 Jun, 2023 1 commit
  4. 12 Jun, 2023 1 commit
    • Bartłomiej Kocot's avatar
      Add DeviceBatchedGemmMultipleD_Dl (#732) · fc9f9756
      Bartłomiej Kocot authored
      * Add DeviceBatchedGemmMultipleD_Dl
      
      * Fix batched_gemm tests
      
      * Fix comments
      
      * test_batched_gemm_multi_d fixes
      
      * Fix args for isSupported batchedGemmMultipleDDl
      
      * Disable tests for gfx90a
      fc9f9756
  5. 24 May, 2023 1 commit
    • rocking's avatar
      Pool3d fwd (#697) · 76ec0089
      rocking authored
      * Expand the base class of pool2d, prepare to share base class with pool3d
      
      * Add pool3d device op
      
      * Add pool3d f16 example
      
      * Refactor the base class. implement generic pooling in the future
      
      * clang format
      
      * get original index in max pooling
      
      * Add outputindex to base class
      
      * Fix dimension
      
      * Add pooling instance
      
      * Use indexType instead
      
      * Remove useless header
      
      * Extract IndexDataType to template
      
      * Extract pooling reference code
      
      * clang format
      
      * clang format
      
      * Fix typo
      
      * Add tensor stride
      
      * Add missing header
      
      * Add index stride and output stride
      
      * Refine naming
      
      * Add type to base class
      
      * Rename file
      
      * Use proper size
      
      * Fix typo
      
      * Refine naming
      
      * Modify the argument into vector.
      
      * Add max pool profiler
      
      * Refine naming
      
      * Support f32 pool
      
      * Fix typo
      
      * Add avg pool2d fwd in profiler
      
      * clang format
      
      * Rename AccDatatype to ComputeDatatype
      
      * Fix init
      
      * test pool
      
      * Extract variable
      
      * Add client example
      
      * Check the pooling dim
      
      * clang format
      
      * Connect argv and arg_parser
      
      * Add found check
      
      * Remove useless header
      
      * Refine naming
      
      * Adjust the order of device_pool_fwd
      76ec0089
  6. 15 May, 2023 1 commit
    • Bartłomiej Kocot's avatar
      Add contraction profiler and tests (#701) · 642d5e91
      Bartłomiej Kocot authored
      * Add contraction profiler and tests
      
      * Build and style fixes
      
      * Allow to use any elementwise operator for ref_contraction
      
      * Introduce profile_contraction_scale and profile_contraction_bilinear
      
      * Make ref_contraction generic and extend interface tests
      
      * Stylistic minor fixes
      
      * Extend test_contraction_interface
      642d5e91
  7. 08 Mar, 2023 1 commit
    • Adam Osewski's avatar
      GroupedGEMM + Gelu client example/instances/profiler (#614) · 9096b1c7
      Adam Osewski authored
      
      
      * Grouped gemm + Gelu instances.
      
      * Device Instance Factory for GroupedGemm+Gelu
      
      * Client example
      
      * Rangify fill helper functions.
      
      * Fix name clash.
      
      * Profiler for grouped_gemm+gelu
      
      * No need to use full namespace name.
      
      * Add check for MRaw divisible by vector load.
      
      * Ugly fix for big errors.
      
      * Add grouped_gemm+gelu to profiler CMakelists.
      
      * Store in argument additional info.
      
      * Information about Mraw, Nraw, Kraw values.
      
      * Use FastGelu instead of Gelu.
      
      * Change client ex to use FastGelu
      
      * Remove relaxed error precision.
      
      * Remove duplicate output elementwise-op
      
      ---------
      Co-authored-by: default avatarAdam Osewski <aosewski@amd.com>
      Co-authored-by: default avatarzjing14 <zhangjing14@gmail.com>
      9096b1c7
  8. 09 Feb, 2023 1 commit
    • rocking5566's avatar
      Gemm+layernorm instance, ckProfiler, client example (#568) · f7d28f3e
      rocking5566 authored
      * Add gemm + layernorm instance
      
      * Add ckProfiler
      
      * Add test
      
      * Add client example
      
      * Detect if user forger to set the workrspace
      
      * Use literal in the example
      
      * [What] use builtin function for sqrt
      [Why] compiler will not use v_sqrt_f64_e64 if we use ::sqrt()
      
      * check gemm vaildity in IsSupportedArgument
      
      * Add more testcases
      
      * Merge duplicated folder in client example
      
      * Print more infomation
      
      * Use better kernel parameter for MS problem size
      
      * clang format
      
      * Add constexpr for if condition and remove redundant include
      
      * Remove cstdlib and add constexpr
      f7d28f3e
  9. 25 Jan, 2023 1 commit
    • Qianfeng's avatar
      Batchnorm inference instances, external API, client examples and gtests (#531) · a1b2441f
      Qianfeng authored
      * File renaming and class renaming for device element-wise operation
      
      * Add batchnorm-infer instances, external API and client example
      
      * Add batchnorm-infer profiler module and gtests
      
      * Remove file device_elementwise_extension.hpp and move NormalizeInInfer operation to element_wise_operation.hpp
      
      * Remove the using of class aliasing for DeviceElementwiseForBatchNormInfer
      
      * Rename class and file due to conflict from device_elementwise_2d.hpp
      
      * Fix namespace in batcnnorm_infer_nhwc client example
      a1b2441f
  10. 18 Jan, 2023 1 commit
    • ltqin's avatar
      Add multiD Gemm client APIs (#534) · d66421fe
      ltqin authored
      
      
      * start add example
      
      * fix config
      
      * fix showinfo bug
      
      * add an elementop
      
      * change to padding
      
      * add xdl example
      
      * change elementwiseop
      
      * add instance
      
      * add instance to profiler
      
      * change file name
      
      * fix deive not support issue
      
      * add client example
      
      * fix client gemm_add_multiply name
      
      * change AddMultiply elementwiseop
      
      * fix elementwiseop
      
      * fix client example
      
      * fix addmultiply op
      
      * fix comments and fun name
      Co-authored-by: default avatarletaoqin <letaoqin@amd.com>
      d66421fe
  11. 15 Dec, 2022 1 commit
    • Rostyslav Geyyer's avatar
      Add padding device_gemm_add_add_fastgelu_xdl_c_shuffle instances to enable... · 9a1f2475
      Rostyslav Geyyer authored
      Add padding device_gemm_add_add_fastgelu_xdl_c_shuffle instances to enable arbitrary problem size (#535)
      
      * Add padding device_gemm_add_add_fastgelu_xdl_c_shuffle instances
      
      * Add padding device_gemm_add_fastgelu_xdl_c_shuffle instances
      
      * Add gemm_add_fastgelu profiler impl
      
      * Add padding device_gemm_fastgelu_xdl_c_shuffle instances
      
      * Add gemm_fastgelu profiler impl
      9a1f2475
  12. 01 Dec, 2022 1 commit
    • Po Yen Chen's avatar
      Modularize ckProfiler operations (#514) · 8784a72e
      Po Yen Chen authored
      
      
      * Re-structure ckProfiler source files
      
      * Rename profiler.cpp to main.cpp
      
      * Modularize ckProfiler operations
      
      * Add description for profiler operations
      
      * Use longer name to avoid name collision
      
      * Use macro to delay expansion
      
      * Use std::move() to avoid object copying
      
      * Prohibit users from calling dtor
      
      * Use macro to eliminate redundant code
      
      * Make friend function hidden
      
      * Add missing include directive <iostream>
      
      * Fix wrong include directives
      
      * Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test
      Co-authored-by: default avatarQianfeng Zhang <Qianfeng.Zhang@amd.com>
      8784a72e