"ml/git@developer.sourcefind.cn:OpenDAS/ollama.git" did not exist on "3b96a93672377129f2a2aafc447e79ef1ca48c5f"
  1. 28 Apr, 2023 1 commit
  2. 26 Apr, 2023 1 commit
  3. 24 Apr, 2023 1 commit
  4. 22 Apr, 2023 1 commit
  5. 16 Apr, 2023 2 commits
  6. 11 Apr, 2023 2 commits
  7. 10 Apr, 2023 1 commit
    • Groupnorm + swish external api (#668) · ed3a2e52
      rocking5566 authored
      * Rename to proper naming
      
      * Add example of groupnorm + swish
      
      * Extract duplicate code in example
      
      * Add groupnorm + swish instances
      
      * Refactor instance generation, split into multiple cpp files
      
      * Add external api and client example
      
      * Refine profiler message
      
      * Use ck math version of exp
      
      * Refine problem size in example
      
      * Add host version of exp
  8. 07 Apr, 2023 1 commit
  9. 30 Mar, 2023 2 commits
  10. 29 Mar, 2023 2 commits
  11. 23 Mar, 2023 1 commit
  12. 22 Mar, 2023 1 commit
  13. 20 Mar, 2023 2 commits
  14. 15 Mar, 2023 5 commits
  15. 10 Mar, 2023 2 commits
  16. 09 Mar, 2023 2 commits
  17. 08 Mar, 2023 1 commit
    • GroupedGEMM + Gelu client example/instances/profiler (#614) · 9096b1c7
      Adam Osewski authored
      
      
      * Grouped gemm + Gelu instances.
      
      * Device Instance Factory for GroupedGemm+Gelu
      
      * Client example
      
      * Rangify fill helper functions.
      
      * Fix name clash.
      
      * Profiler for grouped_gemm+gelu
      
      * No need to use full namespace name.
      
      * Add check for MRaw divisible by vector load.
      
      * Ugly fix for big errors.
      
      * Add grouped_gemm+gelu to profiler CMakeLists.
      
      * Store in argument additional info.
      
      * Information about Mraw, Nraw, Kraw values.
      
      * Use FastGelu instead of Gelu.
      
      * Change client ex to use FastGelu
      
      * Remove relaxed error precision.
      
      * Remove duplicate output elementwise-op
      
      ---------
      Co-authored-by: Adam Osewski <aosewski@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
  18. 06 Mar, 2023 2 commits
  19. 01 Mar, 2023 1 commit
  20. 27 Feb, 2023 1 commit
  21. 24 Feb, 2023 1 commit
  22. 22 Feb, 2023 1 commit
    • Add Grouped Conv Backward Weight on Navi21 for ResNet50. (#505) · 246ceee4
      Rostyslav Geyyer authored
      
      
      * Add DeviceOp and examples
      
      * Format DeviceOp template arguments
      
      * Remove bf16 example
      
      * Format
      
      * Format
      
      * Update MakeABCGridDescriptor_A_K0_M_K1_B_K0_N_K1_C_M_N
      
      * Refactor argument preparation
      
      * Update conv_bwd_weight_dl to grouped_conv_bwd_weight_dl
      
      * Rename device op file
      
      * Update include directive in the example file
      
      * Update descriptor preparation for grouped op
      
      * Update the argument
      
      * Update batch handling
      
      * Add gridwise gemm supporting batched input
      
      * Update blockwise indexing, working version
      
      * Update copyright year
      
      * Update check if argument is supported
      
      * Refactor and make consistent with xdl examples
      
      * Update check if argument is supported
      
      * Add changelog entry
      
      * Added comments on Dl op split_k>1 support
      
      ---------
      Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
  23. 16 Feb, 2023 1 commit
  24. 15 Feb, 2023 5 commits
    • Clean up kernel launch output (#569) · 19490ac4
      Illia Silin authored
      
      
      * clean up output from kernel_launch
      
      * set RUN_WARMUP to 0 by default
      
      * split the warm-up into a separate issue
      
      ---------
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
    • Add contraction_fp64 example (#570) · 24c9ee1d
      zjing14 authored
      
      
      * add contraction_bilinear
      
      * add contraction_scale_xdl_fp64
      
      * reduce tile size to avoid register spill
      
      ---------
      Co-authored-by: root <root@ctr-ubbsmc16.amd.com>
    • Improve normalization (#580) · 6a6163a3
      rocking5566 authored
      * Sync the order of type string with template parameter
      
      * Add more instances
      
      * Check the vector size and remove redundant var
      
      * Extract var to static, prepare to separate sweep once kernel
      
      * Separate sweeponce flow and optimize the flow
      
      * 1. Rename AccDatatype in normalization to computeData
        2. Rename AccElementwiseOperation to YElementwiseOperation in normalization
      
      * Remove useless code
      
      * Update naive variance kernel
      
      * Refine string
      
      * Fix typo
      
      * Support naive variance for device_normalization
      
      * Check the blocksize
      
      * Share the VGPR of x and y
      
      * Share the VGPR of gamma and beta
      
      * Add more instances
      
      * Support fp16 sqrt for experiment
      
      * Add CHANGELOG
      
      * Fix typo
      
      * clang-format
    • [Navi3x] Add Device Operations (#567) · 0cfda84d
      Haocong WANG authored
      * wmma_op + unit test
      
      * add arch limitation to wmma test
      
      * change arch limitation
      
      * Refactor + Add all type unit test (int4 compile failed)
      
      * Add f32_16x16x16_bf16 unit test
      
      * tempsave
      
      * tempsave
      
      * tempsave
      
      * runtime bug, cannot find symbol
      
      * workaround for incorrect HIP warpSize return value
      
      * debugging
      
      * tempsave
      
      * Correctness OK, waiting for optimization
      
      * Tidy up + format
      
      * temp save
      
      * temp save, reproduce the v_bfi_b32 issue
      
      * add inline asm for wmmaop test
      
      * tidy up
      
      * clean some debug purpose code
      
      * discard some codes
      
      * clang format
      
      * clang format
      
      * compiler issue fixed + increase tile size
      
      * navi3x_multipleD+example
      
      * temp save
      
      * workable
      
      * batchedgemm[OK], groupconv[debug]
      
      * groupconv: Sanity check[OK], Performance[Bad]
      
      * navi3x_groupconv_need_optimization
      
      * format
      
      * Add arch limitation to all wmma examples
      
      * fix bug: example30 input conv args
    • Remove the workaround for bf16 attention tests. (#586) · 06f1fc86
      Illia Silin authored
      * remove workaround in bf16 attention test
      
      * clean up another workaround