1. 27 Jan, 2026 1 commit
      issue/846 - Refactor embedding to support device-side input and CUDA graph recording · cc2cc3a1
      gongchensu authored
      - Ensure embedding tensors are on the same device. Change format.
      - Optimize embedding kernel with vectorized memory access and __ldg
      - Add vectorized memory access using float4/float2, half2, and bfloat162
      - Use __ldg instruction for read-only weight and indices access
      - Add memory alignment checks to enable vectorized paths
      - Add __restrict__ keywords for better compiler optimization
      - Implement dynamic block size selection based on embedding_dim
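The alignment checks and block size selection listed above could be sketched on the host side roughly as follows. This is an illustrative sketch under assumptions, not the code from commit cc2cc3a1: the helper names `is_aligned`, `pick_vector_elems`, and `pick_block_size`, the one-warp minimum, and the 512-thread cap are all hypothetical. The vector widths mirror the types named in the commit message (float4/float2 for 4-byte elements, half2/bfloat162 for 2-byte elements).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if pointer p is aligned to `bytes` bytes.
inline bool is_aligned(const void *p, std::size_t bytes) {
    return reinterpret_cast<std::uintptr_t>(p) % bytes == 0;
}

// Hypothetical helper: number of elements per vectorized access.
// Returns 4 (float4) or 2 (float2) for 4-byte elements, 2 (half2 /
// bfloat162) for 2-byte elements, and 1 when only scalar access is safe.
inline std::size_t pick_vector_elems(const void *weight, const void *out,
                                     std::size_t embedding_dim,
                                     std::size_t elem_size) {
    const std::size_t max_elems = (elem_size == 4) ? 4 : 2;
    for (std::size_t elems = max_elems; elems >= 2; elems /= 2) {
        const std::size_t bytes = elems * elem_size;
        // Rows start at base + row * embedding_dim * elem_size, so every row
        // is aligned when the base pointer is aligned and embedding_dim is a
        // multiple of the vector width.
        if (embedding_dim % elems == 0 &&
            is_aligned(weight, bytes) && is_aligned(out, bytes)) {
            return elems;
        }
    }
    return 1;
}

// Hypothetical helper: pick a power-of-two block size covering the vector
// accesses in one row. The bounds (32 and 512) are assumptions.
inline unsigned pick_block_size(std::size_t embedding_dim,
                                std::size_t vec_elems) {
    const std::size_t work = embedding_dim / vec_elems;
    unsigned block = 32;
    while (block < work && block < 512) {
        block *= 2;
    }
    return block;
}
```

A launcher would call `pick_vector_elems` first, dispatch to the matching vectorized or scalar kernel, and pass the result to `pick_block_size` to size the launch. Inside the kernel, the read-only weight and index loads would go through `__ldg` on `const ... __restrict__` pointers, as the commit message describes.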
  2. 12 Jan, 2026 2 commits
  3. 09 Jan, 2026 1 commit
  4. 08 Jan, 2026 1 commit
  5. 30 Dec, 2025 2 commits
  6. 29 Dec, 2025 1 commit
  7. 26 Dec, 2025 2 commits
  8. 25 Dec, 2025 2 commits
  9. 24 Dec, 2025 2 commits
  10. 19 Dec, 2025 1 commit
  11. 11 Dec, 2025 2 commits
  12. 10 Dec, 2025 2 commits
  13. 08 Dec, 2025 1 commit
  14. 04 Dec, 2025 1 commit
  15. 29 Nov, 2025 1 commit
  16. 28 Nov, 2025 3 commits
  17. 26 Nov, 2025 1 commit
  18. 22 Nov, 2025 1 commit
  19. 21 Nov, 2025 5 commits
  20. 20 Nov, 2025 1 commit
      Issue/445 Add macaSDK support for the MetaX platform (#468) · ed012302
      crapromer authored
      * initial mc support for metax
      
      * add command description for maca compilation
      
      * rebase metax maca support to main
      
      * issue/445 - clang format code on ubuntu
      
      * issue/445 - change config from use_mc to use-mc and format code
  21. 19 Nov, 2025 1 commit
  22. 07 Nov, 2025 1 commit
  23. 28 Oct, 2025 1 commit
  24. 23 Oct, 2025 1 commit
  25. 22 Oct, 2025 1 commit
  26. 16 Oct, 2025 1 commit
  27. 29 Sep, 2025 1 commit