Commits · dce99862d09ba3bfcb05b7061d48b59b1f620a50 · jerrrrry / infinicore

05 Mar, 2026 1 commit
- issue/1033 - replace __C with __INFINI_C · b1ee0a8a
  wooway777 authored Mar 05, 2026
  
27 Jan, 2026 1 commit

- issue/846 - Refactor embedding to support device-side input and CUDA graph recording · cc2cc3a1
  gongchensu authored Dec 26, 2025

  - Ensure embedding tensors are on the same device. Change format.
  - Optimize embedding kernel with vectorized memory access and __ldg
    - Add vectorized memory access using float4/float2, half2, and bfloat162
    - Use __ldg instruction for read-only weight and indices access
    - Add memory alignment checks to enable vectorized paths
    - Add __restrict__ keywords for better compiler optimization
    - Implement dynamic block size selection based on embedding_dim
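
The alignment check and block-size selection named in the commit message can be sketched on the host side as follows. This is a minimal illustration in plain C++, not the code from cc2cc3a1: the helper names `is_aligned`, `pick_vector_width`, and `pick_block_size`, the doubling rule, and the 512-thread cap are all assumptions. The widths 4 and 2 stand in for float4 and float2 loads; the device kernel itself, `__ldg`, and `__restrict__` are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when `ptr` lies on an `alignment`-byte boundary.
inline bool is_aligned(const void* ptr, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Hypothetical helper: widest vector width (counted in floats) that is safe
// for every row of the table. 4 mirrors float4, 2 mirrors float2, and 1 is
// the scalar fallback. Rows start at base + index * embedding_dim, so each
// row is aligned only if the base pointer is aligned AND embedding_dim is a
// multiple of the vector width.
inline int pick_vector_width(const float* weight, const float* out,
                             std::size_t embedding_dim) {
    if (embedding_dim % 4 == 0 &&
        is_aligned(weight, 4 * sizeof(float)) &&
        is_aligned(out, 4 * sizeof(float))) {
        return 4;
    }
    if (embedding_dim % 2 == 0 &&
        is_aligned(weight, 2 * sizeof(float)) &&
        is_aligned(out, 2 * sizeof(float))) {
        return 2;
    }
    return 1;
}

// Hypothetical helper: choose a thread-block size from the row length.
// Starts at one warp (32 threads) and doubles until the block covers one
// row's worth of vector elements or reaches the assumed cap of 512.
inline int pick_block_size(std::size_t embedding_dim, int vector_width) {
    assert(embedding_dim > 0 && vector_width > 0);
    const std::size_t width = static_cast<std::size_t>(vector_width);
    const std::size_t work_items = (embedding_dim + width - 1) / width;
    int block = 32;
    while (block < 512 && static_cast<std::size_t>(block) < work_items) {
        block *= 2;
    }
    return block;
}
```

Under these assumptions, a 16-byte-aligned table with embedding_dim = 4096 would take the width-4 path with a 512-thread block, while an odd embedding_dim falls back to the scalar path.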
