- 27 Jan, 2026 11 commits
-
-
Jiacheng Huang authored
Add a C++ wrapper for `NineToothedTensor`
Add a way to create a `ninetoothed::Tensor` from arrays passed as `shape` and `strides`
Use `ninetoothed::Tensor` to integrate the NineToothed ReLU operator
Add an include guard to `ninetoothed/utils.h`
-
PanZezhong authored
-
wooway777 authored
-
wooway777 authored
-
wooway777 authored
-
PanZezhong authored
-
wooway777 authored
-
wooway777 authored
-
wooway777 authored
-
gongchensu authored
- Ensure embedding tensors are on the same device. Change format.
- Optimize embedding kernel with vectorized memory access and `__ldg`
- Add vectorized memory access using `float4`/`float2`, `half2`, and `bfloat162`
- Use `__ldg` instruction for read-only weight and indices access
- Add memory alignment checks to enable vectorized paths
- Add `__restrict__` keywords for better compiler optimization
- Implement dynamic block size selection based on `embedding_dim`
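The alignment checks and block size selection mentioned in this commit message can be illustrated with a minimal host-side sketch. It is not the code from the commit: the helper names (`is_aligned`, `pick_vector_width`, `pick_block_size`), the warp-multiple rounding, and the 256-thread cap are all assumptions made for illustration, and only the `float` case is shown.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if the address is a multiple of `bytes`.
constexpr bool is_aligned(std::uintptr_t addr, std::size_t bytes) {
    return addr % bytes == 0;
}

// Hypothetical helper: choose how many floats each thread moves per access.
// 4 corresponds to a float4 path, 2 to float2, 1 to the scalar fallback.
// A vector path is only safe if both base addresses are aligned and the row
// length is a multiple of the vector width, so every row start stays aligned.
constexpr int pick_vector_width(std::uintptr_t weight_addr,
                                std::uintptr_t out_addr,
                                std::size_t embedding_dim) {
    if (embedding_dim % 4 == 0 && is_aligned(weight_addr, 16) &&
        is_aligned(out_addr, 16)) {
        return 4;
    }
    if (embedding_dim % 2 == 0 && is_aligned(weight_addr, 8) &&
        is_aligned(out_addr, 8)) {
        return 2;
    }
    return 1;
}

// Hypothetical helper: pick a thread-block size from the row length.
// Rounds the number of vector accesses per row up to a multiple of 32
// (one warp) and clamps the result to [32, 256]; both limits are assumed.
constexpr int pick_block_size(std::size_t embedding_dim, int vec_width) {
    std::size_t width = static_cast<std::size_t>(vec_width);
    std::size_t work = (embedding_dim + width - 1) / width;
    std::size_t threads = ((work + 31) / 32) * 32;
    if (threads < 32) threads = 32;
    if (threads > 256) threads = 256;
    return static_cast<int>(threads);
}
```

In a real launcher the addresses would come from `reinterpret_cast<std::uintptr_t>` on the weight and output pointers, and the chosen width would select which kernel variant to launch.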
-
wooway777 authored
-
- 22 Jan, 2026 1 commit
-
-
PanZezhong authored
-
- 21 Jan, 2026 2 commits
-
-
PanZezhong authored
-
PanZezhong authored
-
- 19 Jan, 2026 1 commit
-
-
PanZezhong authored
-
- 15 Jan, 2026 1 commit
-
-
PanZezhong authored
-
- 14 Jan, 2026 1 commit
-
-
PanZezhong authored
-
- 12 Jan, 2026 2 commits
-
-
PanZezhong authored
-
PanZezhong authored
-
- 09 Jan, 2026 2 commits
-
-
PanZezhong authored
-
PanZezhong authored
-
- 08 Jan, 2026 1 commit
-
-
zhushuang authored
-
- 06 Jan, 2026 1 commit
-
-
PanZezhong authored
-
- 30 Dec, 2025 3 commits
-
-
PanZezhong authored
-
PanZezhong authored
-
zhushuang authored
-
- 29 Dec, 2025 2 commits
-
-
pengcheng888 authored
-
zhushuang authored
-
- 26 Dec, 2025 3 commits
-
-
qinyiqun authored
* can commit
* can exec sm_90a
* can exec < sm_90
* fix format
* fix format
* add tests, aligned with the sglang tests
* fix format 1
* fix format 2
* add compile option to disable cutlass
-
PanZezhong authored
-
PanZezhong1725 authored
This reverts commit 25258029.
-
- 25 Dec, 2025 2 commits
- 24 Dec, 2025 3 commits
-
-
zhuyue authored
-
zhuyue authored
-
PanZezhong authored
-
- 22 Dec, 2025 1 commit
-
-
PanZezhong authored
-
- 19 Dec, 2025 1 commit
-
-
pengcheng888 authored
-
- 18 Dec, 2025 1 commit
-
-
wooway777 authored
-
- 17 Dec, 2025 1 commit
-
-
zhuyue authored
-