Commits · cc2cc3a1cb04c176f505604e5b5dd12ca2854185 · jerrrrry / infinicore

27 Jan, 2026 2 commits

issue/846 - Refactor embedding to support device-side input and CUDA graph recording · cc2cc3a1

gongchensu authored Dec 26, 2025

- Ensure embedding tensors are on the same device. Change format.
- Optimize embedding kernel with vectorized memory access and __ldg
- Add vectorized memory access using float4/float2, half2, and bfloat162
- Use __ldg instruction for read-only weight and indices access
- Add memory alignment checks to enable vectorized paths
- Add __restrict__ keywords for better compiler optimization
- Implement dynamic block size selection based on embedding_dim


issue/978 - metax cuda graph impl and wrappings · 822a5341
wooway777 authored Jan 23, 2026


21 Jan, 2026 1 commit
- issue/811 support cuda graph capture · 91f91f7c
  PanZezhong authored Jan 21, 2026
  
12 Jan, 2026 2 commits
- issue/867 fix cpu malloc · ef867cc8
  PanZezhong authored Jan 12, 2026
  
- issue/867 fix page caching api, paged attn support more head dims · 96551cb7
  PanZezhong authored Jan 12, 2026
  
09 Jan, 2026 1 commit
- issue/867 pass total kv lens as paged attn args · 499b1dc6
  PanZezhong authored Jan 09, 2026
  
08 Jan, 2026 1 commit
- issue/867 - feat: adjust paged_attention_prefill interface naming · 0a2839a2
  zhushuang authored Jan 07, 2026
  
30 Dec, 2025 2 commits
- issue/847 paged attention prefill single-stage interface · 99b940b2
  PanZezhong authored Dec 30, 2025
  
- issue/848 - feat: add paged attention prefill for nvidia gpu with test pass · 1ba0bcfa
  zhushuang authored Dec 30, 2025
  
29 Dec, 2025 1 commit
- issue/834 - feat: add paged attention for nvidia gpu with test pass · 17299923
  zhushuang authored Dec 29, 2025
  
26 Dec, 2025 2 commits

Issue/840: Support Int8 Gemm on NVIDIA (#852) · ed04d3e6

qinyiqun authored Dec 26, 2025

* can commit

* can exec sm_90a

* can exec < sm_90

* fix format

* fix format

* add tests, aligned with the sglang test

* fix format 1

* fix format 2

* add compile option to disable cutlass


Revert "Issue/840: NVIDIA Int8 Gemm (#841)" · 458bb997
PanZezhong1725 authored Dec 26, 2025
```
This reverts commit 25258029.
```

25 Dec, 2025 2 commits
- Issue/840: NVIDIA Int8 Gemm (#841) · 25258029
  qinyiqun authored Dec 25, 2025
  
- Add NVIDIA GPU implementation for add_rms_norm and make residual_out required. · 7712471f
  zhuyue authored Dec 24, 2025
  
24 Dec, 2025 2 commits
- Unify add_rms_norm to always return (normalized_result, add_result) pair. · 2a432b34
  zhuyue authored Dec 24, 2025
  
- Add the CPU add rms_norm operator with C++ and Python interfaces · 7d60e5b8
  zhuyue authored Dec 23, 2025
  
19 Dec, 2025 1 commit
- issue/563 - adjust #include placement · 812f6726
  pengcheng888 authored Dec 19, 2025
  
11 Dec, 2025 2 commits
- issue/744: kunlun softplus · f817c394
  zhangyue authored Dec 09, 2025
```
issue/744: softplus

issue/744: kunlun softplus

issue/744: delete F64
```
- issue/753: kunlun gelu kernel; delete template kernel Instantiation · 579bb1bf
  zhangyue authored Dec 11, 2025
  
10 Dec, 2025 2 commits

issue/746: fix causal_softmax computation error when height and width are at the 1024 boundary · 0b6bdab0
Ceng23333 authored Dec 10, 2025
```
Signed-off-by: Ceng23333 <441651826@qq.com>
```

Issue/739 support batched rope on cpu, nvidia, metax, moore threads (#743) · 51beebc6

thatPepe authored Dec 10, 2025

* issue/739 - support batched RoPE on Nvidia and CPU

* issue/739 - metax, moore batched rope

* issue/739 - adjust metax flags

* issue/739 - added a rope module interface to forward inplace in output tensor


08 Dec, 2025 1 commit
- issue/722 - adjusted cuda rearrange for shape (8, 4, 20, 64) · 0eb27e6e
  wooway777 authored Dec 05, 2025
  
04 Dec, 2025 2 commits
- Issue/705 - Refactor infinirt multi-device support. · 35e73b83
  zhuyue authored Dec 04, 2025
  
- issue/704 - add ccl support for maca with mc api · 6433bf2b
  crapromer authored Dec 04, 2025
  
29 Nov, 2025 1 commit
- issue/563 Add metax support for topkrouter · a15aa367
  Zhao Shijie authored Nov 28, 2025
  
28 Nov, 2025 4 commits
- issue/676: fix format · ce10d777
  zhangyue authored Nov 28, 2025
  
- issue/676 format · 74aeb4f4
  zhangyue authored Nov 28, 2025
  
- issue/676: format · c1af9783
  zhangyue authored Nov 28, 2025
  
- issue/676: kunlun topkrouter · 5584035d
  zhangyue authored Nov 28, 2025
  
26 Nov, 2025 1 commit
- Issue/670 - fix: correct macro for mccub/hccub conditional compilation. · 904a9254
  zhuyue authored Nov 26, 2025
  
22 Nov, 2025 1 commit

Issue/658 - Add Moore platform support for add, mul, and silu operations · 33d0f769

zhuyue authored Nov 22, 2025

- Implement Moore backend for add, mul, and silu elementwise operations
- Filter unsupported dtypes (BF16, F64) for Moore platform in tests


21 Nov, 2025 5 commits
- Issue/654 - Update CUB API usage for CUDA 12.9+ compatibility · 10572e55
  zhuyue authored Nov 21, 2025
  
- Issue/654 - Update CUB API usage for CUDA 12.9+ compatibility · 9a9f0982
  zhuyue authored Nov 21, 2025
  
- Issue/648 - fix: fix metax compilation for tanh and layer_norm operations. · d18b77a0
  zhuyue authored Nov 21, 2025
  
- Issue/645 - Fix metax add rms_norm operators. · d93e352b
  zhuyue authored Nov 21, 2025
  
- ISSUE/628 Adapt to the QY C610 GPU, add compile options, adapt existing operators. Add the operators required by bge-type models, (#629) · 85bc98ac
  qinyiqun authored Nov 21, 2025
```
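* ISSUE/628 Adapt to the QY C610 GPU, add compile options, adapt existing operators. Add the operators required by bge-type models, including gelu, layer_norm, lp_norm (supporting l1 and l2 norm), relu, softmax, and tanh.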

---------
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com>
```
20 Nov, 2025 2 commits

Issue/636 - Fix metax compile error of fp8 with maca (#639) · 685b6e03

crapromer authored Nov 20, 2025

* issue/636 - add support for fp8 with maca sdk

* issue/636 - add functional header to support Fn

* issue/636 - format code with clang


Issue/445 Add macaSDK support for the MetaX platform (#468) · ed012302

crapromer authored Nov 20, 2025

* initial add mc support for meta

* add command description for maca compilation

* rebase metax maca support to main

* issue/445 - clang format code on ubuntu

* issue/445 - change config from use_mc to use-mc and format code


19 Nov, 2025 1 commit
- Issue/626 - Add I32 and I64 dtype support to add operation (CPU) and tests. · 84d4ac48
  zhuyue authored Nov 19, 2025
  
07 Nov, 2025 1 commit

issue/367 - Fix compile bug on cuda 13.0 · c76c0645

crapromer authored Nov 07, 2025



* fix compile bug on cuda 13.0

* issue/367 - clang format code on ubuntu

---------
Co-authored-by: root <root@Crapromer>
