Commits · dce99862d09ba3bfcb05b7061d48b59b1f620a50 · jerrrrry / infinicore

05 Mar, 2026 1 commit
- issue/1033 - replace __C with __INFINI_C · b1ee0a8a
  wooway777 authored Mar 05, 2026
  
  b1ee0a8a
11 Feb, 2026 2 commits

issue/949 - feat: add silu_and_mul for moore gpu with test pass · 54635d9f
zhushuang authored Jan 22, 2026

54635d9f

demo131 - multiple issues regarding quantization, qy, and so forth · eb89439d

qinyiqun authored Feb 11, 2026

* issue/843: success per_channel_quant_int8

* issue/843: success qy quant

* issue/843: modified quant

* Add w8a8int8 performance tests

* add infinicore op linear_w8a8i8

* w8a8 linear module functional nn

* issue/843: QY-GPU Support Int8 scale_mm (#68)

* issue/843: success qy scaled_mm

* issue/843: modified kernel.cuh as per_channel_dequant_int8.cuh

* fix parallel slice in w8

* w8: support multiple batch size

* temp: modify quantconfig handling

* fix format and delete redundancy code

* fix format

* fix format

* fix format

* Refactor: add new API alongside legacy interfaces with deprecation warnings

* Add w4 InfiniCore-related content, and move Quantization config into InfiniCore

* Add graph support for the quantization operators

* solve cub version problem and fix code structure

* fix format

* demo131 - remove commented lines

---------
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com>
Co-authored-by: wooway777 <wooway777@gmail.com>

eb89439d
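
The per-channel int8 quantization and w8a8 linear steps listed in the commit above can be sketched in plain Python. This is an illustrative sketch only: it assumes symmetric absmax scaling with one scale per row, and the function names `per_channel_quant_int8` and `linear_w8a8` are hypothetical stand-ins, not the InfiniCore API.

```python
def per_channel_quant_int8(rows):
    """Quantize each row of a 2-D list to int8 with its own scale.

    Assumption: symmetric absmax scaling (scale = max|v| / 127).
    """
    q_rows, scales = [], []
    for row in rows:
        absmax = max(abs(v) for v in row)
        scale = absmax / 127.0 if absmax > 0 else 1.0  # avoid dividing by zero
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def linear_w8a8(x_q, x_scales, w_q, w_scales):
    """int8 x int8 matmul with integer accumulation, then dequantization.

    Each output element is rescaled by (activation row scale * weight row scale).
    """
    out = []
    for x_row, x_scale in zip(x_q, x_scales):
        out_row = []
        for w_row, w_scale in zip(w_q, w_scales):
            acc = sum(a * b for a, b in zip(x_row, w_row))  # integer accumulate
            out_row.append(acc * x_scale * w_scale)
        out.append(out_row)
    return out


weight = [[1.0, -2.0, 0.5], [0.0, 0.0, 0.0]]
w_q, w_scales = per_channel_quant_int8(weight)
x_q, x_scales = per_channel_quant_int8([[0.5, 1.0, -1.0]])
y = linear_w8a8(x_q, x_scales, w_q, w_scales)
```

The dequantized result approximates the floating-point product (here about -2.0 for the first output channel); the error comes from rounding each value to one of 255 integer levels.
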

27 Jan, 2026 4 commits

issue/923 - ninetoothed kv caching for nv, il, mtx · 97eced0e
wooway777 authored Jan 26, 2026

97eced0e
issue/919 - ninetoothed flash attention · 6ac8f906
wooway777 authored Jan 26, 2026

6ac8f906
issue/791 fix add_rmsnorm api and rmsnorm module · 0c204dfd
PanZezhong authored Jan 23, 2026

0c204dfd

issue/846 - Refactor embedding to support device-side input and CUDA graph recording · cc2cc3a1

gongchensu authored Dec 26, 2025

- Ensure embedding tensors are on the same device. Change format.
- Optimize embedding kernel with vectorized memory access and __ldg
- Add vectorized memory access using float4/float2, half2, and bfloat162
- Use __ldg instruction for read-only weight and indices access
- Add memory alignment checks to enable vectorized paths
- Add __restrict__ keywords for better compiler optimization
- Implement dynamic block size selection based on embedding_dim

cc2cc3a1

12 Jan, 2026 1 commit
- issue/867 fix page caching api, paged attn support more head dims · 96551cb7
  PanZezhong authored Jan 12, 2026
  
  96551cb7
09 Jan, 2026 1 commit
- issue/867 pass total kv lens as paged attn args · 499b1dc6
  PanZezhong authored Jan 09, 2026
  
  499b1dc6
08 Jan, 2026 1 commit
- issue/867 - feat: adjust paged_attention_prefill interface naming · 0a2839a2
  zhushuang authored Jan 07, 2026
  
  0a2839a2
30 Dec, 2025 1 commit
- issue/848 - feat: add paged attention prefill for nvidia gpu with test pass · 1ba0bcfa
  zhushuang authored Dec 30, 2025
  
  1ba0bcfa
29 Dec, 2025 1 commit
- issue/834 - feat: add paged attention for nvidia gpu with test pass · 17299923
  zhushuang authored Dec 29, 2025
  
  17299923
26 Dec, 2025 2 commits

Issue/840: NVIDIA support for Int8 Gemm (#852) · ed04d3e6

qinyiqun authored Dec 26, 2025

* can commit

* can exec sm_90a

* can exec < sm_90

* fix format

* fix format

* add tests, benchmarked against the sglang test

* fix format 1

* fix format 2

* add compile option to disable cutlass

ed04d3e6

Revert "Issue/840: NVIDIA Int8 Gemm (#841)" · 458bb997
PanZezhong1725 authored Dec 26, 2025
```
This reverts commit 25258029.
```
458bb997

25 Dec, 2025 1 commit
- Issue/840: NVIDIA Int8 Gemm (#841) · 25258029
  qinyiqun authored Dec 25, 2025
  
  25258029
24 Dec, 2025 2 commits
- Unify add_rms_norm to always return (normalized_result, add_result) pair. · 2a432b34
  zhuyue authored Dec 24, 2025
  
  2a432b34
- Add the CPU add rms_norm operator, with C++ and Python interfaces · 7d60e5b8
  zhuyue authored Dec 23, 2025
  
  7d60e5b8
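
The `(normalized_result, add_result)` pair mentioned in the add_rms_norm commits above can be illustrated with a small reference sketch. It assumes the usual fused residual-add plus RMSNorm definition; the signature and the `eps` default are assumptions, not taken from the repository.

```python
import math


def add_rms_norm(x, residual, weight, eps=1e-6):
    """Reference sketch: add the residual, then RMS-normalize the sum.

    Returns (normalized_result, add_result), mirroring the pair named in
    the commit title. Signature and eps default are assumptions.
    """
    added = [a + b for a, b in zip(x, residual)]
    mean_sq = sum(v * v for v in added) / len(added)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    normalized = [v * inv_rms * w for v, w in zip(added, weight)]
    return normalized, added


normalized, added = add_rms_norm([1.0, 2.0], [2.0, 2.0], [1.0, 1.0], eps=0.0)
```

Returning the sum alongside the normalized output lets a caller feed `added` into the next layer as its residual without recomputing it.
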
05 Dec, 2025 1 commit
- Issue/714 - feat(random_sample): add batch processing interface. · ff84910c
  zhuyue authored Dec 04, 2025
  
  ff84910c
21 Nov, 2025 1 commit

ISSUE/628 Adapt to the QY C610 GPU, add compile options, and adapt existing operators. Add the operators required by bge-type models, (#629) · 85bc98ac

qinyiqun authored Nov 21, 2025



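* ISSUE/628 Adapt to the QY C610 GPU, add compile options, and adapt existing operators. Add the operators required by bge-type models, including gelu, layer_norm, lp_norm (supporting l1 and l2 norm), relu, softmax, and tanh.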

---------
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com>

85bc98ac

28 Oct, 2025 1 commit
- issue/456/feat: add silu operator · e184c7e4
  tianyuxbear authored Jul 25, 2025
  
  e184c7e4
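
For readers unfamiliar with the silu operator added above, here is a minimal reference sketch. The gated `silu_and_mul` helper follows the convention commonly used in LLM inference code (first half of the row gated, second half multiplied); whether the issue/949 kernel uses the same layout is an assumption.

```python
import math


def silu(x):
    """SiLU: x * sigmoid(x), with a sigmoid that avoids overflow in exp."""
    if x >= 0:
        sig = 1.0 / (1.0 + math.exp(-x))
    else:
        e = math.exp(x)
        sig = e / (1.0 + e)
    return x * sig


def silu_and_mul(row):
    """Gated variant (assumed layout): silu(row[:d]) * row[d:], d = len(row) // 2."""
    d = len(row) // 2
    return [silu(row[i]) * row[d + i] for i in range(d)]


gated = silu_and_mul([0.0, 1.0, 5.0, 2.0])
```

The output has half the width of the input, which is why the gated form is typically fused into a single kernel after a combined gate/up projection.
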
23 Oct, 2025 1 commit
- issue/473 - the ones and zeros operators · 9b8de584
  pengcheng888 authored Oct 23, 2025
```
Co-authored-by: pengcheng888 <pengcheng@example.com>
```
  9b8de584
16 Oct, 2025 1 commit

issue/383: Add logsoftmax ops · 05a2e149

gongchensu authored Oct 16, 2025


Co-authored-by: wawahejun <hejunlbbc@gmail.com>
Co-authored-by: zhuyue <zhuyue@qiyuanlab.com>

05a2e149
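
As a reference for the logsoftmax operator added above, the following sketch shows the standard numerically stable formulation (subtracting the maximum before exponentiating). It is a generic definition, not the repository's kernel.

```python
import math


def log_softmax(xs):
    """Stable log-softmax: x_i - (m + log(sum(exp(x_j - m)))), with m = max(xs)."""
    m = max(xs)
    log_sum = m + math.log(sum(math.exp(v - m) for v in xs))
    return [v - log_sum for v in xs]


out = log_softmax([1000.0, 1000.0])
```

Without the max subtraction, `math.exp(1000.0)` would overflow; with it, the two equal inputs each map to `-log(2)`.
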

29 Sep, 2025 1 commit
- issue/427 - the sigmoid, topksoftmax, and topkrouter ops · ed530e11
  pengcheng888 authored Sep 29, 2025
  
  ed530e11
23 Sep, 2025 1 commit
- feat: rename Dequantize to DequantizeAWQ in nvidia gpu · 4217976d
  zhushuang authored Sep 23, 2025
  
  4217976d
18 Sep, 2025 1 commit
- issue/458 add AWQ dequantization torch test and improve variable naming readability · 82b2a84c
  spike-zhu authored Sep 18, 2025
  
  82b2a84c
16 Sep, 2025 1 commit
- issue/428: merge rope_v2 into rope with algorithm selection · 86515765
  Ziminli authored Sep 07, 2025
  
  86515765
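
The "algorithm selection" in the rope commit above can be pictured as choosing which element pairs get rotated together. The sketch below assumes the two variants are the widely used GPT-J (adjacent pairs) and GPT-NeoX (split halves) layouts; the `algo` names and the `theta` default are assumptions, not identifiers from the repository.

```python
import math


def rope(x, pos, algo="gpt_j", theta=10000.0):
    """Rotary position embedding for one vector of even length.

    algo="gpt_j":    rotates pairs (2i, 2i + 1)
    algo="gpt_neox": rotates pairs (i, i + d/2)
    Both names are illustrative labels for this sketch.
    """
    d = len(x)
    half = d // 2
    out = list(x)
    for i in range(half):
        angle = pos * theta ** (-2.0 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        if algo == "gpt_j":
            a, b = 2 * i, 2 * i + 1
        elif algo == "gpt_neox":
            a, b = i, i + half
        else:
            raise ValueError("unknown rope algorithm: " + algo)
        # Standard 2-D rotation of the selected pair, read from the input.
        out[a] = x[a] * c - x[b] * s
        out[b] = x[a] * s + x[b] * c
    return out


vec = [1.0, 2.0, 3.0, 4.0]
rotated_j = rope(vec, 1, "gpt_j")
rotated_neox = rope(vec, 1, "gpt_neox")
```

Both variants apply the same rotation angles and preserve the vector norm; they differ only in how elements are paired, so a single entry point with a selector is enough to cover both.
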
10 Sep, 2025 1 commit
- issue/440 feat: add softplus operator · 1635fd92
  PanZezhong1725 authored Sep 10, 2025
  
  1635fd92
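
For reference, the softplus operator added above computes `log(1 + exp(x))`. A minimal, numerically stable sketch (a generic formulation, not the repository's kernel) is:

```python
import math


def softplus(x):
    """Stable softplus: max(x, 0) + log1p(exp(-|x|)), equal to log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


values = [softplus(v) for v in (-1000.0, 0.0, 1000.0)]
```

Rewriting the formula this way keeps the argument of `exp` non-positive, so large inputs return approximately `x` instead of overflowing.
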
02 Sep, 2025 1 commit
- [T2-2-3] blkmjsian · 9ad23fad
  blkmjsian authored Sep 02, 2025
```
- dequantize awq
- rope v2
```
  9ad23fad
11 Jul, 2025 1 commit
- Issue/213 Add CPU/CUDA implementations of the conv operator · d417f967
  zhiwu zhou authored Jul 11, 2025
  
  d417f967
09 Jul, 2025 1 commit
- Add a CPU implementation of ReLU · 4ac6e71b
  Jiacheng Huang authored May 29, 2025
  
  4ac6e71b
07 Jul, 2025 1 commit
- issue/307 unify test tensor creation in pytorch tests · f62e952e
  PanZezhong authored Jul 07, 2025
  
  f62e952e
27 Jun, 2025 2 commits
- issue/205 - Add gguf test cases for the Sub operator · 37332d40
  Pepe authored Apr 28, 2025
  
  37332d40
- issue/205 - Add the Sub operator · 2ccf1d9d
  Pepe authored Apr 27, 2025
```
issue/205 - Add the Sub operator's header file, CPU implementation, CUDA implementation, and Python tests
```
  2ccf1d9d
06 May, 2025 1 commit
- issue/204: test cases for the add operator · 16506fc0
  Catheriany authored May 06, 2025
  
  16506fc0
29 Apr, 2025 1 commit
- issue/180: completely remove redundant code and tidy up formatting · cfaa6af8
  goldenfox2025 authored Apr 30, 2025
  
  cfaa6af8
28 Apr, 2025 1 commit
- issue/180: add the clip operator · 8a49900f
  goldenfox2025 authored Apr 28, 2025
  
  8a49900f
25 Apr, 2025 2 commits
- issue/183 Revise based on feedback · d4c0cdf9
  Graylatzhou authored Apr 24, 2025
  
  d4c0cdf9
- Issue/183 Mul operator, CPU & CUDA · 975559ee
  Graylatzhou authored Apr 22, 2025
  
  975559ee
22 Apr, 2025 1 commit
- issue/172 Add operator, CPU & CUDA · a5716a8c
  PanZezhong authored Apr 22, 2025
  
  a5716a8c
14 Apr, 2025 1 commit
- issue/127: Refactor ElementwiseInfo, refactor elementwise to use workspace for... · 9cc0c416
  Zimin Li authored Apr 14, 2025
```
issue/127: Refactor ElementwiseInfo, refactor elementwise to use workspace for storing meta, fix misc. issues
```
  9cc0c416