1. 09 Mar, 2026 1 commit
  2. 03 Mar, 2026 1 commit
  3. 11 Feb, 2026 2 commits
    • zhushuang
    • qinyiqun
      Support Quantization (#996) · eb89439d
      qinyiqun authored
      
      
      demo131 - multiple issues regarding quantization, qy, and so forth
      
      * issue/843: success per_channel_quant_int8
      
      * issue/843: success qy quant
      
      * issue/843: modified quant
      
      * Add w8a8int8 performance tests
      
      * add infinicore op linear_w8a8i8
      
      * w8a8 linear module functional nn
      
      * issue/843: QY-GPU Support Int8 scale_mm (#68)
      
      * issue/843: success qy scaled_mm
      
      * issue/843: modified kernel.cuh as per_channel_dequant_int8.cuh
      
* fix parallel slice in w8
      
      * w8: support multiple batch size
      
* temp: revise quantconfig handling
      
      * fix format and delete redundancy code
      
      * fix format
      
      * fix format
      
      * fix format
      
      * Refactor: add new API alongside legacy interfaces with deprecation warnings
      
* Add w4 InfiniCore-related content, and move the Quantization config into InfiniCore
      
* Graph support for quantized operators
      
      * solve cub version problem and fix code structure
      
      * fix format
      
      * demo131 - remove commented lines
      
      ---------
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com>
Co-authored-by: wooway777 <wooway777@gmail.com>
      eb89439d
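The per_channel_quant_int8 / per_channel_dequant_int8 commits above point at symmetric per-channel int8 weight quantization. A minimal sketch of that scheme in plain Python, assuming a symmetric scale per output channel; the function names are illustrative, not the repository's API:

```python
# Symmetric per-channel int8 quantization: each row (output channel) gets its
# own scale amax/127, values are rounded and clamped to [-128, 127].

def quantize_per_channel(weights):
    """weights: list of float rows (output channels). Returns (int8 rows, scales)."""
    q_rows, scales = [], []
    for row in weights:
        amax = max(abs(v) for v in row)
        scale = amax / 127.0 if amax > 0 else 1.0
        q_rows.append([max(-128, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales

def dequantize_per_channel(q_rows, scales):
    """Recover approximate floats by multiplying each row by its scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]
```

Per-channel scales keep a large-magnitude channel from washing out the precision of the others, which is why w8a8 linear layers typically quantize weights this way rather than with a single per-tensor scale.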
  4. 04 Feb, 2026 1 commit
  5. 27 Jan, 2026 1 commit
  6. 12 Jan, 2026 1 commit
  7. 08 Jan, 2026 1 commit
  8. 30 Dec, 2025 1 commit
  9. 29 Dec, 2025 1 commit
  10. 26 Dec, 2025 2 commits
  11. 25 Dec, 2025 1 commit
  12. 24 Dec, 2025 1 commit
  13. 22 Dec, 2025 1 commit
  14. 17 Dec, 2025 1 commit
  15. 22 Nov, 2025 2 commits
  16. 21 Nov, 2025 1 commit
  17. 19 Nov, 2025 1 commit
  18. 28 Oct, 2025 1 commit
  19. 23 Oct, 2025 1 commit
  20. 16 Oct, 2025 1 commit
  21. 29 Sep, 2025 2 commits
  22. 23 Sep, 2025 1 commit
  23. 18 Sep, 2025 1 commit
  24. 17 Sep, 2025 1 commit
  25. 16 Sep, 2025 1 commit
  26. 10 Sep, 2025 1 commit
  27. 02 Sep, 2025 1 commit
  28. 14 Aug, 2025 1 commit
  29. 13 Aug, 2025 1 commit
  30. 14 Jul, 2025 1 commit
  31. 11 Jul, 2025 1 commit
  32. 09 Jul, 2025 1 commit
  33. 07 Jul, 2025 1 commit
  34. 04 Jul, 2025 1 commit
  35. 01 Jul, 2025 1 commit
    • 蒋帅宏(Shuaihong_Jiang)
issue/254: Add BF16 support for operators on CPU and CUDA, with corresponding test code (#255) · f88d4ad8
      蒋帅宏(Shuaihong_Jiang) authored
      
      
* issue/254: Add BF16 support for operators on CPU and CUDA, and add the corresponding test code
      
* issue/254: Re-format the modified operators and resubmit
      
* Resolve conflicts with the latest main
      
* After resolving conflicts, rms_norm no longer passed its original precision check; the tolerance was changed from
{"atol": 5e-3, "rtol": 5e-3} to
{"atol": 8e-3, "rtol": 8e-3}
      
* The rms_norm FP16 test case failed in debug mode (it passes locally but not on GitHub),
so the tolerance was doubled for testing
      
* Scale the rms_norm test input by 0.5 and restore the tolerance to its original value for CI testing
      
* issue/254: 1. Use the CHECK_DTYPE macro for data-type validation
2. Add a check for device BF16 support in the test utils.py
      
* issue/254: rms_norm fp16 test tolerance changed from
torch.float16: {"atol": 1e-3, "rtol": 1e-3},
to torch.float16: {"atol": 2e-3, "rtol": 2e-3},
and the 0.5 input scaling was removed
      
* issue/254: Add a BF16 special case to the debug and debug_all
methods in utils.py
      
* Revise the device-type check for BF16 test support
      
* Revise the device check for BF16 test support
      
      * issue/254: reduce redundancy in rms_norm.py
      
      * issue/254: add back the missing comment in rms_norm.py
      
      * issue/254: add fp32 tolerance condition in causal_softmax.py
      
      ---------
Co-authored-by: Zimin Li <coollizimin@gmail.com>
      f88d4ad8
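The tolerance tuning in the commit above amounts to a per-dtype atol/rtol table plus a BF16 special case in the comparison helpers. A minimal sketch of that pattern, assuming a torch.allclose-style elementwise check; the float16 values mirror the commit messages, while the bfloat16 and float32 entries are illustrative assumptions:

```python
# Per-dtype tolerance table in the spirit of this commit's tuning.
# float16 values come from the commit messages; bfloat16 and float32
# entries are assumed for illustration.
TOLERANCES = {
    "float16": {"atol": 2e-3, "rtol": 2e-3},
    "bfloat16": {"atol": 8e-3, "rtol": 8e-3},  # looser: BF16 has fewer mantissa bits
    "float32": {"atol": 1e-6, "rtol": 1e-6},
}

def allclose(actual, expected, dtype):
    """Elementwise |a - e| <= atol + rtol * |e|, the torch.allclose criterion."""
    tol = TOLERANCES[dtype]
    return all(
        abs(a - e) <= tol["atol"] + tol["rtol"] * abs(e)
        for a, e in zip(actual, expected)
    )
```

Keeping the table keyed by dtype means a BF16 special case is a one-line entry rather than a branch scattered through every test, which matches the debug/debug_all refactor described in the messages.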
  36. 20 Jun, 2025 1 commit