Commits · dce99862d09ba3bfcb05b7061d48b59b1f620a50 · jerrrrry / infinicore

05 Mar, 2026 4 commits
- issue/1033 - replace __C with __INFINI_C · b1ee0a8a
  wooway777 authored Mar 05, 2026
  
  b1ee0a8a
- issue/1033 add flash-attn compile target · 06362c94
  PanZezhong authored Mar 05, 2026
  
  06362c94
- issue/1033 support stream guard · 515e1eca
  PanZezhong authored Mar 05, 2026
  
  515e1eca
- issue/1033 support flash_attn lib with aten adaptor · f6496d44
  PanZezhong authored Mar 05, 2026
  
  f6496d44
11 Feb, 2026 2 commits

- issue/949 - feat: add silu_and_mul for moore gpu with test pass · 54635d9f
  zhushuang authored Jan 22, 2026
  
  54635d9f

- demo131 - multiple issues regarding quantization, qy, and so forth · eb89439d
  qinyiqun authored Feb 11, 2026

  * issue/843: success per_channel_quant_int8
  * issue/843: success qy quant
  * issue/843: modified quant
  * Add w8a8int8 performance tests
  * add infinicore op linear_w8a8i8
  * w8a8 linear module functional nn
  * issue/843: QY-GPU Support Int8 scale_mm (#68)
  * issue/843: success qy scaled_mm
  * issue/843: modified kernel.cuh as per_channel_dequant_int8.cuh
  * fix parallel slice in w8
  * w8: support multiple batch size
  * temp: modify quantconfig handling
  * fix format and delete redundancy code
  * fix format
  * fix format
  * fix format
  * Refactor: add new API alongside legacy interfaces with deprecation warnings
  * Add w4 infinicore-related content, and move Quantization config into InfiniCore
  * quantization operators support graph
  * solve cub version problem and fix code structure
  * fix format
  * demo131 - remove commented lines

  ---------
  Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
  Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com>
  Co-authored-by: wooway777 <wooway777@gmail.com>

  eb89439d
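
The demo131 commit above mentions `per_channel_quant_int8` and a matching `per_channel_dequant_int8` kernel but does not show their semantics. As a rough illustration only, the sketch below assumes a symmetric scheme with one scale per row (channel), `scale = max|x| / 127`; the function names mirror the commit text, while the signatures and the zero-row handling are assumptions, not the InfiniCore API.

```python
def per_channel_quant_int8(rows):
    """Quantize each row (channel) to int8 with its own symmetric scale.

    Assumed scheme: scale = max(|x|) / 127, q = clamp(round(x / scale)).
    """
    quantized, scales = [], []
    for row in rows:
        max_abs = max((abs(v) for v in row), default=0.0)
        # An all-zero row would give scale 0; fall back to 1.0 (assumption).
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        quantized.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return quantized, scales


def per_channel_dequant_int8(quantized, scales):
    """Recover approximate floats: x is roughly q * scale, per row."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]


if __name__ == "__main__":
    q, s = per_channel_quant_int8([[1.0, -2.0, 0.5], [0.0, 0.0, 0.0]])
    print(q, s)
    print(per_channel_dequant_int8(q, s))
```

Under this assumed scheme the round-trip error of each element is at most half of its row's scale, which is why a per-row scale helps when rows differ widely in magnitude.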

04 Feb, 2026 1 commit
- issue/988 - adapt to ali ppu · 7e2a4c08
  wooway777 authored Jan 27, 2026
  
  7e2a4c08
27 Jan, 2026 6 commits
- issue/923 - ninetoothed kv caching for nv, il, mtx · 97eced0e
  wooway777 authored Jan 26, 2026
  
  97eced0e
- issue/919 - ninetoothed flash attention · 6ac8f906
  wooway777 authored Jan 26, 2026
  
  6ac8f906
- issue/810 support more ops as graph op · 81e5fe94
  PanZezhong authored Jan 19, 2026
  
  81e5fe94
- issue/791 fix add_rmsnorm api and rmsnorm module · 0c204dfd
  PanZezhong authored Jan 23, 2026
  
  0c204dfd
- issue/900 - adapt to graph and adjust test script · eb34d4d6
  wooway777 authored Jan 09, 2026
  
  eb34d4d6
- issue/846 - Refactor embedding to support device-side input and CUDA graph recording · cc2cc3a1
  gongchensu authored Dec 26, 2025
```
- Ensure embedding tensors are on the same device. Change format.
- Optimize embedding kernel with vectorized memory access and __ldg
- Add vectorized memory access using float4/float2, half2, and bfloat162
- Use __ldg instruction for read-only weight and indices access
- Add memory alignment checks to enable vectorized paths
- Add __restrict__ keywords for better compiler optimization
- Implement dynamic block size selection based on embedding_dim
```
  cc2cc3a1
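
The cc2cc3a1 description lists alignment checks that enable vectorized paths and a block size chosen from `embedding_dim`, without giving the rules. The sketch below is one plausible host-side heuristic written in Python for readability; the candidate widths follow the types named in the commit (float4/float2 for 4-byte elements, half2/bfloat162 for 2-byte elements), but `choose_vector_width`, `choose_block_size`, the 256-thread cap, and the power-of-two rounding are all hypothetical and not taken from the kernel.

```python
# Candidate vector widths (in elements) per element size in bytes.
# 4-byte floats: float4, float2. 2-byte half/bfloat16: half2 / bfloat162.
VECTOR_WIDTHS = {4: (4, 2), 2: (2,)}


def choose_vector_width(embedding_dim, elem_bytes, base_address):
    """Return the widest vector width whose alignment requirements hold.

    A width is usable when every row starts on a vector boundary: the base
    address is aligned and the row length is a multiple of the width.
    """
    for width in VECTOR_WIDTHS.get(elem_bytes, ()):
        vec_bytes = width * elem_bytes
        if embedding_dim % width == 0 and base_address % vec_bytes == 0:
            return width
    return 1  # scalar fallback path


def choose_block_size(embedding_dim, vector_width, warp=32, max_block=256):
    """Pick a power-of-two multiple of the warp size covering one row.

    Hypothetical heuristic: grow from one warp until the block covers the
    number of vector loads per row, capped at max_block threads.
    """
    work_items = -(-embedding_dim // vector_width)  # ceiling division
    block = warp
    while block < work_items and block < max_block:
        block *= 2
    return block


if __name__ == "__main__":
    width = choose_vector_width(4096, 4, 0x1000)
    print(width, choose_block_size(4096, width))
```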
22 Jan, 2026 1 commit
- issue/811 fix tensor to blob and resume · 90cc3bdd
  PanZezhong authored Jan 22, 2026
  
  90cc3bdd
21 Jan, 2026 1 commit
- issue/811 support cuda graph capture · 91f91f7c
  PanZezhong authored Jan 21, 2026
  
  91f91f7c
19 Jan, 2026 1 commit
- issue/810 feat: allow graph tensor to resume to allocator's tracking · c1535ae8
  PanZezhong authored Jan 19, 2026
  
  c1535ae8
14 Jan, 2026 1 commit
- issue/920 RoPE supports longrope · 06dcc067
  PanZezhong authored Jan 14, 2026
  
  06dcc067
12 Jan, 2026 1 commit
- issue/867 fix page caching api, paged attn support more head dims · 96551cb7
  PanZezhong authored Jan 12, 2026
  
  96551cb7
09 Jan, 2026 2 commits
- issue/867 pass total kv lens as paged attn args · 499b1dc6
  PanZezhong authored Jan 09, 2026
  
  499b1dc6
- issue/810 add common graph op macros · 0fa8805e
  PanZezhong authored Jan 09, 2026
  
  0fa8805e
08 Jan, 2026 1 commit
- issue/867 - feat: adjust paged_attention_prefill interface naming · 0a2839a2
  zhushuang authored Jan 07, 2026
  
  0a2839a2
06 Jan, 2026 1 commit
- issue/810 static compute graph infra · 006d530c
  PanZezhong authored Jan 06, 2026
  
  006d530c
30 Dec, 2025 3 commits
- issue/847 paged attention prefill one-stage interface · 99b940b2
  PanZezhong authored Dec 30, 2025
  
  99b940b2
- issue/847 correct cache_lens naming · e13ad8f9
  PanZezhong authored Dec 30, 2025
  
  e13ad8f9
- issue/848 - feat: add paged attention prefill for nvidia gpu with test pass · 1ba0bcfa
  zhushuang authored Dec 30, 2025
  
  1ba0bcfa
29 Dec, 2025 2 commits
- issue/847 - add infinicore interfaces and tests for paged caching and attention · 38078981
  pengcheng888 authored Dec 29, 2025
  
  38078981
- issue/834 - feat: add paged attention for nvidia gpu with test pass · 17299923
  zhushuang authored Dec 29, 2025
  
  17299923
26 Dec, 2025 3 commits
- Add `#include "ops/random_sample.hpp"` to `include/infinicore/ops.hpp` · ca6e759f
  Jiacheng Huang authored Dec 26, 2025
  
  ca6e759f
- Issue/840: support Int8 Gemm on NVIDIA (#852) · ed04d3e6
  qinyiqun authored Dec 26, 2025
```
* can commit

* can exec sm_90a

* can exec < sm_90

* fix format

* fix format

* add tests, aligned with the sglang test

* fix format 1

* fix format 2

* add compile option to disable cutlass
```
  ed04d3e6
- Revert "Issue/840: NVIDIA Int8 Gemm (#841)" · 458bb997
  PanZezhong1725 authored Dec 26, 2025
```
This reverts commit 25258029.
```
  458bb997
25 Dec, 2025 1 commit
- Issue/840: NVIDIA Int8 Gemm (#841) · 25258029
  qinyiqun authored Dec 25, 2025
  
  25258029
24 Dec, 2025 2 commits
- Unify add_rms_norm to always return (normalized_result, add_result) pair. · 2a432b34
  zhuyue authored Dec 24, 2025
  
  2a432b34
- Add the add rms_norm operator for CPU, with C++ and Python interfaces · 7d60e5b8
  zhuyue authored Dec 23, 2025
  
  7d60e5b8
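
Commit 2a432b34 states that `add_rms_norm` always returns a `(normalized_result, add_result)` pair. A minimal reference for a single row might look like the following; the argument names, the `eps` default, and the exact formula (standard RMSNorm applied to the element-wise sum) are assumptions for illustration, not the InfiniCore signature.

```python
import math


def add_rms_norm(a, b, weight, eps=1e-6):
    """Fused residual add + RMSNorm over one row.

    Returns (normalized_result, add_result), matching the pair order named
    in the commit message. The formula is the usual RMSNorm (assumed):
        y = x / sqrt(mean(x^2) + eps) * weight, with x = a + b
    """
    add_result = [x + y for x, y in zip(a, b)]
    mean_sq = sum(v * v for v in add_result) / len(add_result)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    normalized = [v * inv_rms * w for v, w in zip(add_result, weight)]
    return normalized, add_result


if __name__ == "__main__":
    normalized, residual = add_rms_norm([1.0, 2.0], [2.0, 2.0], [1.0, 1.0])
    print(normalized, residual)
```

Returning the sum alongside the normalized output lets a caller reuse it as the residual input of the next layer without recomputing the add.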
22 Dec, 2025 1 commit
- issue/821 add squeeze operator, improve unsqueeze operator tests · 8e25feb0
  PanZezhong authored Dec 22, 2025
  
  8e25feb0
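
For readers unfamiliar with the two operators named in 8e25feb0, the shape arithmetic can be sketched as below. This follows the common tensor-library convention (negative dims count from the end); raising an error when the squeezed dimension is not of size 1 is an assumption here, since some libraries treat that case as a no-op and the commit does not say which behavior InfiniCore uses.

```python
def squeeze_shape(shape, dim):
    """Remove dimension `dim` from `shape`; it must have size 1 (assumed)."""
    ndim = len(shape)
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} out of range for {ndim} dims")
    dim %= ndim  # normalize negative dims
    if shape[dim] != 1:
        raise ValueError(f"cannot squeeze dim {dim} of size {shape[dim]}")
    return shape[:dim] + shape[dim + 1:]


def unsqueeze_shape(shape, dim):
    """Insert a size-1 dimension at position `dim` of the result."""
    ndim = len(shape) + 1  # rank of the result
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} out of range for {ndim} dims")
    dim %= ndim
    return shape[:dim] + (1,) + shape[dim:]


if __name__ == "__main__":
    print(squeeze_shape((2, 1, 3), 1))
    print(unsqueeze_shape((2, 3), -1))
```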
18 Dec, 2025 1 commit
- issue/798 - fix operator device handling · 3720127c
  wooway777 authored Dec 17, 2025
  
  3720127c
10 Dec, 2025 1 commit

- Issue/739 support batched rope on cpu, nvidia, metax, moore threads (#743) · 51beebc6
  thatPepe authored Dec 10, 2025

  * issue/739 - support batched RoPE on Nvidia and CPU
  * issue/739 - metax, moore batched rope
  * issue/739 - adjust metax flags
  * issue/739 - added a rope module interface to forward inplace in output tensor

  51beebc6

09 Dec, 2025 1 commit
- issue/741 expose infinicore_cpp_api class inheritance · c74dfaea
  PanZezhong authored Dec 09, 2025
  
  c74dfaea
08 Dec, 2025 1 commit
- issue/713 - add RowParallelLinear and ColParallelLinear for C++ · 92472152
  pengcheng888 authored Dec 08, 2025
  
  92472152
06 Dec, 2025 1 commit
- issue/719 add more logs · 11aa0c14
  PanZezhong1725 authored Dec 06, 2025
  
  11aa0c14
05 Dec, 2025 1 commit
- Issue/714 - feat(random_sample): add batch processing interface. · ff84910c
  zhuyue authored Dec 04, 2025
  
  ff84910c