Commits · d3e27d8cbefe7b714756e0df72446ab363371163 · jerrrrry / infinicore

11 Feb, 2026 6 commits

demo131 - remove fp32 from paged tests · d3e27d8c
wooway777 authored Feb 10, 2026

d3e27d8c
Merge pull request #1010 from InfiniTensor/issue/899 · 513a8502
thatPepe authored Feb 11, 2026
```
issue/899 - fix: fix causal_softmax and rearrange bug 
```
513a8502
Merge pull request #1009 from InfiniTensor/issue/949 · c312f175
thatPepe authored Feb 11, 2026
```
issue/949 - feat: add silu_and_mul for moore gpu with test pass
```
c312f175
issue/899 - fix: fix causal_softmax and rearrange bug · e4bce369
zhushuang authored Jan 13, 2026

e4bce369
issue/949 - feat: add silu_and_mul for moore gpu with test pass · 54635d9f
zhushuang authored Jan 22, 2026

54635d9f

demo131 - multiple issues regarding quantization, qy, and so forth · eb89439d
qinyiqun authored Feb 11, 2026

* issue/843: success per_channel_quant_int8

* issue/843: success qy quant

* issue/843: modified quant

* Add w8a8int8 performance tests

* add infinicore op linear_w8a8i8

* w8a8 linear module functional nn

* issue/843: QY-GPU Support Int8 scale_mm (#68)

* issue/843: success qy scaled_mm

* issue/843: modified kernel.cuh as per_channel_dequant_int8.cuh

* fix parallel slice in w8

* w8: support multiple batch size

* temp: modify quantconfig handling

* fix format and delete redundancy code

* fix format

* fix format

* fix format

* Refactor: add new API alongside legacy interfaces with deprecation warnings

* Add w4 InfiniCore-related content, and move Quantization config into InfiniCore

* Quantization operators support graph

* solve cub version problem and fix code structure

* fix format

* demo131 - remove commented lines

---------
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com>
Co-authored-by: wooway777 <wooway777@gmail.com>

eb89439d

04 Feb, 2026 4 commits
- Merge pull request #999 from InfiniTensor/issue/988 · abab5652
  thatPepe authored Feb 04, 2026
```
issue/988 - adapt to ali ppu
```
  abab5652
- issue/988 - update readme · e0268b24
  wooway777 authored Feb 04, 2026
  
  e0268b24
- issue/988 - unlock unused operators on ali ppu · 5558e856
  wooway777 authored Feb 04, 2026
  
  5558e856
- issue/988 - adapt to ali ppu · 7e2a4c08
  wooway777 authored Jan 27, 2026
  
  7e2a4c08
29 Jan, 2026 1 commit
- issue/995 fix paged attn on iluvatar · bf0c825d
  zhangyue authored Jan 29, 2026
  
  bf0c825d
27 Jan, 2026 25 commits
- Merge pull request #989 from InfiniTensor/issue/811-fix · 70862bcc
  PanZezhong1725 authored Jan 27, 2026
```
issue/811 use relax graph capture mode
```
  70862bcc
- issue/811 use relax graph capture mode, add compile flag for graph instantiate · 807e5e43
  PanZezhong authored Jan 27, 2026
  
  807e5e43
- demo131 - patch lua flags and includes · 1fa56298
  wooway777 authored Jan 26, 2026
  
  1fa56298
- issue/983 - adapted the optimized paged attention to metax · 7a18d241
  wooway777 authored Jan 26, 2026
  
  7a18d241
- issue/979 - removed commented paged attn codes · 4cd1f688
  wooway777 authored Jan 26, 2026
  
  4cd1f688
- issue/979 optimize paged attention · 1c18c046
  PanZezhong authored Jan 23, 2026
  
  1c18c046
- issue/923 - ninetoothed kv caching for nv, il, mtx · 97eced0e
  wooway777 authored Jan 26, 2026
  
  97eced0e
- issue/931 - ninetoothed swiglu for nv, il, mtx · 5614e1be
  wooway777 authored Jan 26, 2026
  
  5614e1be
- issue/919 - ninetoothed flash attention · 6ac8f906
  wooway777 authored Jan 26, 2026
  
  6ac8f906
- issue/935 - add metax include dir for ninetoothed · 47843aa6
  wooway777 authored Jan 15, 2026
  
  47843aa6
- issue/940 - check build result and implicitly require build.py for build ntops · ca58118f
  wooway777 authored Jan 26, 2026
  
  ca58118f
- issue/925 - Speed up `scripts/build_ntops.py` and... · 32340fc3
  Jiacheng Huang authored Jan 14, 2026
```
issue/925 - Speed up `scripts/build_ntops.py` and `src/infiniop/ninetoothed/build.py` with `concurrent.futures`
```
  32340fc3
- issue/402 - convenient ninetoothed util · 55cd22e3
  Jiacheng Huang authored Aug 25, 2025
```
Wrap `NineToothedTensor` at the C++ layer

Add a way to create `ninetoothed::Tensor` using arrays as `shape` and `strides`

Use `ninetoothed::Tensor` to integrate NineToothed's ReLU operator

Add an include guard to `ninetoothed/utils.h`
```
  55cd22e3
- issue/985 - adjust cxflags and cxxflags for lua scripts · 7c5aa160
  wooway777 authored Jan 26, 2026
  
  7c5aa160
- issue/810 support more ops as graph op · 81e5fe94
  PanZezhong authored Jan 19, 2026
  
  81e5fe94
- issue/791 - fix add_rmsnorm api on mtx and mth · 0611cb1b
  wooway777 authored Jan 26, 2026
  
  0611cb1b
- issue/632 - adapt to iluvatar core 20 · 4ddc6647
  wooway777 authored Jan 19, 2026
  
  4ddc6647
- issue/884 - add_rms_norm on iluvatar, metax and moore · dfafc21f
  wooway777 authored Jan 07, 2026
  
  dfafc21f
- issue/791 fix add_rmsnorm api and rmsnorm module · 0c204dfd
  PanZezhong authored Jan 23, 2026
  
  0c204dfd
- issue/900 - maintains classic embedding for devices yet to be worked on · f9761a29
  wooway777 authored Jan 19, 2026
  
  f9761a29
- issue/900 - adapt to graph and adjust test script · eb34d4d6
  wooway777 authored Jan 09, 2026
  
  eb34d4d6
- issue/900 - support embedding on iluvatar, metax, and moore · 835209e7
  wooway777 authored Jan 08, 2026
  
  835209e7
- issue/846 - Refactor embedding to support device-side input and CUDA graph recording · cc2cc3a1
  gongchensu authored Dec 26, 2025
```
- Ensure embedding tensors are on the same device. Change format.
- Optimize embedding kernel with vectorized memory access and __ldg
- Add vectorized memory access using float4/float2, half2, and bfloat162
- Use __ldg instruction for read-only weight and indices access
- Add memory alignment checks to enable vectorized paths
- Add __restrict__ keywords for better compiler optimization
- Implement dynamic block size selection based on embedding_dim
```
  cc2cc3a1
- issue/978 - metax cuda graph impl and wrappings · 822a5341
  wooway777 authored Jan 23, 2026
  
  822a5341
- issue/987 - add .cpp files to ninetoothed includes · 1e637102
  wooway777 authored Jan 27, 2026
  
  1e637102
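
Commit 32340fc3 above speeds up `scripts/build_ntops.py` and `src/infiniop/ninetoothed/build.py` with `concurrent.futures`. The log does not show how those scripts use it, so the following is only a minimal sketch of the general pattern. The op names, the `build_op` placeholder, and the `max_workers` value are all hypothetical and not taken from the repository.

```python
import concurrent.futures


def build_op(name):
    # Hypothetical stand-in for compiling one operator; the real
    # build step is not shown in this commit log.
    return f"{name}.so"


def build_all(op_names, max_workers=4):
    """Build every op concurrently and return {op name: artifact}."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the op it is building.
        futures = {pool.submit(build_op, name): name for name in op_names}
        for future in concurrent.futures.as_completed(futures):
            name = futures[future]
            # result() re-raises any exception from the worker, so a
            # failed build is not silently dropped.
            results[name] = future.result()
    return results


if __name__ == "__main__":
    print(build_all(["relu", "swiglu", "flash_attention"]))
```

A thread pool is a reasonable fit when each task mostly waits on an external compiler process. For CPU-bound work done in Python itself, `concurrent.futures.ProcessPoolExecutor` offers the same interface.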
22 Jan, 2026 3 commits
- Merge pull request #967 from InfiniTensor/issue/811 · 3c8fb3c0
  PanZezhong1725 authored Jan 22, 2026
```
issue/811 fix tensor to blob and resume
```
  3c8fb3c0
- issue/811 fix tensor to blob and resume · 90cc3bdd
  PanZezhong authored Jan 22, 2026
  
  90cc3bdd
- Merge pull request #955 from InfiniTensor/issue/811 · f00c06d0
  PanZezhong1725 authored Jan 22, 2026
```
issue/811 support cuda graph capture
```
  f00c06d0
21 Jan, 2026 1 commit
- issue/811 add warmups before cuda graph capture · 3a8c6860
  PanZezhong authored Jan 21, 2026
  
  3a8c6860