- 05 Mar, 2026 2 commits
  - PanZezhong authored
  - PanZezhong authored
- 11 Feb, 2026 1 commit
  - qinyiqun authored
    * issue/204 - support graph in server scripts
    * issue/208 - adapt to ali ppu
    * issue/194 - add quantization, modify configs accordingly
      - Support NV w8 with 1 batch, 1 TP
      - Add JSON support to InfiniLM
      - Add quantization layers and a global config
      - Add quant config support in a reasonably elegant way
      - Restructure parts of the code; remove dead code
      - Follow InfiniCore changes
      - Remove all `model_config`; use `global_config` uniformly
      - Follow the latest InfiniLM code changes
      - Reorder function parameters
      - Rename global config to model config
      - Refactor: add new API alongside legacy interfaces with deprecation warnings
      - Add w4 InfiniCore-related content, and move the Quantization config into InfiniCore
    * issue/175 - qy device support (qy_page_131: add qy device; qy inference_server.py)
    * Issue/170 - Add HYGON support and improve device type handling
    * Issue/193 - feats for deployment (Signed-off-by: Ceng23333 <441651826@qq.com>)
    * skip responding eos token (Signed-off-by: Ceng23333 <441651826@qq.com>)
    * issue/143 - use add_rmsnorm, nt flash attn, nt kv caching
    * issue/204 - support graph in server scripts
    * issue/208 - adapt to ali ppu
    * rebase main
    * issue/216 - feat: support static kv cache in server
    * fix llm server cache config
    * demo131 - resolve mishandled conflicts
    * demo131 - further adjust attn and caching logic
    * demo131 - resolve merge requirements
    Signed-off-by: Ceng23333 <441651826@qq.com>
    Co-authored-by: wooway777 <wooway777@gmail.com>
    Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
    Co-authored-by: gongchensu <zhuyue_134@qq.com>
    Co-authored-by: Ceng23333 <441651826@qq.com>
    Co-authored-by: PanZezhong <panzezhong@qiyuanlab.com>
    Co-authored-by: MaYuhang <2902139028@qq.com>
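
    The static KV cache from issue/216 pre-allocates its key/value buffers up front, so decode steps only write into existing storage and never reallocate. As a hedged illustration only (the class name, dimensions, and list-based storage are assumptions for this sketch, not InfiniLM's actual layout), a minimal fixed-capacity cache might look like:

    ```python
    class StaticKVCache:
        """Fixed-capacity per-layer key/value cache.

        Buffers are allocated once at construction (nested lists stand in
        for device tensors here), so appending during decode never grows
        memory -- the property a "static" cache provides for servers.
        """

        def __init__(self, num_layers, max_seq_len, head_dim):
            self.max_seq_len = max_seq_len
            self.seq_len = 0  # number of positions written so far
            # [layer][position][dim] buffers, zero-filled up front
            self.k = [[[0.0] * head_dim for _ in range(max_seq_len)]
                      for _ in range(num_layers)]
            self.v = [[[0.0] * head_dim for _ in range(max_seq_len)]
                      for _ in range(num_layers)]

        def append(self, layer, k_vec, v_vec):
            """Write one position's K/V for a layer; error when full."""
            if self.seq_len >= self.max_seq_len:
                raise RuntimeError("static cache full; increase max_seq_len")
            self.k[layer][self.seq_len] = list(k_vec)
            self.v[layer][self.seq_len] = list(v_vec)

        def advance(self):
            """Commit the current position after all layers have written."""
            self.seq_len += 1

        def view(self, layer):
            """Return only the filled prefix, as attention would read it."""
            return self.k[layer][: self.seq_len], self.v[layer][: self.seq_len]
    ```

    The fixed `max_seq_len` is the trade-off: capacity errors replace silent growth, which is what makes the per-step memory footprint predictable.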
- 10 Feb, 2026 3 commits
  - PanZezhong authored
  - PanZezhong authored
  - PanZezhong authored
- 08 Jan, 2026 1 commit
  - PanZezhong authored
- 06 Jan, 2026 1 commit
  - PanZezhong authored
- 30 Dec, 2025 1 commit
  - PanZezhong authored
- 29 Dec, 2025 1 commit
  - Jiacheng Huang authored
    * Extract `cpp.LlamaForCausalLM` into `infinilm.infer_engine.InferEngine`
    * Split the `Config` construction logic out into `AutoConfig`
    * Construct `InferEngine` directly in the `examples` scripts
    * Move the `random_sample` computation into the model
    * Implement `generate` separately for `InferEngine`
    * Allow passing `temperature`, `top_k`, and `top_p` via `GenerationConfig`
    * Move `random_sample` handling from `LlamaForCausalLM` into `RankWorker`
    * `append(output_id)` directly in `InferEngine.generate`
    * Fix the distributed hang introduced by commit `13aa90c57de369f9985593c0066b6b06a7508b24`
    * Align the `InferEngine.forward` interface with the C++-side `InferEngine.Input`
    * Add a `_measure_and_log_time` parameter to re-enable the previous internal timing in `generate`
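
    The `temperature`, `top_k`, and `top_p` knobs routed through `GenerationConfig` correspond to the standard sampling pipeline: temperature-scaled softmax, top-k truncation, then nucleus (top-p) filtering. As a generic reference sketch of what such a `random_sample` does (the function name, defaults, and pure-Python form are assumptions for illustration; this is not InfiniLM's code):

    ```python
    import math
    import random

    def random_sample(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
        """Sample a token id from raw logits.

        Applies temperature scaling, then top-k truncation (0 = disabled),
        then top-p (nucleus) filtering, and finally draws from the
        renormalized distribution over the surviving tokens.
        """
        rng = rng or random.Random()
        if temperature <= 0:  # treat as greedy decoding
            return max(range(len(logits)), key=lambda i: logits[i])
        # temperature-scaled softmax, shifted by the max for stability
        m = max(logits)
        probs = [math.exp((x - m) / temperature) for x in logits]
        s = sum(probs)
        probs = [p / s for p in probs]
        # token ids ordered by probability, highest first
        order = sorted(range(len(probs)), key=lambda i: -probs[i])
        if top_k > 0:
            order = order[:top_k]
        # nucleus: keep the smallest prefix whose mass reaches top_p
        kept, cum = [], 0.0
        for i in order:
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        # draw proportionally from the kept tokens
        total = sum(probs[i] for i in kept)
        r = rng.random() * total
        acc = 0.0
        for i in kept:
            acc += probs[i]
            if r <= acc:
                return i
        return kept[-1]
    ```

    Moving this step into `RankWorker` (as the commits above describe) keeps sampling next to the rank that owns the final logits, so only the chosen token id needs to cross process boundaries.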
- 26 Dec, 2025 1 commit
  - PanZezhong authored
- 23 Dec, 2025 1 commit
  - PanZezhong authored
- 19 Dec, 2025 1 commit
  - Jiacheng Huang authored
- 17 Dec, 2025 1 commit
  - Jiacheng Huang authored
- 11 Dec, 2025 1 commit
  - thatPepe authored
    Issue/121 - cache management
- 09 Dec, 2025 1 commit
  - pengcheng888 authored
- 08 Dec, 2025 1 commit
  - Ceng authored
- 07 Dec, 2025 1 commit
  - pengcheng888 authored
- 06 Dec, 2025 2 commits
  - PanZezhong authored
  - PanZezhong1725 authored