- 05 Mar, 2026 2 commits
  - PanZezhong authored
  - PanZezhong authored
- 11 Feb, 2026 1 commit
  - qinyiqun authored
    * issue/204 - support graph in server scripts
    * issue/208 - adapt to ali ppu
    * issue/194 - add quantization, modify configs accordingly
      - Support NV w8 with 1 batch, 1 TP
      - Add JSON support to InfiniLM
      - Add quantization layers and a global config
      - Add quant config support in a reasonably elegant way
      - Restructure parts of the code; remove dead code
      - Follow InfiniCore changes
      - Remove all `model_config`; use `global_config` uniformly
      - Follow the latest InfiniLM code changes
      - Reorder function parameters
      - Rename global config to model config
      - Refactor: add new API alongside legacy interfaces with deprecation warnings
      - Add w4 InfiniCore-related content, and move the Quantization config into InfiniCore
    * issue/175 - qy device support (qy_page_131: add qy device; qy inference_server.py)
    * Issue/170 - Add HYGON support and improve device type handling
    * Issue/193 - feats for deployment (Signed-off-by: Ceng23333 <441651826@qq.com>)
    * skip responding eos token (Signed-off-by: Ceng23333 <441651826@qq.com>)
    * issue/143 - use add_rmsnorm, nt flash attn, nt kv caching
    * issue/204 - support graph in server scripts
    * issue/208 - adapt to ali ppu
    * rebase main
    * issue/216 - feat: support static kv cache in server
    * fix llm server cache config
    * demo131 - resolve mishandled conflicts
    * demo131 - further adjust attn and caching logic
    * demo131 - resolve merge requirements
    Signed-off-by: Ceng23333 <441651826@qq.com>
    Co-authored-by: wooway777 <wooway777@gmail.com>
    Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
    Co-authored-by: gongchensu <zhuyue_134@qq.com>
    Co-authored-by: Ceng23333 <441651826@qq.com>
    Co-authored-by: PanZezhong <panzezhong@qiyuanlab.com>
    Co-authored-by: MaYuhang <2902139028@qq.com>
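
    The static KV cache from issue/216 pre-allocates its key/value buffers up front, so decode steps only write into existing storage and never reallocate. As a hedged illustration only (the class name, dimensions, and list-based storage are assumptions for this sketch, not InfiniLM's actual layout), a minimal fixed-capacity cache might look like:

    ```python
    class StaticKVCache:
        """Fixed-capacity per-layer key/value cache.

        Buffers are allocated once at construction (nested lists stand in
        for device tensors here), so appending during decode never grows
        memory -- the property a "static" cache provides for servers.
        """

        def __init__(self, num_layers, max_seq_len, head_dim):
            self.max_seq_len = max_seq_len
            self.seq_len = 0  # number of positions written so far
            # [layer][position][dim] buffers, zero-filled up front
            self.k = [[[0.0] * head_dim for _ in range(max_seq_len)]
                      for _ in range(num_layers)]
            self.v = [[[0.0] * head_dim for _ in range(max_seq_len)]
                      for _ in range(num_layers)]

        def append(self, layer, k_vec, v_vec):
            """Write one position's K/V for a layer; error when full."""
            if self.seq_len >= self.max_seq_len:
                raise RuntimeError("static cache full; increase max_seq_len")
            self.k[layer][self.seq_len] = list(k_vec)
            self.v[layer][self.seq_len] = list(v_vec)

        def advance(self):
            """Commit the current position after all layers have written."""
            self.seq_len += 1

        def view(self, layer):
            """Return only the filled prefix, as attention would read it."""
            return self.k[layer][: self.seq_len], self.v[layer][: self.seq_len]
    ```

    The fixed `max_seq_len` is the trade-off: capacity errors replace silent growth, which is what makes the per-step memory footprint predictable.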
- 10 Feb, 2026 3 commits
  - PanZezhong authored
  - PanZezhong authored
  - PanZezhong authored
- 08 Jan, 2026 1 commit
  - PanZezhong authored
- 06 Jan, 2026 1 commit
  - PanZezhong authored
- 30 Dec, 2025 1 commit
  - PanZezhong authored
- 29 Dec, 2025 1 commit
  - Jiacheng Huang authored
    * Extract `cpp.LlamaForCausalLM` into `infinilm.infer_engine.InferEngine`
    * Split the `Config` construction logic out into `AutoConfig`
    * Construct `InferEngine` directly in the `examples` scripts
    * Move the `random_sample` computation into the model
    * Implement `generate` separately for `InferEngine`
    * Allow passing `temperature`, `top_k`, and `top_p` via `GenerationConfig`
    * Move `random_sample` handling from `LlamaForCausalLM` into `RankWorker`
    * `append(output_id)` directly in `InferEngine.generate`
    * Fix the distributed hang introduced by commit `13aa90c57de369f9985593c0066b6b06a7508b24`
    * Align the `InferEngine.forward` interface with the C++-side `InferEngine.Input`
    * Add a `_measure_and_log_time` parameter to re-enable the previous internal timing in `generate`
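
    The `temperature`, `top_k`, and `top_p` knobs routed through `GenerationConfig` correspond to the standard sampling pipeline: temperature-scaled softmax, top-k truncation, then nucleus (top-p) filtering. As a generic reference sketch of what such a `random_sample` does (the function name, defaults, and pure-Python form are assumptions for illustration; this is not InfiniLM's code):

    ```python
    import math
    import random

    def random_sample(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
        """Sample a token id from raw logits.

        Applies temperature scaling, then top-k truncation (0 = disabled),
        then top-p (nucleus) filtering, and finally draws from the
        renormalized distribution over the surviving tokens.
        """
        rng = rng or random.Random()
        if temperature <= 0:  # treat as greedy decoding
            return max(range(len(logits)), key=lambda i: logits[i])
        # temperature-scaled softmax, shifted by the max for stability
        m = max(logits)
        probs = [math.exp((x - m) / temperature) for x in logits]
        s = sum(probs)
        probs = [p / s for p in probs]
        # token ids ordered by probability, highest first
        order = sorted(range(len(probs)), key=lambda i: -probs[i])
        if top_k > 0:
            order = order[:top_k]
        # nucleus: keep the smallest prefix whose mass reaches top_p
        kept, cum = [], 0.0
        for i in order:
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        # draw proportionally from the kept tokens
        total = sum(probs[i] for i in kept)
        r = rng.random() * total
        acc = 0.0
        for i in kept:
            acc += probs[i]
            if r <= acc:
                return i
        return kept[-1]
    ```

    Moving this step into `RankWorker` (as the commits above describe) keeps sampling next to the rank that owns the final logits, so only the chosen token id needs to cross process boundaries.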
- 26 Dec, 2025 1 commit
  - PanZezhong authored
- 23 Dec, 2025 1 commit
  - PanZezhong authored
- 19 Dec, 2025 1 commit
  - Jiacheng Huang authored
- 17 Dec, 2025 1 commit
  - Jiacheng Huang authored
- 11 Dec, 2025 1 commit
  - thatPepe authored
    Issue/121 - cache management
- 09 Dec, 2025 1 commit
  - pengcheng888 authored
- 08 Dec, 2025 1 commit
  - Ceng authored
- 07 Dec, 2025 1 commit
  - pengcheng888 authored
- 06 Dec, 2025 2 commits
  - PanZezhong authored
  - PanZezhong1725 authored