1. 11 Feb, 2026 1 commit
    • demo131 - multiple issues regarding quantization, qy, etc. · 71c70586
      qinyiqun authored
      
      
      * issue/204 - support graph in server scripts
      
      * issue/208 - adapt to ali ppu
      
      * issue/194 - add quantization and modify configs accordingly
      
      Support NV w8 with 1 batch and 1 TP
      
      Add JSON support
      
      Add quantization layers and a global config to InfiniLM
      
      Add quant config support in a fairly elegant way
      
      Restructure parts of the code and remove dead code
      
      Follow the InfiniCore changes
      
      Remove all model_config usages and use global_config uniformly
      
      Follow the latest InfiniLM code changes
      
      Change the order of function parameters
      
      Rename global config to model config
      
      Refactor: add new API alongside legacy interfaces with deprecation warnings
      
      Add w4 InfiniCore-related content, and move the Quantization config into InfiniCore
      
      * issue/175 - qy device support
      
      qy_page_131: add qy device
      
      inference_server.py runs successfully on qy
      
      * Issue/170 - Add HYGON support and improve device type handling.
      
      * Issue/193: feats for deployment
      Signed-off-by: Ceng23333 <441651826@qq.com>
      
      * skip responding eos token
      Signed-off-by: Ceng23333 <441651826@qq.com>
      
      * issue/143 use add_rmsnorm, nt flash attn, nt kv caching
      
      * issue/204 - support graph in server scripts
      
      * issue/208 - adapt to ali ppu
      
      * rebase main
      
      * issue/216 feat: support static kv cache in server
      
      * fix llm server cache config
      
      * demo131 - resolve mishandled conflicts
      
      * demo131 - further adjust attn and caching logic
      
      * demo131 - resolve merge requirements
      
      ---------
      Signed-off-by: Ceng23333 <441651826@qq.com>
      Co-authored-by: wooway777 <wooway777@gmail.com>
      Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
      Co-authored-by: gongchensu <zhuyue_134@qq.com>
      Co-authored-by: Ceng23333 <441651826@qq.com>
      Co-authored-by: PanZezhong <panzezhong@qiyuanlab.com>
      Co-authored-by: MaYuhang <2902139028@qq.com>
      71c70586
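The refactor above mentions adding a new API alongside legacy interfaces with deprecation warnings. A minimal generic sketch of that pattern using Python's standard `warnings` module is shown below; the class and method names (`Engine`, `generate`, `run`) are hypothetical placeholders, not the actual InfiniLM API:

```python
import warnings


class Engine:
    """Hypothetical engine illustrating the new-API-plus-deprecated-legacy pattern."""

    def generate(self, prompt: str) -> str:
        """New API: the interface callers should migrate to."""
        return f"output for {prompt}"

    def run(self, prompt: str) -> str:
        """Legacy API: kept working for compatibility, but warns on use."""
        warnings.warn(
            "Engine.run() is deprecated; use Engine.generate() instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this wrapper
        )
        return self.generate(prompt)
```

Keeping the legacy method as a thin delegating wrapper lets existing call sites keep working for a release cycle while the warning nudges users toward the new interface.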
  2. 10 Feb, 2026 3 commits
  3. 30 Jan, 2026 1 commit
  4. 06 Jan, 2026 1 commit
  5. 30 Dec, 2025 1 commit
  6. 29 Dec, 2025 1 commit
    • issue/160: Sort out the InferEngine-related interfaces · 96e53dbb
      Jiacheng Huang authored
      * Extract `cpp.LlamaForCausalLM` into `infinilm.infer_engine.InferEngine`
      
      * Split the `Config` construction logic out into `AutoConfig`
      
      * Construct `InferEngine` directly in the `examples` scripts
      
      * Move the `random_sample` computation into the model
      
      * Implement `generate` separately for `InferEngine`
      
      * Allow passing `temperature`, `top_k`, and `top_p` via `GenerationConfig`
      
      * Move `random_sample` handling from `LlamaForCausalLM` into `RankWorker`
      
      * `append(output_id)` directly in `InferEngine.generate`
      
      * Fix the distributed hang introduced by commit `13aa90c57de369f9985593c0066b6b06a7508b24`
      
      * Align the `InferEngine.forward` interface with the C++-level `InferEngine.Input`
      
      * Add a `_measure_and_log_time` parameter to re-enable the previous internal timing in `generate`
      96e53dbb
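The commit above passes `temperature`, `top_k`, and `top_p` through `GenerationConfig`. A generic sketch of what these three knobs typically do during sampling is shown below; this is a standalone illustration of the standard technique, not the InfiniLM `random_sample` implementation, and the function name is hypothetical:

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Sample a token index from raw logits with temperature scaling,
    top-k filtering, and top-p (nucleus) filtering."""
    rng = rng or random.Random()
    # Temperature scaling: lower temperature sharpens the distribution.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [(i, e / total) for i, e in enumerate(exps)]
    # Sort tokens by probability, descending.
    probs.sort(key=lambda ip: ip[1], reverse=True)
    # Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        probs = probs[:top_k]
    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    if top_p < 1.0:
        kept, cum = [], 0.0
        for i, p in probs:
            kept.append((i, p))
            cum += p
            if cum >= top_p:
                break
        probs = kept
    # Renormalize over the surviving tokens and draw one.
    mass = sum(p for _, p in probs)
    r = rng.random() * mass
    for i, p in probs:
        r -= p
        if r <= 0:
            return i
    return probs[-1][0]
```

With `top_k=1` the call is deterministic (always the argmax), which is a handy way to sanity-check a sampler.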
  7. 23 Dec, 2025 1 commit
  8. 19 Dec, 2025 1 commit
  9. 18 Dec, 2025 1 commit
  10. 17 Dec, 2025 1 commit
  11. 11 Dec, 2025 1 commit
  12. 08 Dec, 2025 1 commit
  13. 07 Dec, 2025 1 commit
  14. 06 Dec, 2025 2 commits