Unverified Commit 71c70586 authored by qinyiqun, committed by GitHub

demo131 - multiple issues regarding quantization, qy, etc.



* issue/204 - support graph in server scripts

* issue/208 - adapt to ali ppu

* issue/194 - add quantization, modify configs accordingly

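Support NV W8 (1 batch, 1 TP)

Add JSON support

InfiniLM: add quantization layers and global config

Add quant config support in a cleaner way

Restructure some code and remove unused code

Follow InfiniCore changes

Remove all model_config; use global_config everywhere

Follow the latest InfiniLM code

Change function parameter order

Rename global config to model config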

Refactor: add new API alongside legacy interfaces with deprecation warnings

Add W4 InfiniCore-related content and move Quantization config into InfiniCore

Add W4 InfiniCore-related content and move Quantization config into InfiniCore

* issue/175 - qy device support

qy_page_131: add qy device

qy: inference_server.py runs successfully

* Issue/170 - Add HYGON support and improve device type handling.

* Issue/193: feats for deployment
Signed-off-by: Ceng23333 <441651826@qq.com>

* skip responding eos token
Signed-off-by: Ceng23333 <441651826@qq.com>

* issue/143 use add_rmsnorm, nt flash attn, nt kv caching

* issue/204 - support graph in server scripts

* issue/208 - adapt to ali ppu

* rebase main

* issue/216 feat: support static kv cache in server

* fix llm server cache config

* demo131 - resolve mishandled conflicts

* demo131 - further adjust attn and caching logic

* demo131 - resolve merge requirements

---------
Signed-off-by: Ceng23333 <441651826@qq.com>
Co-authored-by: wooway777 <wooway777@gmail.com>
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: gongchensu <zhuyue_134@qq.com>
Co-authored-by: Ceng23333 <441651826@qq.com>
Co-authored-by: PanZezhong <panzezhong@qiyuanlab.com>
Co-authored-by: MaYuhang <2902139028@qq.com>
parent ee59b3f5
Subproject commit 5ed07097faa6c50199c4a3b66e5ed37d4fbfccc2
@@ -6,6 +6,7 @@ set_toolchains("gcc")
-- Add spdlog from third_party directory
add_includedirs("third_party/spdlog/include")
add_includedirs("third_party/json/single_include/")
target("infinicore_infer")
set_kind("shared")
......