    demo131 - multiple issues regarding quantization, qy, etc. · 71c70586
    qinyiqun authored
    
    
    * issue/204 - support graph in server scripts
    
    * issue/208 - adapt to ali ppu
    
    * issue/194 - add quantization, modify configs accordingly
    
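    Support NV W8, 1 batch, 1 TP
    
    Add JSON support
    
    InfiniLM: add quantization layer and global config
    
    Add quant config support in a cleaner way
    
    Restructure some code, remove unused code
    
    Follow InfiniCore changes
    
    Remove all model_config usage; use global_config everywhere
    
    Follow the latest InfiniLM code changes
    
    Change function parameter order
    
    Rename global config to model config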
    
    Refactor: add new API alongside legacy interfaces with deprecation warnings
    
    Add W4 InfiniCore-related content and move Quantization config into InfiniCore
    
    Add W4 InfiniCore-related content and move Quantization config into InfiniCore
    
    * issue/175 - qy device support
    
    qy_page_131: add qy device
    
    qy: inference_server.py runs successfully
    
    * Issue/170 - Add HYGON support and improve device type handling.
    
    * Issue/193: feats for deployment
    Signed-off-by: Ceng23333 <441651826@qq.com>
    
    * skip responding eos token
    Signed-off-by: Ceng23333 <441651826@qq.com>
    
    * issue/143 use add_rmsnorm, nt flash attn, nt kv caching
    
    * issue/204 - support graph in server scripts
    
    * issue/208 - adapt to ali ppu
    
    * rebase main
    
    * issue/216 feat: support static kv cache in server
    
    * fix llm server cache config
    
    * demo131 - resolve mishandled conflicts
    
    * demo131 - further adjust attn and caching logic
    
    * demo131 - resolve merge requirements
    
    ---------
    Signed-off-by: Ceng23333 <441651826@qq.com>
    Co-authored-by: wooway777 <wooway777@gmail.com>
    Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
    Co-authored-by: gongchensu <zhuyue_134@qq.com>
    Co-authored-by: Ceng23333 <441651826@qq.com>
    Co-authored-by: PanZezhong <panzezhong@qiyuanlab.com>
    Co-authored-by: MaYuhang <2902139028@qq.com>
README.md 6.37 KB