Commits · fe851fbc27e4aebbbf1bd39b8538fc8807504bc9 · OpenDAS / Lmdeploy

24 Mar, 2024 1 commit
- Add supplementary new files for version 0.2.6 · fe851fbc
  zhouxiang authored Mar 24, 2024
  
  fe851fbc
22 Mar, 2024 1 commit
- Sync 0.2.6 code · d7117b95
  zhouxiang authored Mar 22, 2024
  
  d7117b95
13 Dec, 2023 1 commit

AllentDan authored Dec 13, 2023

* add api.py

* update serve function

* add model_name arg and provide examples

* docstring

* remove service_available

* type hint

5c9aa51a

25 Oct, 2023 1 commit

Add more user-friendly CLI (#541) · 169d5169

RunningLeon authored Oct 25, 2023

* add

* import fire in main

* wrap to speed up fire cli

* update

* update docs

* update docs

* fix

* resolve comments

* resolve conflict and add test for cli

169d5169

19 Oct, 2023 1 commit

robust incremental decode for leading space (#581) · 186bfd2e

AllentDan authored Oct 19, 2023

* robust incremental decode for leading space

* speed up lookup as prefix_space_tokens is shorter than no_prefix_space_tokens

* add UT and fix qwen stuff

186bfd2e

26 Sep, 2023 1 commit

expose stop words and filter eoa (#352) · 327deaee

AllentDan authored Sep 26, 2023

* expose stop words

* support string

* fix

* remove eoa from chatbot

* remove eoa of turbomind

* fix ut

* suffix wheel and fix InternLM no system bug

327deaee

20 Sep, 2023 1 commit

Support InternLM 20B (#440) · df7955de

Lyu Han authored Sep 20, 2023



* better profiler

* wait for releasing mem

* remove fire

* remove support for multiple model benchmark

* comments

* support actual seqlen

* change chat template

* update

* fix ut

* int->size_t

* output more details

* correct tp

* rollback

* update

* update readme

* add 'internlm-chat' as the default tag for internlm chat models

* rollback tokenizer

---------
Co-authored-by: AllentDan <AllentDan@yeah.net>
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>

df7955de

11 Sep, 2023 1 commit

Support codellama (#359) · 65c662f9

Lyu Han authored Sep 11, 2023

* tmp

* add demo for codellama inference

* update

* update

* update

* update codellama.md

* export rope_theta

* update

* update doc

* fix client.py

* define SamplingParam

* rollback 'end'

* rotary_emb_base to rotary_embedding_base

* change to baichuan2-7b

65c662f9

05 Jul, 2023 1 commit

[Feature] Stats Quantization Parameters for KV Cache (#45) · 3fff964d

pppppM authored Jul 05, 2023

* add cal qparams

* support offload inference

* add collect functions (mod, weight)

* stats kv scales

* update init

* add user guide

* fix hints

* fix comments & support turbomind format

* update user guide

* fix slice kv cache error & support pileval dataset (used in llm-awq)

* fix wrong num heads slice

* update default dataset

* fix conflict

* fix hints

* fix hints

* add gitignore

3fff964d