Commits · 823ad84912a631317b1ae9978a8d467195335627 · OpenDAS / Lmdeploy

03 Nov, 2023 3 commits

Refactor model conversion (#296) · 823ad849

Chen Xin authored Nov 03, 2023

* split deploy.py

* fix get_cuda_tensor

* deploy qwen_awq

* fix lint

* add docstring

* fix

* support baichuan/baichuan-awq

* parameterizing size_per_head

* remove try/except

* limit input model_format

* add quant_path param

* remove old deploy.py

* fix path

* fix transformer layer range when loading bins

* fix qwen init

* split & save log

* relative import

* update get_config

* WeightFileMgr -> Reader

* rename

* update

* fix init_layer_id

* rename llama.py -> meta_llama.py, hf.py -> llama.py

* reduce code

* update arg description

* fix meta llama

* manually cleanup meta model params

add cli to list the supported model names (#639) · 1bbc6e05

RunningLeon authored Nov 03, 2023

* update

* resolve comment

fix: gradio gr.Button.update deprecated after 4.0.0 (#637) · 6e91e5ce

Yam(长琴) authored Nov 03, 2023

01 Nov, 2023 1 commit

Improve api_server and webui usage (#544) · 373bd013

AllentDan authored Nov 01, 2023

* make IPv6 compatible, safe run for coroutine interrupting

* instance_id -> session_id and fix api_client.py

* update doc

* remove useless faq

* safe ip mapping

* update app.py

* WIP completion

* completion

* update doc

* disable interactive mode for /v1/chat/completions

* docstring

* docstring

* refactor gradio

* update gradio

* update

* update doc

* rename

* session_id default -1

* missed two files

* add an APIClient

* add chat func for APIClient

* refine

* add concurrent function

* sequence_start, sequence_end --> interactive_mode

* update doc

* comments

* doc

* better text completion

* remove /v1/embeddings

* comments

* deprecate generate and use /v1/interactive/completions

* /v1/interactive/completion -> /v1/chat/interactive

* embeddings

* rename

* remove wrong arg description

* docstring

* fix

* update cli

* update doc

* strict session_len limit condition

* pass model args to api_server

30 Oct, 2023 1 commit
- bump version to v0.0.13 (#620) · 56942c43
  Lyu Han authored Oct 30, 2023
  
25 Oct, 2023 3 commits

support inference a batch of prompts (#467) · ac3500b5

AllentDan authored Oct 25, 2023

* support inference a batch of prompts

* docstring and assert

Add more user-friendly CLI (#541) · 169d5169

RunningLeon authored Oct 25, 2023

* add

* import fire in main

* wrap to speed up fire cli

* update

* update docs

* update docs

* fix

* resolve comments

* resolve conflict and add test for cli

Add "build from docker" section (#602) · 7283781e

Lyu Han authored Oct 25, 2023

* add build from docker section

* update

* install python package

* update

* update

* update

24 Oct, 2023 2 commits
- bump version to v0.0.12 (#604) · 96f1b8ef
  Lyu Han authored Oct 24, 2023
  
- Fix crash and remove `sys_instruct` from `chat.py` and `client.py` (#591) · ffe4ba9c
  Chen Xin authored Oct 24, 2023
```
* fix crash

* update profile_generation.py

* format

* use self.bos_id

* remove sys_instruct
```
23 Oct, 2023 2 commits
- Revert "[Docs] Simplify `build.md` (#370)" (#586) · af2f072e
  pppppM authored Oct 23, 2023
```
This reverts commit 4b5c2bda.
```
- update solar chat template (#587) · baf1801b
  AllentDan authored Oct 23, 2023
  
19 Oct, 2023 2 commits
- robust incremental decode for leading space (#581) · 186bfd2e
  AllentDan authored Oct 19, 2023
```
* robust incremental decode for leading space

* speed up lookup as prefix_space_tokens is shorter than no_prefix_space_tokens

* add UT and fix qwen stuff
```
- add solar chat template (#576) · 70a5c63a
  AllentDan authored Oct 19, 2023
  
18 Oct, 2023 2 commits
- avoid splitting Chinese characters during decoding (#566) · eb3b4dc9
  AllentDan authored Oct 18, 2023
  
- change 'model_format' to 'qwen' when 'model_name' starts with 'qwen' (#575) · 9c3634ec
  Lyu Han authored Oct 18, 2023
  
17 Oct, 2023 1 commit
- bump version to v0.0.11 (#567) · bb3cce9a
  Lyu Han authored Oct 17, 2023
  
16 Oct, 2023 2 commits
- Move `tokenizer.py` to the folder of lmdeploy (#543) · c261b49d
  q.yao authored Oct 16, 2023
```
* move tokenizer

* remove Tokenizer in init

* update deploy.py
```
- free runner disk (#552) · f4422fab
  Chen Xin authored Oct 16, 2023
```
* free runner disk

* limit cpu

* docker.yml

* keep swap

* keep swap
```
13 Oct, 2023 3 commits

[doc] Update benchmark command in w4a16.md (#500) · 0b861c48

del-zhenwu authored Oct 13, 2023



* [doc] Update benchmark command in w4a16.md

* Update w4a16.md

* Update w4a16.md

add pip install nvidia-ml-py

* [doc] Update w4a16.md

* fix lint error
Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>

* [doc] update model_path & prompt_tokens
Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>

---------
Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>

Add tp hint for deployment (#555) · 77a26812

Chen Xin authored Oct 13, 2023

* add tp hint for deploy

* fix lint

* assert tp in turbomind

* fix lint

Fix typing of openai protocol. (#554) · 6904053f

YiiSh authored Oct 13, 2023

12 Oct, 2023 2 commits
- support deploy qwen-14b-chat (#482) · b21239a8
  Chen Xin authored Oct 12, 2023
```
* support deploy qwen-14b-chat

* update README

* load safetensors first
```
- update huggingface internlm-chat-7b model url (#546) · 27e12477
  AllentDan authored Oct 12, 2023
  
11 Oct, 2023 3 commits
- [bug] fix mismatched shape for decoder output tensor (#517) · 0d2a151e
  akhoroshev authored Oct 11, 2023
  
- Fix typo in `docs/en/pytorch.md` (#539) · 169d088a
  Shahrukh Khan authored Oct 11, 2023
  
- make IPv6 compatible, safe run for coroutine interrupting (#487) · 759e1ddf
  AllentDan authored Oct 11, 2023
```
* make IPv6 compatible, safe run for coroutine interrupting

* instance_id -> session_id and fix api_client.py

* update doc

* remove useless faq

* safe ip mapping

* update app.py

* remove print

* update doc
```
09 Oct, 2023 3 commits
- set the default value of being 0 (#532) · fbd9770a
  Lyu Han authored Oct 10, 2023
  
- Change `shared_instance` type from `weakptr` to `shared_ptr` (#507) · 19fea86c
  Lyu Han authored Oct 09, 2023
```
* change shared_instances_ from weakptr to sharedptr

* update
```
- Support CORS for openai api server (#481) · 02684144
  aisensiy authored Oct 09, 2023
```
* Support CORS for openai api server

* Remove unnecessary var

* Add CORS support following the same style as vllm
```
26 Sep, 2023 7 commits
- bump version to v0.0.10 (#474) · b58a9dff
  Lyu Han authored Sep 26, 2023
  
- Fix memory leak (#488) · 5d87c20f
  Lyu Han authored Sep 26, 2023
```
* Fix memory leak

* modern c++
```
- fix benchmark serving cannot use Qwen tokenizer (#443) · 97dcdff7
  AllentDan authored Sep 26, 2023
```
* fix benchmark serving cannot use Qwen tokenizer

* update benchmark readme
```
- Fix compatibility issues with Pydantic 2 (#465) · 22cd7d15
  aisensiy authored Sep 26, 2023
  
- fix race condition (#460) · a54e3e09
  akhoroshev authored Sep 26, 2023
  
- expose stop words and filter eoa (#352) · 327deaee
  AllentDan authored Sep 26, 2023
```
* expose stop words

* support string

* fix

* remove eoa from chatbot

* remove eoa of turbomind

* fix ut

* suffix wheel and fix InternLM no system bug
```
- [feature] Graceful termination of background threads in LlamaV2 (#458) · 0cc667e1
  akhoroshev authored Sep 26, 2023
```
* cuda allocator fix

* graceful termination

* lint and compilation fix
```
25 Sep, 2023 3 commits
- Miss meta instruction of internlm-chat model (#470) · ce9e0756
  Lyu Han authored Sep 25, 2023
  
- Fix side effect brought by supporting codellama: `sequence_start` is always true when calling `model.get_prompt` (#466) · e980377a
  Lyu Han authored Sep 25, 2023
- Fix typo in README.md (#462) · 71945001
  Ikko Eltociear Ashimine authored Sep 25, 2023
```
quantilized -> quantized
```