Commits · 327deaee4122b3ff7780e36d0e481c5997dbe1fa · guobj / Qwen_lmdeploy

26 Sep, 2023 1 commit

expose stop words and filter eoa (#352) · 327deaee

AllentDan authored Sep 26, 2023

* expose stop words

* support string

* fix

* remove eoa from chatbot

* remove eoa of turbomind

* fix ut

* suffix wheel and fix InternLM no system bug

11 Sep, 2023 1 commit

Support codellama (#359) · 65c662f9

Lyu Han authored Sep 11, 2023

* tmp

* add demo for codellama inference

* update

* update

* update

* update codellama.md

* export rope_theta

* update

* update doc

* fix client.py

* define SamplingParam

* rollback 'end'

* rotary_emb_base to rotary_embedding_base

* change to baichuan2-7b

04 Sep, 2023 1 commit

Fix profile_serving hung issue (#344) · edb7c6ec

Lyu Han authored Sep 04, 2023

* read data after start processes

* fix hang

* fix exceptions when request_output_len is 0

01 Sep, 2023 1 commit

Decode generated token_ids incrementally (#309) · 9bfe03c6

AllentDan authored Sep 01, 2023

* add incremental decoding for turbomind

* update TIS

* fix triton post processing

* update doc

* fix typo

* SentencePieceTokenizer incremental decode, add qwen message prompt

* docstring

* update bot

21 Aug, 2023 1 commit

Pass chat template args including meta_prompt to model (#225) · 7785142d

AllentDan authored Aug 21, 2023

* pass args like meta_prompt to model

* update chatbot

* update

* rollback

* update llama2 and qwen

* refine

14 Aug, 2023 1 commit

Fix TIS client got-no-space-result side effect brought by PR #197 (#222) · 68296844

Lyu Han authored Aug 14, 2023

* rollback

* rollback chatbot.py

07 Aug, 2023 2 commits

Add non-stream inference api for chatbot (#200) · 3de0dbb6

lvhan028 authored Aug 07, 2023

* add non-stream inference api for chatbot

* update according to reviewer's comments

Improve postprocessing in TIS serving by applying Incremental de-tokenizing (#197) · 0ed1e4d4

lvhan028 authored Aug 07, 2023

* change to incremental decoding

* update

03 Aug, 2023 1 commit

Move lmdeploy/turbomind/utils.py to lmdeploy/utils.py (#191) · 7a2128be

lvhan028 authored Aug 03, 2023

31 Jul, 2023 1 commit

Support Runtime tensor parallelism (#158) · 4767b04d

q.yao authored Jul 31, 2023

* works on internlm and vicuna

* support GQA

* remove comment

* update readme, add logger, default tp=1

* remove log

27 Jul, 2023 1 commit

add model_name param for chatbot (#174) · 7bc8d171

MaxMatthew authored Jul 27, 2023

23 Jul, 2023 1 commit

Refactor the chat template of supported models using factory pattern (#144) · 7b470f07

lvhan028 authored Jul 23, 2023

* refactor model.py and support baichuan-7b

* remove model_name

* remove hard session_len

* export tokenizer.py to target dir

* remove model_name from client

* remove model_name

* update

* correct throughput equation

* fix session.response

* update serving.md

* update readme

* update according to review comments

* update

* update

* update

* update

21 Jul, 2023 1 commit

remove slicing response and add resume api (#154) · b728064e

MaxMatthew authored Jul 21, 2023

* Fix lmdeploy.serve.turbomind bug
* add __init__.py for turbomind
* add resume function
* fix the assignment for session.response

* Fix code style

19 Jul, 2023 1 commit

fix the offset during streaming chat (#142) · 289ffa3c

lvhan028 authored Jul 19, 2023

14 Jul, 2023 1 commit

add puyu model for internal use (#105) · 4cfb118f

lvhan028 authored Jul 14, 2023

* add puyu model for internal use

* get/set session

* update

* add docstring

12 Jul, 2023 1 commit

add docstring for turbomind (#97) · 955c019c

lvhan028 authored Jul 12, 2023

* add docstring

* update

* update

* fix according to review results

05 Jul, 2023 1 commit

update internlm's chat template (#54) · 3de27ead

lvhan028 authored Jul 05, 2023

* update internlm model

* update

* update

* update

* update

* update temperature, topk and top_p

* update

* update

* loosen log level

03 Jul, 2023 1 commit

install triton_example and TransformerTritonBackend to runtime and lib respectively (#39) · bb6f8060

lvhan028 authored Jul 03, 2023

30 Jun, 2023 2 commits

rename serve/fastertransformer to serve/turbomind (#31) · e8ab4ba3

lvhan028 authored Jun 30, 2023

* rename lmdeploy/serve/fastertransformer to lmdeploy/serve/turbomind

* update

* update

rename llmdeploy to lmdeploy (#30) · 46f4738c

lvhan028 authored Jun 30, 2023

* change llmdeploy to lmdeploy

* update logo

* update readme

29 Jun, 2023 2 commits

fix crash when conversation history out of limit (#28) · cb8ac1b0

lvhan028 authored Jun 29, 2023

use huggingface tokenizer (#26) · 64936449

q.yao authored Jun 29, 2023

* add hf tokenizer

* format

* fix for comment

* don't skip special tokens

25 Jun, 2023 1 commit

Add profile (#15) · 23c05372

lvhan028 authored Jun 25, 2023

* remove constraints on model name

* remove duplicate model converter

* add profile

* get eos and bos from server

* update stop_words

* update sequence_length when the last generated token is eos_id

* fix

* fix

* check-in models

* validate model_name

* make stop_words as property

* debug profiling

* better stats

* fix assistant response

* update profile serving

* update

* update

20 Jun, 2023 1 commit

update scripts for deploying llama family model to fastertransformer triton models (#4) · 2bf481fb

lvhan028 authored Jun 20, 2023

* add scripts for deploying llama family models via fastertransformer

* fix

* fix

* set symlinks True when copying triton models templates

* pack model repository for triton inference server

* add exception

* fix

* update config.pbtxt and launching scripts

18 Jun, 2023 1 commit

add chatbot (#2) · ef2adb04

lvhan028 authored Jun 18, 2023