Commits · 450757b2bdf3b124cfa14950e4f8ed6f6d15550c · OpenDAS / Lmdeploy

11 Sep, 2023 3 commits

bump version to v0.0.8 (#401) · 450757b2
Lyu Han authored Sep 11, 2023

[Fix] Update puyu model (#399) · cfec5bed
liukuikun authored Sep 11, 2023


Lyu Han authored Sep 11, 2023

* tmp

* add demo for codellama inference

* update

* update

* update

* update codellama.md

* export rope_theta

* update

* update doc

* fix client.py

* define SamplingParam

* rollback 'end'

* rotary_emb_base to rotary_embedding_base

* change to baichuan2-7b

65c662f9

08 Sep, 2023 1 commit

Support baichuan2-chat chat template (#378) · 55764e0b

WRH authored Sep 08, 2023



* support baichuan2-chat

* update args from generation config

* update deploy.py

* update readme

* tested with tp

* step-1 when last id is eos

* add news

---------
Co-authored-by: chenxin <chenxin@pjlab.org.cn>


07 Sep, 2023 3 commits
- fix exceed session len core dump for chat and generate (#366) · ce21a318
  AllentDan authored Sep 07, 2023
  
- [Fix] Set max dynamic smem size for decoder MHA to support context length > 8k (#377) · 71ade772
  Lyu Han authored Sep 07, 2023
```
* Fix crash when context window size is large by setting max dynamic smem size

* fix linting
```
- bug-fix: always use stream mode to enable persistent batching (#346) · 57cf99b9
  fade_away authored Sep 07, 2023
```
Co-authored-by: sleepwalker <just_for_singing@foxmail.com>
```
06 Sep, 2023 1 commit
- Update logo (#372) · e4701226
  Lyu Han authored Sep 06, 2023
  
05 Sep, 2023 2 commits
- [Docs] Simplify `build.md` (#370) · 4b5c2bda
  pppppM authored Sep 05, 2023
```
* use conda install nccl, openmpi and rapidjson

* update en doc
```
- [Doc] Fix quantization docs link (#367) · 683c3fe9
  Zhihao Lin authored Sep 05, 2023
  
04 Sep, 2023 2 commits
- bump version to v0.0.7 (#358) · d065f3e4
  Lyu Han authored Sep 04, 2023
  
- Fix profile_serving hung issue (#344) · edb7c6ec
  Lyu Han authored Sep 04, 2023
```
* read data after start processes

* fix hang

* fix exceptions when request_output_len is 0
```
01 Sep, 2023 2 commits

Decode generated token_ids incrementally (#309) · 9bfe03c6

AllentDan authored Sep 01, 2023

* add incremental decoding for turbomind

* update TIS

* fix triton post processing

* update doc

* fix typo

* SentencePieceTokenizer incremental decode, add qwen message prompt

* docstring

* update bot


Package 'bin/llama_gemm' to wheel (#320) · 22e8b2ca
Chen Xin authored Sep 01, 2023
```
* pack llama_gemm

* update CMakeLists.txt

* remove candidate

* update MANIFEST.in
```

30 Aug, 2023 1 commit
- Update FAQ for restful api (#319) · eaccbc0a
  AllentDan authored Aug 30, 2023
```
* update FAQ for restful api

* refine
```
29 Aug, 2023 4 commits

Add flashattention2 (#196) · 452822a4

q.yao authored Aug 29, 2023



* first

* fix causal mask

* disable flash attention2 on sm70

* fix 2

* update readme

* clang-format

* disable ft2 on windows

* fix lint

* fix build

* fix build

* fix long kv seq

* fix lint

* sync copy output

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>
Co-authored-by: irexyc <irexyc@gmail.com>


Fix turbomind import error on windows (#316) · d4d609bd
Chen Xin authored Aug 29, 2023


fix(kvint8): update doc (#315) · a48e2d27

tpoisonooo authored Aug 29, 2023



* fix(kvint8): update doc

* style(lmdeploy): format

* style(kv_qparams.py): linting

* fix lint

* Update kv_int8.md

* Update kv_int8.md

---------
Co-authored-by: AllentDan <AllentDan@yeah.net>


Fix readthedocs building (#321) · 08b2812c
RunningLeon authored Aug 29, 2023


25 Aug, 2023 2 commits
- bump version to v0.0.6 (#283) · cfabbbd7
  Lyu Han authored Aug 25, 2023
  
- Import turbomind in gradio server only when it is needed (#303) · 59f8e674
  AllentDan authored Aug 25, 2023
  
24 Aug, 2023 4 commits

Enable the Gradio server to call inference services through the RESTful API (#287) · 4279d8ca

AllentDan authored Aug 24, 2023



* app use async engine

* add stop logic

* app update cancel

* app support restful-api

* update doc and use the right model name

* set doc url root

* add comments

* add an example

* renew_session

* update readme.md

* resolve comments

* Update restful_api.md

* Update restful_api.md

* Update restful_api.md

---------
Co-authored-by: tpoisonooo <khj.application@aliyun.com>


[Feature] Support decode with DP in pytorch (#193) · 81f29837

WRH authored Aug 24, 2023

* support decode

* unit test and benckmark and improve

* remove some drafts

* enable numerical test

* minor

* add some benchmark data

* add more output

* update interface

* remove debugs

* format

* update docstring

* remove print and add benchmark results

* use logits & add main

* fix rb

* dump large

* update test

* update test decode

* add decimal


Pad tok_embedding and output weights to make their shape divisible by TP (#285) · 4903d3cc

Lyu Han authored Aug 24, 2023

* Pad tok_embedding and output weights to make their shape divisible by TP

* update

* update

* update

* update

* update llamaBatch


[Fix] Fix llama2 70b & qwen quantization error (#273) · d5cb0be2
pppppM authored Aug 24, 2023
```
* fix llama2 70b

* fix qwen quantization

* remove pdb

* add faq
```

23 Aug, 2023 1 commit
- Change to github-hosted runner for building docker image (#291) · e5bfd387
  RunningLeon authored Aug 23, 2023
```
* change to github runner

* update
```
22 Aug, 2023 3 commits

[Fix] Fix building with CUDA 11.3 (#280) · 9e366482
Li Zhang authored Aug 22, 2023
```
* disable cache hint for CUDA < 11.4

* fix lint

* fix lint

* fix cuda-11.3 build
```

Update workflow for building docker image (#282) · 06327355

RunningLeon authored Aug 22, 2023

* update

* debug

* Revert "debug"

This reverts commit 1c1f1d75591e44bf315c720f4954d1ea30b92989.

* update


Add Restful API (#223) · d5c10e7a

AllentDan authored Aug 22, 2023

* add restful api

* refine

* add simple doc

* lint

* add uvicorn requirement

* more args

* add llama2

* docstring

* update doc

* save

* refine

* lint

* better decode

* add v1/embedding

* add GenerateRequest

* add llama2 chat template

* correct profiling

* update documents

* add length judge

* add faq

* update doc and rename req_que to req_queue

* fix md link, use get_logger, fix sequence_end bug

* use another doc link for go to avoid lint error

* add api_client.py

* update doc

* update doc

* update function interface

* update FAQ

* resolve comments


21 Aug, 2023 4 commits
- Pass chat template args including meta_prompt to model (#225) · 7785142d
  AllentDan authored Aug 21, 2023
```
* pass args like meta_prompt to model

* update chatbot

* update

* rollback

* update llama2 and qwen

* refine
```
- docs(quantization): update description (#272) · f44ef17c
  tpoisonooo authored Aug 21, 2023
  
- add readthedocs (#208) · c238f1cd
  RunningLeon authored Aug 21, 2023
```
* add readthedocs configs

* update readme

* fix link

* update

* remove turbomind in api

* update

* fix comment and remove api
```
- Check-in FAQ (#256) · 2f29a3c7
  Lyu Han authored Aug 21, 2023
```
* Check-in FAQ

* update

* update
```
18 Aug, 2023 3 commits
- Support TP for w4a16 (#262) · 89f3d322
  Li Zhang authored Aug 18, 2023
  
- [Feature] Support Qwen-7B, dynamic NTK scaling and logN scaling in turbomind (#230) · 4a60b45d
  Li Zhang authored Aug 18, 2023
```
* qwen support

* dynamic ntk & logn attn

* fix ntk & add chat template

* fix ntk scaling & stop words

* fix lint

* add tiktoken to requirements.txt

* fix tokenizer, set model format automatically

* update model.py

* update readme

* fix lint
```
- Add 'accelerate' to requirement list (#261) · 62b60db7
  Lyu Han authored Aug 18, 2023
  
17 Aug, 2023 3 commits

[Fix] Implement movmatrix using warp shuffling for CUDA < 11.8 (#267) · f8ed456e
Li Zhang authored Aug 17, 2023


docs(quantzation): update description (#253) · 903707b5

tpoisonooo authored Aug 17, 2023

* Update quantization.md

* docs(quantization): update description

* docs(README): rename quantization files


Support windows platform (#209) · 4c9959f6

Chen Xin authored Aug 17, 2023

* __PRETTY_FUNCTION__

* CASE_K

* uint

* remove not

* HALF_FLT_MAX

* struct init

* port utils

* better build pthread-win32

* port kernels

* port utils/gemm_test

* hide windows header

* port models

* port examples && triton_backend && unittests

* update build readme

* fix lint

* fix lint

* fix lint

* fix lint

* fix lint

* fix build

* fix build

* cmake version

* fix typos

* update ci

* port kernels/gemm_s_f16

* update ci

* fix ci

* use cudaStreamSynchronize instead of volatile check

* remove gettimeofday

* remove pthread-win32

* remove dirent.h

* update pre-commit

* update

* remove todo

* fix include

* fix build

* fix build

* fix build ci

* fix github action trigger

* update README

* fix linux-build ci

* remove windows folder

* fix lint

* update readme


16 Aug, 2023 1 commit
- Adjust dependency of gradio server (#236) · 0d21f366
  AllentDan authored Aug 16, 2023
```
* import if lib directory exists

* only modify app.py
```