Commits · 6b00f6239012b2bd6f1450a44107fe8665906451 · OpenDAS / Lmdeploy

22 Nov, 2023 1 commit

Support loading hf model directly (#685) · 6b00f623

Chen Xin authored Nov 22, 2023

* turbomind support export model params

* fix overflow

* support turbomind.from_pretrained

* fix tp

* support AutoModel

* support load kv qparams

* update auto_awq

* update docstring

* export lmdeploy version

* update doc

* remove download_hf_repo

* LmdeployForCausalLM -> LmdeployForCausalLM

* refactor turbomind.py

* update comment

* add bfloat16 convert back

* support gradio run_local load hf

* support restful api server load hf

* add docs

* support loading previous quantized model

* adapt pr 690

* update docs

* not export turbomind config when quantize a model

* check model_name when can not get it from config.json

* update readme

* remove model_name in auto_awq

* update

* update

* update

* fix build

* absolute import

6b00f623

20 Nov, 2023 1 commit

Check-in user guide about turbomind config (#680) · 73386e21

Lyu Han authored Nov 20, 2023

* update

* update config guide

* update guide

* update user guide according to review comments

73386e21

13 Nov, 2023 1 commit

[Docs] Update Supported Matrix (#679) · e641dd86

pppppM authored Nov 13, 2023

* update supported matrix

* change the default shard size when saving quantized weights

* baichuan2 kv8

e641dd86

10 Nov, 2023 1 commit

Add extra_requires to reduce dependencies (#580) · 06125966

RunningLeon authored Nov 10, 2023

* update reqs

* update docs

* resolve comments

* upgrade pydantic

* fix rebase

* update doc

* update

* update

* update readme

* update

* add flash-attn

06125966

03 Nov, 2023 1 commit

add cli to list the supported model names (#639) · 1bbc6e05

RunningLeon authored Nov 03, 2023

* update

* resolve comment

1bbc6e05

01 Nov, 2023 1 commit

Improve api_server and webui usage (#544) · 373bd013

AllentDan authored Nov 01, 2023

* make IPv6 compatible, safe run for coroutine interrupting

* instance_id -> session_id and fix api_client.py

* update doc

* remove useless faq

* safe ip mapping

* update app.py

* WIP completion

* completion

* update doc

* disable interactive mode for /v1/chat/completions

* docstring

* docstring

* refactor gradio

* update gradio

* update

* update doc

* rename

* session_id default -1

* missed two files

* add an APIClient

* add chat func for APIClient

* refine

* add concurrent function

* sequence_start, sequence_end --> interactive_mode

* update doc

* comments

* doc

* better text completion

* remove /v1/embeddings

* comments

* deprecate generate and use /v1/interactive/completions

* /v1/interactive/completion -> /v1/chat/interactive

* embeddings

* rename

* remove wrong arg description

* docstring

* fix

* update cli

* update doc

* strict session_len limit condition

* pass model args to api_server

373bd013

25 Oct, 2023 1 commit

Add more user-friendly CLI (#541) · 169d5169

RunningLeon authored Oct 25, 2023

* add

* import fire in main

* wrap to speed up fire cli

* update

* update docs

* update docs

* fix

* resolve comments

* resolve conflict and add test for cli

169d5169

19 Oct, 2023 1 commit

add solar chat template (#576) · 70a5c63a

AllentDan authored Oct 19, 2023

70a5c63a

12 Oct, 2023 2 commits

support deploy qwen-14b-chat (#482) · b21239a8

Chen Xin authored Oct 12, 2023

* support deploy qwen-14b-chat

* update README

* load safetensors first

b21239a8

update huggingface internlm-chat-7b model url (#546) · 27e12477

AllentDan authored Oct 12, 2023

27e12477

25 Sep, 2023 1 commit

Fix typo in README.md (#462) · 71945001

Ikko Eltociear Ashimine authored Sep 25, 2023

quantilized -> quantized

71945001

20 Sep, 2023 1 commit

Support InternLM 20B (#440) · df7955de

Lyu Han authored Sep 20, 2023



* better profiler

* wait for releasing mem

* remove fire

* remove support for multiple model benchmark

* comments

* support actual seqlen

* change chat template

* update

* fix ut

* int->size_t

* output more details

* correct tp

* rollback

* update

* update readme

* add 'internlm-chat' as the default tag for internlm chat models

* rollback tokenizer

---------
Co-authored-by: AllentDan <AllentDan@yeah.net>
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>

df7955de

11 Sep, 2023 1 commit

Support codellama (#359) · 65c662f9

Lyu Han authored Sep 11, 2023

* tmp

* add demo for codellama inference

* update

* update

* update

* update codellama.md

* export rope_theta

* update

* update doc

* fix client.py

* define SamplingParam

* rollback 'end'

* rotary_emb_base to rotary_embedding_base

* change to baichuan2-7b

65c662f9

08 Sep, 2023 1 commit

Support baichuan2-chat chat template (#378) · 55764e0b

WRH authored Sep 08, 2023



* support baichuan2-chat

* update args from generation config

* update deploy.py

* update readme

* tested with tp

* step-1 when last id is eos

* add news

---------
Co-authored-by: chenxin <chenxin@pjlab.org.cn>

55764e0b

06 Sep, 2023 1 commit

Update logo (#372) · e4701226

Lyu Han authored Sep 06, 2023

e4701226

05 Sep, 2023 1 commit

[Doc] Fix quantization docs link (#367) · 683c3fe9

Zhihao Lin authored Sep 05, 2023

683c3fe9

29 Aug, 2023 2 commits

Add flashattention2 (#196) · 452822a4

q.yao authored Aug 29, 2023



* first

* fix causal mask

* disable flash attention2 on sm70

* fix 2

* update readme

* clang-format

* disable ft2 on windows

* fix lint

* fix build

* fix build

* fix long kv seq

* fix lint

* sync copy output

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>
Co-authored-by: irexyc <irexyc@gmail.com>

452822a4

fix(kvint8): update doc (#315) · a48e2d27

tpoisonooo authored Aug 29, 2023



* fix(kvint8): update doc

* style(lmdeploy): format

* style(kv_qparams.py): linting

* fix lint

* Update kv_int8.md

* Update kv_int8.md

---------
Co-authored-by: AllentDan <AllentDan@yeah.net>

a48e2d27

24 Aug, 2023 2 commits

Enable the Gradio server to call inference services through the RESTful API (#287) · 4279d8ca

AllentDan authored Aug 24, 2023



* app use async engine

* add stop logic

* app update cancel

* app support restful-api

* update doc and use the right model name

* set doc url root

* add comments

* add an example

* renew_session

* update readme.md

* resolve comments

* Update restful_api.md

* Update restful_api.md

* Update restful_api.md

---------
Co-authored-by: tpoisonooo <khj.application@aliyun.com>

4279d8ca

[Fix] Fix llama2 70b & qwen quantization error (#273) · d5cb0be2

pppppM authored Aug 24, 2023

* fix llama2 70b

* fix qwen quantization

* remove pdb

* add faq

d5cb0be2

21 Aug, 2023 2 commits

docs(quantization): update description (#272) · f44ef17c

tpoisonooo authored Aug 21, 2023

f44ef17c

add readthedocs (#208) · c238f1cd

RunningLeon authored Aug 21, 2023

* add readthedocs configs

* update readme

* fix link

* update

* remove turbomind in api

* update

* fix comment and remove api

c238f1cd

18 Aug, 2023 1 commit

[Feature] Support Qwen-7B, dynamic NTK scaling and logN scaling in turbomind (#230) · 4a60b45d

Li Zhang authored Aug 18, 2023

* qwen support

* dynamic ntk & logn attn

* fix ntk & add chat template

* fix ntk scaling & stop words

* fix lint

* add tiktoken to requirements.txt

* fix tokenizer, set model format automatically

* update model.py

* update readme

* fix lint

4a60b45d

17 Aug, 2023 2 commits

docs(quantization): update description (#253) · 903707b5

tpoisonooo authored Aug 17, 2023

* Update quantization.md

* docs(quantization): update description

* docs(README): rename quantization files

903707b5

Support windows platform (#209) · 4c9959f6

Chen Xin authored Aug 17, 2023

* __PRETTY_FUNCTION__

* CASE_K

* uint

* remove not

* HALF_FLT_MAX

* struct init

* port utils

* better build pthread-win32

* port kernels

* port utils/gemm_test

* hide windows header

* port models

* port examples && triton_backend && unittests

* update build readme

* fix lint

* fix lint

* fix lint

* fix lint

* fix lint

* fix build

* fix build

* cmake version

* fix typos

* update ci

* port kernels/gemm_s_f16

* update ci

* fix ci

* use cudaStreamSynchronize instead of volatile check

* remove gettimeofday

* remove pthread-win32

* remove dirent.h

* update pre-commit

* update

* remove todo

* fix include

* fix build

* fix build

* fix build ci

* fix github action trigger

* update README

* fix linux-build ci

* remove windows folder

* fix lint

* update readme

4c9959f6

14 Aug, 2023 3 commits

Check-in user guide for w4a16 LLM deployment (#224) · 8e8629de

Lyu Han authored Aug 14, 2023

* tmp

* update

* update

* update

* update

* update

* remove

* update

* update

8e8629de

[Docs] Update W4A16 News (#227) · af517a4a

pppppM authored Aug 14, 2023

* update news and add supported models

* fix typo

* add ampere note

* update supported models

* replace icon with yes or no

* avoid ambiguity

* fix typo

af517a4a

fix auto_awq readme (#228) · 43f75f75

AllentDan authored Aug 14, 2023

* fix auto_awq readme

* hide w_sym option

43f75f75

11 Aug, 2023 1 commit

[Feature] Support AWQ (#108) · d3dbe179

pppppM authored Aug 11, 2023

* support kv cache offload

* add dataloader docstring

* complete gitignore

* refactor collect mod fn

* add calibration

* fix lint

* add observers and quantizers

* fix lints

* add global available mixin

* fix lints

* split batch inference

* support smoothquant and awq

* update export kv scales

* fix lints

* fix some bugs

* update weight only usage

* update usage

* auto mapping and support smooth internlm

* trust remote code

* fix num head key error

* fix bias error

* align shape and pack order with llm-awq

* modified according to LZHgrla's comments.

* update gitignore

* fix kv qparams export error

* update usage

* decouple calibrate and awq

* update docstrings

* update api name

* update readme

* update readme

* update readme

* update readme

* update kv_qparams and readme

* fix typos

d3dbe179

07 Aug, 2023 1 commit

[Refactor] Support multi-session chat (#178) · 4bd0b487

WRH authored Aug 07, 2023

* add some dist utils

* add model utils

* add termio and basicstreamer

* typo

* fix world size

* refactor chat and tested llama1

* add internlm adapter and support stopping criteria

* concat with id for internlm

* update docstring

* update and support llama2

* typo

* move docs to docs

* update docstring of session manager

* update docstring

* update docs

* fix accel none in model

* fix and add test for tensor broadcast

* fix session using typing to check type

* add docstrings and comprehensive condition test

* unit test for dist

* fix session

* split unittests of utils

* typo

* update control flow of accel

* move test model

* remove main in unittest

* remove some log

* remove some comments

4bd0b487

04 Aug, 2023 1 commit

Support serving with gradio without communicating to TIS (#162) · 18c386d9

AllentDan authored Aug 04, 2023



* use local model for webui

* local model for app.py

* lint

* remove print

* add seed

* comments

* fixed session_id

* support turbomind batch inference

* update app.py

* lint and docstring

* move webui to serve/gradio

* update doc

* update doc

* update docstring and remove print conversation

* log

* Update docs/zh_cn/build.md
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>

* Update docs/en/build.md
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>

* use latest gradio

* fix

* replace partial with InterFace

* use host ip instead of coolie

---------
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>

18c386d9

01 Aug, 2023 1 commit

Fix typo in README.md (#187) · 8f80cb5f

tpoisonooo authored Aug 01, 2023

8f80cb5f

31 Jul, 2023 1 commit

Support Runtime tensor parallelism (#158) · 4767b04d

q.yao authored Jul 31, 2023

* works on internlm and vicuna

* support GQA

* remove comment

* update readme, add logger, default tp=1

* remove log

4767b04d

28 Jul, 2023 1 commit

bump version to v0.0.2 (#177) · 7e0b75bb

lvhan028 authored Jul 28, 2023

* bump version to v0.0.2

* fix command

* update installation and inference section

7e0b75bb

27 Jul, 2023 1 commit

[Doc] add Twitter link (#175) · c1c1353d

vansin authored Jul 27, 2023

* Doc: add Twitter link

* Doc: add a space

c1c1353d

26 Jul, 2023 2 commits

[Docs] Translate the quantization.md (#166) · 3df43e8c

Xin Li authored Jul 26, 2023

* translate quantization doc

* revise

3df43e8c

docs(README): disable ECC (#159) · 63bd5916

tpoisonooo authored Jul 26, 2023

* Update README_zh-CN.md

* Update README.md

* Update README_zh-CN.md

* Update README.md

* Update README_zh-CN.md

63bd5916

24 Jul, 2023 1 commit

checkin benchmark on real conversation data (#156) · 0bd1fa40

lvhan028 authored Jul 24, 2023

* checkin benchmark on real conversation data

* change resolution

* update

0bd1fa40

23 Jul, 2023 1 commit

Refactor the chat template of supported models using factory pattern (#144) · 7b470f07

lvhan028 authored Jul 23, 2023

* refactor model.py and support baichuan-7b

* remove model_name

* remove hard session_len

* export tokenizer.py to target dir

* remove model_name from client

* remove model_name

* update

* correct throughput equation

* fix session.response

* update serving.md

* update readme

* update according to review comments

* update

* update

* update

* update

7b470f07

21 Jul, 2023 1 commit

[Feature] Support Llama-2 with GQA (#147) · f07b697b

Li Zhang authored Jul 21, 2023

* add GQA for llama2

* fix model conversion

* fix lint & remove dev log

* update news

* minor

* fix allocation size

* fix split_dim for w_qkv.bias

f07b697b