Commits · 903707b5454e1fea599a26dd91f2fafc228df17a · OpenDAS / Lmdeploy

17 Aug, 2023 2 commits

docs(quantization): update description (#253) · 903707b5

tpoisonooo authored Aug 17, 2023

* Update quantization.md

* docs(quantization): update description

* docs(README): rename quantization files

Support windows platform (#209) · 4c9959f6

Chen Xin authored Aug 17, 2023

* __PRETTY_FUNCTION__

* CASE_K

* uint

* remove not

* HALF_FLT_MAX

* struct init

* port utils

* better build pthread-win32

* port kernels

* port utils/gemm_test

* hide windows header

* port models

* port examples && triton_backend && unittests

* update build readme

* fix lint

* fix lint

* fix lint

* fix lint

* fix lint

* fix build

* fix build

* cmake version

* fix typos

* update ci

* port kernels/gemm_s_f16

* update ci

* fix ci

* use cudaStreamSynchronize instead of volatile check

* remove gettimeofday

* remove pthread-win32

* remove dirent.h

* update pre-commit

* update

* remove todo

* fix include

* fix build

* fix build

* fix build ci

* fix github action trigger

* update README

* fix linux-build ci

* remove windows folder

* fix lint

* update readme

16 Aug, 2023 3 commits
- Adjust dependency of gradio server (#236) · 0d21f366
  AllentDan authored Aug 16, 2023
```
* import if lib directory exists

* only modify app.py
```
- remove chat template (#252) · f06db80d
  Lyu Han authored Aug 16, 2023
  
- [Feature] Profiling tool for huggingface and deepspeed models (#161) · e50387cc
  WRH authored Aug 16, 2023
```
* initial

* add docs

* noqa on docstring

* support no streamer

* typo

* add accel in model name

* fix lint

* fix CSVWRitter typo

* typo
```
15 Aug, 2023 2 commits
- Remove specified version in user guide (#241) · e68a1d00
  Lyu Han authored Aug 15, 2023
  
- Fix wrong RPATH using the absolute path instead of relative one (#239) · 271a19fe
  Chen Xin authored Aug 15, 2023
  
14 Aug, 2023 7 commits
- Bump version to v0.0.4 (#231) · 8cdcb2a9
  Lyu Han authored Aug 14, 2023
  
- Check-in user guide for w4a16 LLM deployment (#224) · 8e8629de
  Lyu Han authored Aug 14, 2023
```
* tmp

* update

* update

* update

* update

* update

* remove

* update

* update
```
- Fix TIS client got-no-space-result side effect brought by PR #197 (#222) · 68296844
  Lyu Han authored Aug 14, 2023
```
* rollback

* rollback chatbot.py
```
- [Docs] Update W4A16 News (#227) · af517a4a
  pppppM authored Aug 14, 2023
```
* update news and add supported models

* fix typo

* add ampere note

* update supported models

* replace icon with yes or no

* avoid ambiguity

* fix typo
```
- fix auto_awq readme (#228) · 43f75f75
  AllentDan authored Aug 14, 2023
```
* fix auto_awq readme

* hide w_sym option
```
- feat(quantization): kv cache use asymmetric (#218) · 902a3e16
  tpoisonooo authored Aug 14, 2023
```
* feat(quantization): kv cache use asymmetric
```
- [Feature] Blazing fast W4A16 inference (#202) · c3290cad
  Li Zhang authored Aug 14, 2023
```
* add w4a16

* fix `deploy.py`

* add doc

* add w4a16 kernels

* fuse w1/w3 & bugfixes

* fix typo

* python

* guard sm75/80 features

* add missing header

* refactor

* qkvo bias

* update cost model

* fix lint

* update `deploy.py`
```
11 Aug, 2023 1 commit

[Feature] Support AWQ (#108) · d3dbe179

pppppM authored Aug 11, 2023

* support kv cache offload

* add dataloader docstring

* complete gitignore

* refactor collect mod fn

* add calibration

* fix lint

* add observers and quantizers

* fix lints

* add global available mixin

* fix lints

* split batch inference

* support smoothquant and awq

* update export kv scales

* fix lints

* fix some bugs

* update weight only usage

* update usage

* auto mapping and support smooth internlm

* trust remote code

* fix num head key error

* fix bias error

* align shape and pack order with llm-awq

* modified according to LZHgrla's comments.

* update gitignore

* fix kv qparams export error

* update usage

* decouple calibrate and awq

* update docstrings

* update api name

* update readme

* update readme

* update readme

* update readme

* update kv_qparams and readme

* fix typos

10 Aug, 2023 1 commit
- Add release note template (#211) · 0d9c6c9d
  Lyu Han authored Aug 10, 2023
```
* add release note template

* add 'impovement' label to the excluding list
```
07 Aug, 2023 5 commits

[Refactor] Support multi-session chat (#178) · 4bd0b487

WRH authored Aug 07, 2023

* add some dist utils

* add model utils

* add termio and basicstreamer

* typo

* fix world size

* refactor chat and tested llama1

* add internlm adapter and support stopping criteria

* concat with id for internlm

* update docstring

* update and support llama2

* typo

* move docs to docs

* update docstring of session manager

* update docstring

* update docs

* fix accel none in model

* fix and add test for tensor broadcast

* fix session using typing to check type

* add docstrings and comprehensive condition test

* unit test for dist

* fix session

* split unittests of utils

* typo

* update control flow of accel

* move test model

* remove main in unittest

* remove some log

* remove some comments

bump version to v0.0.3 (#205) · c80f3e49

lvhan028 authored Aug 07, 2023

Add non-stream inference api for chatbot (#200) · 3de0dbb6

lvhan028 authored Aug 07, 2023

* add non-stream inference api for chatbot

* update according to reviewer's comments

[Feature] Add script to split HuggingFace model to the smallest sharded checkpoints (#199) · b7e7e668

LZHgrla authored Aug 07, 2023

* add get_small_sharded_hf.py

* fix pre-commit

Improve postprocessing in TIS serving by applying Incremental de-tokenizing (#197) · 0ed1e4d4

lvhan028 authored Aug 07, 2023

* change to incremental decoding

* update

04 Aug, 2023 1 commit

Support serving with gradio without communicating to TIS (#162) · 18c386d9

AllentDan authored Aug 04, 2023



* use local model for webui

* local model for app.py

* lint

* remove print

* add seed

* comments

* fixed session_id

* support turbomind batch inference

* update app.py

* lint and docstring

* move webui to serve/gradio

* update doc

* update doc

* update docstring and remove print conversation

* log

* Update docs/zh_cn/build.md
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>

* Update docs/en/build.md
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>

* use latest gradio

* fix

* replace partial with InterFace

* use host ip instead of coolie

---------
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>

03 Aug, 2023 3 commits
- Move lmdeploy/turbomind/utils.py to lmdeploy/utils.py (#191) · 7a2128be
  lvhan028 authored Aug 03, 2023
  
- Fix build test error and move turbomind csrc test cases to `tests/csrc` (#188) · 44a85546
  lvhan028 authored Aug 03, 2023
```
* fix build tests failure

* move src test cases to tests/csrc
```
- [Docs] Translate turbomind.md into Chinese (#173) · 5545bbc5
  Xin Li authored Aug 03, 2023
```
* translate turbomind

* keep persistent batching

* revised

* revise
```
01 Aug, 2023 1 commit
- Fix typo in README.md (#187) · 8f80cb5f
  tpoisonooo authored Aug 01, 2023
  
31 Jul, 2023 4 commits
- Support Runtime tensor parallelism (#158) · 4767b04d
  q.yao authored Jul 31, 2023
```
* works on internlm and vicuna

* support GQA

* remove comment

* update readme, add logger, default tp=1

* remove log
```
- [Fix] Remove unused code to reduce binary size (#181) · 981a4610
  Li Zhang authored Jul 31, 2023
```
* clean-up

* fix lint

* fix lint
```
- Add issue and PR templates (#184) · 83697422
  lvhan028 authored Jul 31, 2023
  
- Fix typo in profile_serving.py (#183) · 09c624ce
  del-zhenwu authored Jul 31, 2023
  
28 Jul, 2023 1 commit

bump version to v0.0.2 (#177) · 7e0b75bb

lvhan028 authored Jul 28, 2023

* bump version to v0.0.2

* fix command

* update installation and inference section

27 Jul, 2023 4 commits

Add pypi ci (#170) · 859658eb

Chen Xin authored Jul 27, 2023

* add pypi ci

* fix build

[Doc] add Twitter link (#175) · c1c1353d

vansin authored Jul 27, 2023

* Doc: add Twitter link

* Doc: add a space

add model_name param for chatbot (#174) · 7bc8d171

MaxMatthew authored Jul 27, 2023

Add manylinux builder (#164) · b9004712

Chen Xin authored Jul 27, 2023



* update builder

* remove root permission

* update readme

* update setup.py

* add install cuda 12.1 script

* use generate.sh

* add nccl to install_requires

* update README.md

* fix lint

* update setup.py

---------
Co-authored-by: chenxin <chenxin@pjlab.org.cn>

26 Jul, 2023 3 commits
- [Docs] Translate the quantization.md (#166) · 3df43e8c
  Xin Li authored Jul 26, 2023
```
* translate quantization doc

* revise
```
- docs(README): disable ECC (#159) · 63bd5916
  tpoisonooo authored Jul 26, 2023
```
* Update README_zh-CN.md

* Update README.md

* Update README_zh-CN.md

* Update README.md

* Update README_zh-CN.md
```
- Add triton_models to whl package (#163) · e7bc11b4
  Chen Xin authored Jul 26, 2023
```
* defer symlink

* fix lint
```
25 Jul, 2023 2 commits
- support fmha gqa (#160) · 5ed6bb59
  q.yao authored Jul 25, 2023
```
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>
```
- fix getting package root path error in python3.9 (#157) · 5203c850
  lvhan028 authored Jul 25, 2023
  