Commits · c3290cadcd8818deeaf9281f0494174308e04d79 · OpenDAS / Lmdeploy

14 Aug, 2023 1 commit

[Feature] Blazing fast W4A16 inference (#202) · c3290cad

Li Zhang authored Aug 14, 2023

* add w4a16

* fix `deploy.py`

* add doc

* add w4a16 kernels

* fuse w1/w3 & bugfixes

* fix typo

* python

* guard sm75/80 features

* add missing header

* refactor

* qkvo bias

* update cost model

* fix lint

* update `deploy.py`


07 Aug, 2023 1 commit

[Refactor] Support multi-session chat (#178) · 4bd0b487

WRH authored Aug 07, 2023

* add some dist utils

* add model utils

* add termio and basicstreamer

* typo

* fix world size

* refactor chat and tested llama1

* add internlm adapter and support stopping criteria

* concat with id for internlm

* update docstring

* update and support llama2

* typo

* move docs to docs

* update docstring of session manager

* update docstring

* update docs

* fix accel none in model

* fix and add test for tensor broadcast

* fix session using typing to check type

* add docstrings and comprehensive condition test

* unit test for dist

* fix session

* split unittests of utils

* typo

* update control flow of accel

* move test model

* remove main in unittest

* remove some log

* remove some comments


04 Aug, 2023 1 commit

Support serving with gradio without communicating to TIS (#162) · 18c386d9

AllentDan authored Aug 04, 2023



* use local model for webui

* local model for app.py

* lint

* remove print

* add seed

* comments

* fixed session_id

* support turbomind batch inference

* update app.py

* lint and docstring

* move webui to serve/gradio

* update doc

* update doc

* update docstring and remove print conversation

* log

* Update docs/zh_cn/build.md
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>

* Update docs/en/build.md
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>

* use latest gradio

* fix

* replace partial with InterFace

* use host ip instead of cookie

---------
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>


26 Jul, 2023 1 commit
- [Docs] Translate the quantization.md (#166) · 3df43e8c
  Xin Li authored Jul 26, 2023
```
* translate quantization doc

* revise
```
23 Jul, 2023 1 commit

Refactor the chat template of supported models using factory pattern (#144) · 7b470f07

lvhan028 authored Jul 23, 2023

* refactor model.py and support baichuan-7b

* remove model_name

* remove hard session_len

* export tokenizer.py to target dir

* remove model_name from client

* remove model_name

* update

* correct throughput equation

* fix session.response

* update serving.md

* update readme

* update according to review comments

* update

* update

* update

* update


17 Jul, 2023 1 commit
- [bugfix] Fix some docs' bug in 'serving' (#109) · 169d8c7f
  Jaylin Lee authored Jul 17, 2023
```
* [bugfix] Fix some docs' bug in 'serving'

* [bugfix] Fix some docs' bug in 'serving'
```
14 Jul, 2023 1 commit
- move turbomind.md to docs/en (#118) · 79deb99d
  lvhan028 authored Jul 14, 2023
```
* move turbomind.md to docs/en

* update link

* update link
```
13 Jul, 2023 1 commit
- Update serving.md (#106) · c0933457
  del-zhenwu authored Jul 13, 2023
  
11 Jul, 2023 2 commits
- docs(serving.md): typo (#92) · 4db08045
  tpoisonooo authored Jul 11, 2023
```
* docs(serving.md): typo

* docs(README): quantization
```
- update contribution.md (#86) · e7d5e062
  q.yao authored Jul 11, 2023
```
* update contrib

* update links
```
05 Jul, 2023 1 commit

improve readme (#52) · 3e7b6bfd

lvhan028 authored Jul 05, 2023

* add performance

* use png

* update

* update

* update

* update

* update
