Commits · c3290cadcd8818deeaf9281f0494174308e04d79 · OpenDAS / Lmdeploy

14 Aug, 2023 1 commit

[Feature] Blazing fast W4A16 inference (#202) · c3290cad

Li Zhang authored Aug 14, 2023

* add w4a16

* fix `deploy.py`

* add doc

* add w4a16 kernels

* fuse w1/w3 & bugfixes

* fix typo

* python

* guard sm75/80 features

* add missing header

* refactor

* qkvo bias

* update cost model

* fix lint

* update `deploy.py`

c3290cad

23 Jul, 2023 1 commit

Refactor the chat template of supported models using factory pattern (#144) · 7b470f07

lvhan028 authored Jul 23, 2023

* refactor model.py and support baichuan-7b

* remove model_name

* remove hard session_len

* export tokenizer.py to target dir

* remove model_name from client

* remove model_name

* update

* correct throughput equation

* fix session.response

* update serving.md

* update readme

* update according to review comments

* update

* update

* update

* update

7b470f07

17 Jul, 2023 1 commit

[bugfix] Fix some docs' bug in 'serving' (#109) · 169d8c7f

Jaylin Lee authored Jul 17, 2023

* [bugfix] Fix some docs' bug in 'serving'

* [bugfix] Fix some docs' bug in 'serving'

169d8c7f

13 Jul, 2023 1 commit

Update serving.md (#106) · c0933457

del-zhenwu authored Jul 13, 2023

c0933457

11 Jul, 2023 2 commits

docs(serving.md): typo (#92) · 4db08045

tpoisonooo authored Jul 11, 2023

* docs(serving.md): typo

* docs(README): quantization

4db08045

update contribution.md (#86) · e7d5e062

q.yao authored Jul 11, 2023

* update contrib

* update links

e7d5e062

05 Jul, 2023 1 commit

improve readme (#52) · 3e7b6bfd

lvhan028 authored Jul 05, 2023

* add performance

* use png

* update

* update

* update

* update

* update

3e7b6bfd