- 28 May, 2024 1 commit
gaoqiong authored
- 22 Mar, 2024 1 commit
zhouxiang authored
- 22 Nov, 2023 1 commit
Chen Xin authored
* turbomind support export model params
* fix overflow
* support turbomind.from_pretrained
* fix tp
* support AutoModel
* support load kv qparams
* update auto_awq
* update docstring
* export lmdeploy version
* update doc
* remove download_hf_repo
* LmdeployForCausalLM -> LmdeployForCausalLM
* refactor turbomind.py
* update comment
* add bfloat16 convert back
* support gradio run_local load hf
* support restful api server load hf
* add docs
* support loading previous quantized model
* adapt pr 690
* update docs
* not export turbomind config when quantize a model
* check model_name when can not get it from config.json
* update readme
* remove model_name in auto_awq
* update
* update
* update
* fix build
* absolute import
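The `turbomind.from_pretrained` item above mirrors the familiar Hugging Face loading convention: accept either a local directory or a hub repo id. A minimal sketch of that flow, assuming `huggingface_hub` is available for snapshot resolution; `Engine` is a hypothetical stand-in, not lmdeploy's actual class:

```python
from dataclasses import dataclass
from pathlib import Path

from huggingface_hub import snapshot_download  # pip install huggingface_hub


@dataclass
class Engine:  # hypothetical stand-in for the real inference engine
    model_path: str
    tp: int = 1  # tensor-parallel degree


def from_pretrained(model_id_or_path: str, tp: int = 1) -> Engine:
    """Accept a local model directory or a Hugging Face hub repo id."""
    path = Path(model_id_or_path)
    if not path.is_dir():
        # resolve the hub id to a (cached) local snapshot
        path = Path(snapshot_download(repo_id=model_id_or_path))
    return Engine(model_path=str(path), tp=tp)
```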
- 10 Nov, 2023 1 commit
Li Zhang authored
* refresh decoder attention kernel
* block-level kv cache
* `BlockManager` & `SequenceManager`
* update
* update
* update
* update
* rename
* GQA support
* fix context length
* GQA dispatch
* kv8
* tune
* async stream cb
* nvtx
* config parsing
* debug
* optimize output cost
* split-k decoding
* minor
* truncate `session_len` by available blocks
* minor
* license
* fix
* dispatch `cp.async`
* fix linking
* fix
* fix deadlock
* guard input length
* correct start offset
* fix prefill chunking
* fix `cache_block_seq_len` param passing
* fix `block_size` fmtstr
* fix output tokens
* fix batch resizing
* fix masking of finished sequences
* add debug util
* free unused block early
* add ntk scaling and logn scaling
* cmake flags
* fix typo
* w4a16 for sm75
* fix msvc build
* fix msvc build
* fix block verification
* fix msvc build
* use `std::shuffle`
* fix lint
* fix lint
* fix lint
* clear incoming buffer
* clear finished requests
* fix batch initialization
* fix typo
* fix typo
* fix comparison
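The block-level KV cache manages memory in fixed-size blocks handed out from a shared pool, which is also what makes "truncate `session_len` by available blocks" possible. A minimal Python sketch of that bookkeeping; the real `BlockManager` lives in C++ and does much more, and every name here is illustrative:

```python
class BlockManager:
    """Free-list bookkeeping for a paged (block-level) KV cache."""

    def __init__(self, num_blocks: int, block_seq_len: int):
        self.block_seq_len = block_seq_len       # tokens stored per block
        self.free = list(range(num_blocks))      # ids of unused blocks

    def blocks_needed(self, seq_len: int) -> int:
        return -(-seq_len // self.block_seq_len)  # ceiling division

    def allocate(self, seq_len: int) -> list[int]:
        n = self.blocks_needed(seq_len)
        if n > len(self.free):
            raise RuntimeError("out of KV cache blocks")
        blocks, self.free = self.free[:n], self.free[n:]
        return blocks

    def release(self, blocks: list[int]) -> None:
        self.free.extend(blocks)                 # recycle for other sequences


# truncating session_len by what the pool can actually hold:
mgr = BlockManager(num_blocks=128, block_seq_len=64)
max_session_len = len(mgr.free) * mgr.block_seq_len  # 8192 tokens here
```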
- 01 Sep, 2023 1 commit
Chen Xin authored
* pack llama_gemm
* update CMakeLists.txt
* remove candidate
* update MANIFEST.in
- 24 Aug, 2023 1 commit
WRH authored
* support decode
* unit test and benchmark and improve
* remove some drafts
* enable numerical test
* minor
* add some benchmark data
* add more output
* update interface
* remove debugs
* format
* update docstring
* remove print and add benchmark results
* use logits & add main
* fix rb
* dump large
* update test
* update test decode
* add decimal
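A numerical test for a new decode path usually compares its logits against a reference implementation under loose half-precision tolerances. A sketch of such a check, with names and tolerances chosen for illustration rather than taken from the actual test:

```python
import torch


def assert_decode_matches(batched: torch.Tensor, reference: torch.Tensor) -> None:
    """Compare decode logits from the new path against a reference run."""
    assert batched.shape == reference.shape
    # fp16 kernels reorder accumulation, so exact equality is too strict
    assert torch.allclose(batched.float(), reference.float(), rtol=1e-2, atol=1e-2)
```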
- 21 Aug, 2023 1 commit
RunningLeon authored
* add readthedocs configs
* update readme
* fix link
* update
* remove turbomind in api
* update
* fix comment and remove api
- 14 Aug, 2023 1 commit
Lyu Han authored
- 11 Aug, 2023 1 commit
pppppM authored
* support kv cache offload
* add dataloader docstring
* complete gitignore
* refactor collect mod fn
* add calibration
* fix lint
* add observers and quantizers
* fix lints
* add global available mixin
* fix lints
* split batch inference
* support smoothquant and awq
* update export kv scales
* fix lints
* fix some bugs
* update weight only usage
* update usage
* auto mapping and support smooth internlm
* trust remote code
* fix num head key error
* fix bias error
* align shape and pack order with llm-awq
* modified according to LZHgrla's comments.
* update gitignore
* fix kv qparams export error
* update usage
* decouple calibrate and awq
* update docstrings
* update api name
* update readme
* update readme
* update readme
* update readme
* update kv_qparams and readme
* fix typos
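Exported KV scales for int8 cache quantization conventionally follow the symmetric scheme: `scale = absmax / 127`, so that `round(x / scale)` stays within int8 range. A sketch of that convention, assuming per-head absmax statistics gathered during calibration; shapes and names are illustrative:

```python
import torch


def kv_qparams(k_absmax: torch.Tensor, v_absmax: torch.Tensor):
    """Symmetric int8 scales from per-head calibration maxima."""
    k_scale = (k_absmax / 127.0).clamp(min=1e-8)
    v_scale = (v_absmax / 127.0).clamp(min=1e-8)
    return k_scale, v_scale


def quant_int8(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
```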
- 27 Jul, 2023 1 commit
Chen Xin authored
* update builder
* remove root permission
* update readme
* update setup.py
* add install cuda 12.1 script
* use generate.sh
* add nccl to install_requires
* update README.md
* fix lint
* update setup.py

---------
Co-authored-by: chenxin <chenxin@pjlab.org.cn>
- 06 Jul, 2023 3 commits
WRH authored
* draft torch client
* deal with space of tokenizer
* support tensor parallel
* fix
* fix
* move folder
* move instruction to readme
* move to torch/
* rename client to chat
* very bad response
* stash
* rename streamer
* support internlm
* change default args
* remove test
* improve instructions
* remove module docstring
* decrease header level of torch model
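The chat loop this client implements follows the standard streaming pattern: run `generate` on a worker thread and print tokens as they arrive. A sketch of that pattern using transformers' `TextIteratorStreamer` instead of the commit's own streamer; the model id is only an example:

```python
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

name = "internlm/internlm-chat-7b"  # example model; any causal LM works
tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto",
                                             trust_remote_code=True)

streamer = TextIteratorStreamer(tok, skip_prompt=True, skip_special_tokens=True)
inputs = tok("Hello!", return_tensors="pt").to(model.device)
Thread(target=model.generate,
       kwargs=dict(**inputs, streamer=streamer, max_new_tokens=128)).start()
for piece in streamer:  # text fragments arrive as generation proceeds
    print(piece, end="", flush=True)
```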
tpoisonooo authored
* docs(README): fix script
tpoisonooo authored
- 05 Jul, 2023 2 commits
pppppM authored
* add cal qparams
* support offload inference
* add collect functions (mod, weight)
* stats kv scales
* update init
* add user guide
* fix hints
* fix comments & support turbomind format
* update user guide
* fix slice kv cache error & support pileval dataset (used in llm-awq)
* fix wrong num heads slice
* update default dataset
* fix conflict
* fix hints
* fix hints
* add gitignore
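The stats-collection step behind "stats kv scales" can be done with forward hooks that track a running absmax per module over a calibration set. A minimal sketch under that assumption; which modules to target is an illustrative choice:

```python
import torch
from torch import nn


def collect_absmax(model: nn.Module, targets=(nn.Linear,)):
    """Track running max(|output|) per target module via forward hooks."""
    stats, handles = {}, []

    def make_hook(name):
        def hook(_module, _inputs, output):
            amax = output.detach().abs().amax()
            stats[name] = torch.maximum(stats.get(name, amax), amax)
        return hook

    for name, module in model.named_modules():
        if isinstance(module, targets):
            handles.append(module.register_forward_hook(make_hook(name)))
    # run calibration batches, then call h.remove() on each handle
    return stats, handles
```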
q.yao authored
* wip
* wip
* example finish
* fix include and namespace
* wtf
* install lib
* batchize
* update cmake install
* multithread
* fix comment
* fix
* add mmengine
* bind llamamodel

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>
- 04 Jul, 2023 1 commit
lvhan028 authored
* check-in script for tokenizing a file
* use max_input_len
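A script of this shape reads a file, encodes it, and caps the result at `max_input_len` tokens. A sketch under that reading; the tokenizer id and flag names are examples, not the script's actual interface:

```python
import argparse

from transformers import AutoTokenizer

parser = argparse.ArgumentParser()
parser.add_argument("file")
parser.add_argument("--max-input-len", type=int, default=2048)
args = parser.parse_args()

tok = AutoTokenizer.from_pretrained("huggyllama/llama-7b")  # example tokenizer
with open(args.file, encoding="utf-8") as f:
    ids = tok.encode(f.read())[: args.max_input_len]  # truncate to token budget
print(f"{len(ids)} tokens", ids[:16])
```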
- 20 Jun, 2023 2 commits
Li Zhang authored
* add ft code
* gitignore
* fix lint
* revert fmha
lvhan028 authored
* add scripts for deploying llama family models via fastertransformer
* fix
* fix
* set symlinks True when copying triton models templates
* pack model repository for triton inference server
* add exception
* fix
* update config.pbtxt and launching scripts
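Packing the model repository boils down to copying the triton model templates into a servable layout, and, as the symlink fix above notes, symlinks must be preserved rather than dereferenced. A sketch with `shutil.copytree`; paths are illustrative:

```python
import shutil

shutil.copytree(
    "triton_models",      # template directory shipped with the project
    "model_repository",   # directory served by triton inference server
    symlinks=True,        # keep symlinks instead of duplicating weight files
    dirs_exist_ok=True,   # allow re-packing into an existing repository
)
```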
- 18 Jun, 2023 1 commit
lvhan028 authored