- 28 May, 2024 1 commit
gaoqiong authored
- 22 Mar, 2024 1 commit
zhouxiang authored
- 22 Nov, 2023 1 commit
Chen Xin authored
* turbomind support export model params
* fix overflow
* support turbomind.from_pretrained
* fix tp
* support AutoModel
* support load kv qparams
* update auto_awq
* update docstring
* export lmdeploy version
* update doc
* remove download_hf_repo
* LmdeployForCausalLM -> LmdeployForCausalLM
* refactor turbomind.py
* update comment
* add bfloat16 convert back
* support gradio run_local load hf
* support restful api server load hf
* add docs
* support loading previous quantized model
* adapt pr 690
* update docs
* not export turbomind config when quantize a model
* check model_name when can not get it from config.json
* update readme
* remove model_name in auto_awq
* update
* update
* update
* fix build
* absolute import
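The `turbomind.from_pretrained` item above mirrors the familiar Hugging Face loading convention: accept either a local directory or a hub repo id. A minimal sketch of that flow, assuming `huggingface_hub` is available for snapshot resolution; `Engine` is a hypothetical stand-in, not lmdeploy's actual class:

```python
from dataclasses import dataclass
from pathlib import Path

from huggingface_hub import snapshot_download  # pip install huggingface_hub


@dataclass
class Engine:  # hypothetical stand-in for the real inference engine
    model_path: str
    tp: int = 1  # tensor-parallel degree


def from_pretrained(model_id_or_path: str, tp: int = 1) -> Engine:
    """Accept a local model directory or a Hugging Face hub repo id."""
    path = Path(model_id_or_path)
    if not path.is_dir():
        # resolve the hub id to a (cached) local snapshot
        path = Path(snapshot_download(repo_id=model_id_or_path))
    return Engine(model_path=str(path), tp=tp)
```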
- 10 Nov, 2023 1 commit
Li Zhang authored
* refresh decoder attention kernel
* block-level kv cache
* `BlockManager` & `SequenceManager`
* update
* update
* update
* update
* rename
* GQA support
* fix context length
* GQA dispatch
* kv8
* tune
* async stream cb
* nvtx
* config parsing
* debug
* optimize output cost
* split-k decoding
* minor
* truncate `session_len` by available blocks
* minor
* license
* fix
* dispatch `cp.async`
* fix linking
* fix
* fix deadlock
* guard input length
* correct start offset
* fix prefill chunking
* fix `cache_block_seq_len` param passing
* fix `block_size` fmtstr
* fix output tokens
* fix batch resizing
* fix masking of finished sequences
* add debug util
* free unused block early
* add ntk scaling and logn scaling
* cmake flags
* fix typo
* w4a16 for sm75
* fix msvc build
* fix msvc build
* fix block verification
* fix msvc build
* use `std::shuffle`
* fix lint
* fix lint
* fix lint
* clear incoming buffer
* clear finished requests
* fix batch initialization
* fix typo
* fix typo
* fix comparison
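The block-level KV cache manages memory in fixed-size blocks handed out from a shared pool, which is also what makes "truncate `session_len` by available blocks" possible. A minimal Python sketch of that bookkeeping; the real `BlockManager` lives in C++ and does much more, and every name here is illustrative:

```python
class BlockManager:
    """Free-list bookkeeping for a paged (block-level) KV cache."""

    def __init__(self, num_blocks: int, block_seq_len: int):
        self.block_seq_len = block_seq_len       # tokens stored per block
        self.free = list(range(num_blocks))      # ids of unused blocks

    def blocks_needed(self, seq_len: int) -> int:
        return -(-seq_len // self.block_seq_len)  # ceiling division

    def allocate(self, seq_len: int) -> list[int]:
        n = self.blocks_needed(seq_len)
        if n > len(self.free):
            raise RuntimeError("out of KV cache blocks")
        blocks, self.free = self.free[:n], self.free[n:]
        return blocks

    def release(self, blocks: list[int]) -> None:
        self.free.extend(blocks)                 # recycle for other sequences


# truncating session_len by what the pool can actually hold:
mgr = BlockManager(num_blocks=128, block_seq_len=64)
max_session_len = len(mgr.free) * mgr.block_seq_len  # 8192 tokens here
```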
- 01 Sep, 2023 1 commit
Chen Xin authored
* pack llama_gemm
* update CMakeLists.txt
* remove candidate
* update MANIFEST.in
- 24 Aug, 2023 1 commit
WRH authored
* support decode
* unit test and benchmark and improve
* remove some drafts
* enable numerical test
* minor
* add some benchmark data
* add more output
* update interface
* remove debugs
* format
* update docstring
* remove print and add benchmark results
* use logits & add main
* fix rb
* dump large
* update test
* update test decode
* add decimal
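A numerical test for a new decode path usually compares its logits against a reference implementation under loose half-precision tolerances. A sketch of such a check, with names and tolerances chosen for illustration rather than taken from the actual test:

```python
import torch


def assert_decode_matches(batched: torch.Tensor, reference: torch.Tensor) -> None:
    """Compare decode logits from the new path against a reference run."""
    assert batched.shape == reference.shape
    # fp16 kernels reorder accumulation, so exact equality is too strict
    assert torch.allclose(batched.float(), reference.float(), rtol=1e-2, atol=1e-2)
```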
- 21 Aug, 2023 1 commit
RunningLeon authored
* add readthedocs configs
* update readme
* fix link
* update
* remove turbomind in api
* update
* fix comment and remove api
- 14 Aug, 2023 1 commit
Lyu Han authored
- 11 Aug, 2023 1 commit
pppppM authored
* support kv cache offload
* add dataloader docstring
* complete gitignore
* refactor collect mod fn
* add calibration
* fix lint
* add observers and quantizers
* fix lints
* add global available mixin
* fix lints
* split batch inference
* support smoothquant and awq
* update export kv scales
* fix lints
* fix some bugs
* update weight only usage
* update usage
* auto mapping and support smooth internlm
* trust remote code
* fix num head key error
* fix bias error
* align shape and pack order with llm-awq
* modified according to LZHgrla's comments.
* update gitignore
* fix kv qparams export error
* update usage
* decouple calibrate and awq
* update docstrings
* update api name
* update readme
* update readme
* update readme
* update readme
* update kv_qparams and readme
* fix typos
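Exported KV scales for int8 cache quantization conventionally follow the symmetric scheme: `scale = absmax / 127`, so that `round(x / scale)` stays within int8 range. A sketch of that convention, assuming per-head absmax statistics gathered during calibration; shapes and names are illustrative:

```python
import torch


def kv_qparams(k_absmax: torch.Tensor, v_absmax: torch.Tensor):
    """Symmetric int8 scales from per-head calibration maxima."""
    k_scale = (k_absmax / 127.0).clamp(min=1e-8)
    v_scale = (v_absmax / 127.0).clamp(min=1e-8)
    return k_scale, v_scale


def quant_int8(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
```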
- 27 Jul, 2023 1 commit
Chen Xin authored
* update builder
* remove root permission
* update readme
* update setup.py
* add install cuda 12.1 script
* use generate.sh
* add nccl to install_requires
* update README.md
* fix lint
* update setup.py

---------
Co-authored-by: chenxin <chenxin@pjlab.org.cn>
- 06 Jul, 2023 3 commits
WRH authored
* draft torch client
* deal with space of tokenizer
* support tensor parallel
* fix
* fix
* move folder
* move instruction to readme
* move to torch/
* rename client to chat
* very bad response
* stash
* rename streamer
* support internlm
* change default args
* remove test
* improve instructions
* remove module docstring
* decrease header level of torch model
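The chat loop this client implements follows the standard streaming pattern: run `generate` on a worker thread and print tokens as they arrive. A sketch of that pattern using transformers' `TextIteratorStreamer` instead of the commit's own streamer; the model id is only an example:

```python
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

name = "internlm/internlm-chat-7b"  # example model; any causal LM works
tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto",
                                             trust_remote_code=True)

streamer = TextIteratorStreamer(tok, skip_prompt=True, skip_special_tokens=True)
inputs = tok("Hello!", return_tensors="pt").to(model.device)
Thread(target=model.generate,
       kwargs=dict(**inputs, streamer=streamer, max_new_tokens=128)).start()
for piece in streamer:  # text fragments arrive as generation proceeds
    print(piece, end="", flush=True)
```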
tpoisonooo authored
* docs(README): fix script
tpoisonooo authored
- 05 Jul, 2023 2 commits
pppppM authored
* add cal qparams
* support offload inference
* add collect functions (mod, weight)
* stats kv scales
* update init
* add user guide
* fix hints
* fix comments & support turbomind format
* update user guide
* fix slice kv cache error & support pileval dataset (used in llm-awq)
* fix wrong num heads slice
* update default dataset
* fix conflict
* fix hints
* fix hints
* add gitignore
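The stats-collection step behind "stats kv scales" can be done with forward hooks that track a running absmax per module over a calibration set. A minimal sketch under that assumption; which modules to target is an illustrative choice:

```python
import torch
from torch import nn


def collect_absmax(model: nn.Module, targets=(nn.Linear,)):
    """Track running max(|output|) per target module via forward hooks."""
    stats, handles = {}, []

    def make_hook(name):
        def hook(_module, _inputs, output):
            amax = output.detach().abs().amax()
            stats[name] = torch.maximum(stats.get(name, amax), amax)
        return hook

    for name, module in model.named_modules():
        if isinstance(module, targets):
            handles.append(module.register_forward_hook(make_hook(name)))
    # run calibration batches, then call h.remove() on each handle
    return stats, handles
```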
q.yao authored
* wip
* wip
* example finish
* fix include and namespace
* wtf
* install lib
* batchize
* update cmake install
* multithread
* fix comment
* fix
* add mmengine
* bind llamamodel

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>
- 04 Jul, 2023 1 commit
lvhan028 authored
* check-in script for tokenizing a file
* use max_input_len
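A script of this shape reads a file, encodes it, and caps the result at `max_input_len` tokens. A sketch under that reading; the tokenizer id and flag names are examples, not the script's actual interface:

```python
import argparse

from transformers import AutoTokenizer

parser = argparse.ArgumentParser()
parser.add_argument("file")
parser.add_argument("--max-input-len", type=int, default=2048)
args = parser.parse_args()

tok = AutoTokenizer.from_pretrained("huggyllama/llama-7b")  # example tokenizer
with open(args.file, encoding="utf-8") as f:
    ids = tok.encode(f.read())[: args.max_input_len]  # truncate to token budget
print(f"{len(ids)} tokens", ids[:16])
```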
- 20 Jun, 2023 2 commits
Li Zhang authored
* add ft code
* gitignore
* fix lint
* revert fmha
lvhan028 authored
* add scripts for deploying llama family models via fastertransformer
* fix
* fix
* set symlinks True when copying triton models templates
* pack model repository for triton inference server
* add exception
* fix
* update config.pbtxt and launching scripts
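Packing the model repository boils down to copying the triton model templates into a servable layout, and, as the symlink fix above notes, symlinks must be preserved rather than dereferenced. A sketch with `shutil.copytree`; paths are illustrative:

```python
import shutil

shutil.copytree(
    "triton_models",      # template directory shipped with the project
    "model_repository",   # directory served by triton inference server
    symlinks=True,        # keep symlinks instead of duplicating weight files
    dirs_exist_ok=True,   # allow re-packing into an existing repository
)
```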
- 18 Jun, 2023 1 commit
lvhan028 authored