- 28 Nov, 2023 1 commit
-
-
q.yao authored
-
- 27 Nov, 2023 2 commits
- 24 Nov, 2023 1 commit
-
-
Lyu Han authored
-
- 23 Nov, 2023 3 commits
- 22 Nov, 2023 1 commit
-
-
Chen Xin authored
* turbomind support export model params * fix overflow * support turbomind.from_pretrained * fix tp * support AutoModel * support load kv qparams * update auto_awq * update docstring * export lmdeploy version * update doc * remove download_hf_repo * LmdeployForCausalLM -> LmdeployForCausalLM * refactor turbomind.py * update comment * add bfloat16 convert back * support gradio run_local load hf * support restful api server load hf * add docs * support loading previous quantized model * adapt pr 690 * update docs * not export turbomind config when quantize a model * check model_name when can not get it from config.json * update readme * remove model_name in auto_awq * update * update * update * fix build * absolute import
-
- 21 Nov, 2023 1 commit
-
-
Zaida Zhou authored
-
- 20 Nov, 2023 3 commits
-
-
Lyu Han authored
* update * update config guide * update guide * update user guide according to review comments
-
Li Zhang authored
* tmp * update * update * optimize for throughput * update * fix eos * clean up * fix serving * fix indexed copy * minor * minor --------- Co-authored-by: lvhan028 <lvhan_028@163.com>
-
Lyu Han authored
* Fix wrong eos_id and bos_id obtained through grpc api * fix according to review comments * update
-
- 19 Nov, 2023 2 commits
- 16 Nov, 2023 1 commit
-
-
whcao authored
* fix load_checkpoint_in_model bug * fix comments * fix comments * fix bugs
-
- 15 Nov, 2023 1 commit
-
-
q.yao authored
* fix * instance for each forward
-
- 14 Nov, 2023 1 commit
-
-
Li Zhang authored
* fix init of finished buf * fix `finished_count`
-
- 13 Nov, 2023 2 commits
- 10 Nov, 2023 2 commits
-
-
Li Zhang authored
* refresh decoder attention kernel * block-level kv cache * `BlockManager` & `SequenceManager` * update * update * update * update * rename * GQA support * fix context length * GQA dispatch * kv8 * tune * async stream cb * nvtx * config parsing * debug * optimize output cost * split-k decoding * minor * truncate `session_len` by available blocks * minor * license * fix * dispatch `cp.async` * fix linking * fix * fix deadlock * guard input length * correct start offset * fix prefill chunking * fix `cache_block_seq_len` param passing * fix `block_size` fmtstr * fix output tokens * fix batch resizing * fix masking of finished sequences * add debug util * free unused block early * add ntk scaling and logn scaling * cmake flags * fix typo * w4a16 for sm75 * fix msvc build * fix msvc build * fix block verification * fix msvc build * use `std::shuffle` * fix lint * fix lint * fix lint * clear incoming buffer * clear finished requests * fix batch initialization * fix typo * fix typo * fix comparison
-
RunningLeon authored
* update reqs * update docs * resolve comments * upgrade pydantic * fix rebase * update doc * update * update * update readme * update * add flash-attn
-
- 09 Nov, 2023 3 commits
- 08 Nov, 2023 3 commits
-
-
RunningLeon authored
* add check env * update issue template * remove some reqs from check env * resolve comment
-
Chen Xin authored
-
AllentDan authored
* fix benchmark serving computation mistake * fix timestamps computations * remove speed up * no mp * mp seems faster? * remove * update * remove * fix * update * update print log * typo * print first token latency only when stream==True * remove renew_session * update AsyncEngine
-
- 06 Nov, 2023 2 commits
-
-
aisensiy authored
* Use session id from gradio state * use a new session id after reset * rename session id like a state * update comments * reformat files * init session id on block loaded * use auto increased session id * remove session id textbox * apply to api_server and tritonserver * update docstring * add lock for safety --------- Co-authored-by: AllentDan <AllentDan@yeah.net>
-
yunzhongyan0 authored
* FIX: fix stop_session func bug * keep sequence_end = False --------- Co-authored-by: honglei.yan <honglei.yan@nio.com> Co-authored-by: AllentDan <AllentDan@yeah.net>
-
- 03 Nov, 2023 6 commits
-
-
pppppM authored
* fix awq * adapt new qwen code * adapt qwen 14b and baichuan2 7b * add docstring * add runtime error for qwen
-
AllentDan authored
-
liukuikun authored
-
Chen Xin authored
* split deploy.py * fix get_cuda_tensor * deploy qwen_awq * fix lint * add docstring * fix * support baichuan/baichuan-awq * parameterizing size_per_head * remove try/except * limit input model_format * add quant_path param * remove old deploy.py * fix path * fix transformer layer range when load bins * fix qwen init * split & save log * relative import * update get_config * WeightFileMgr -> Reader * rename * update * fix init_layer_id * rename llama.py -> meta_llama.py, hf.py -> llama.py * reduce code * update arg description * fix meta llama * manually cleanup meta model params
-
RunningLeon authored
* update * resolve comment
-
Yam(长琴) authored
-
- 01 Nov, 2023 1 commit
-
-
AllentDan authored
* make IPv6 compatible, safe run for coroutine interrupting * instance_id -> session_id and fix api_client.py * update doc * remove useless faq * safe ip mapping * update app.py * WIP completion * completion * update doc * disable interactive mode for /v1/chat/completions * docstring * docstring * refactor gradio * update gradio * update * update doc * rename * session_id default -1 * missed two files * add an APIClient * add chat func for APIClient * refine * add concurrent function * sequence_start, sequence_end --> interactive_mode * update doc * comments * doc * better text completion * remove /v1/embeddings * comments * deprecate generate and use /v1/interactive/completions * /v1/interactive/completion -> /v1/chat/interactive * embeddings * rename * remove wrong arg description * docstring * fix * update cli * update doc * strict session_len limit condition * pass model args to api_server
-
- 30 Oct, 2023 1 commit
-
-
Lyu Han authored
-
- 25 Oct, 2023 3 commits
-
-
AllentDan authored
* support inference a batch of prompts * docstring and assert
-
RunningLeon authored
* add * import fire in main * wrap to speed up fire cli * update * update docs * update docs * fix * resolve comments * resolve conflict and add test for cli
-
Lyu Han authored
* add build from docker section * update * install python package * update * update * update
-