- 12 Dec, 2023 1 commit
-
-
Lyu Han authored
* simplify the header of the benchmark table * miss comma * fix lint
-
- 06 Dec, 2023 1 commit
-
-
Lyu Han authored
* update test scripts for models with different sizes * update * only test after tuning gemm * chmod +x * fix typo * benchmark on a100 * fix typo * fix typo * per-token latency percentile in profile_throughput * fix * fix * rename * make the script accept parameters * minor fix * indent * reformat table * change to 3000 * minor fix
-
- 05 Dec, 2023 1 commit
-
-
Chen Xin authored
* add cuda12-whl-release ci * enable environment * test py310-311 windows wheel * fix py310, py311 setup.py error on windows * fix lint
-
- 04 Dec, 2023 1 commit
-
-
Lyu Han authored
* minor fix in the profile scripts and docs * miss arguments * typo * fix lint * update
-
- 29 Nov, 2023 3 commits
-
-
Lyu Han authored
* user guide of benchmark generation * update benchmark generation guide * update profiling throughput guide * update profiling api_server guide * rename file names * update profile tis user guide * update * fix according to review comments * update * update according to review comments * update * add an example * update
-
Lyu Han authored
* update profile scripts * add top_p, top_k and temperature as input arguments * fix input_ids * update profile_throughput * update profile_restful_api * update profile_serving * update * update * add progress bar * remove TODO comments * update * remove useless profile_* argument * remove log level * change concurrency default value to 64 * update restful_api.md * update according to review comments * fix docstring
-
tpoisonooo authored
* feat(build): enable ninja and lld * fix(.github): add ninja installation * fix(CI): remove dimsize=256 * fix(CI): add option for generate.sh * fix(docs): update
-
- 27 Nov, 2023 1 commit
-
-
Lyu Han authored
-
- 22 Nov, 2023 1 commit
-
-
Chen Xin authored
* turbomind support export model params * fix overflow * support turbomind.from_pretrained * fix tp * support AutoModel * support load kv qparams * update auto_awq * update docstring * export lmdeploy version * update doc * remove download_hf_repo * LmdeployForCausalLM -> LmdeployForCausalLM * refactor turbomind.py * update comment * add bfloat16 convert back * support gradio run_local load hf * support restful api server load hf * add docs * support loading previous quantized model * adapt pr 690 * update docs * not export turbomind config when quantize a model * check model_name when can not get it from config.json * update readme * remove model_name in auto_awq * update * update * update * fix build * absolute import
-
- 20 Nov, 2023 1 commit
-
-
Lyu Han authored
* update * update config guide * update guide * update user guide according to review comments
-
- 19 Nov, 2023 1 commit
-
-
AllentDan authored
* update restful_api.md * add a hint * repeat 3 times
-
- 13 Nov, 2023 1 commit
-
-
pppppM authored
-
- 10 Nov, 2023 1 commit
-
-
RunningLeon authored
* update reqs * update docs * resolve comments * upgrade pydantic * fix rebase * update doc * update * update * update readme * update * add flash-attn
-
- 01 Nov, 2023 1 commit
-
-
AllentDan authored
* make IPv6 compatible, safe run for coroutine interrupting * instance_id -> session_id and fix api_client.py * update doc * remove useless faq * safe ip mapping * update app.py * WIP completion * completion * update doc * disable interactive mode for /v1/chat/completions * docstring * docstring * refactor gradio * update gradio * update * update doc * rename * session_id default -1 * missed two files * add an APIClient * add chat func for APIClient * refine * add concurrent function * sequence_start, sequence_end --> interactive_mode * update doc * comments * doc * better text completion * remove /v1/embeddings * comments * deprecate generate and use /v1/interactive/completions * /v1/interactive/completion -> /v1/chat/interactive * embeddings * rename * remove wrong arg description * docstring * fix * update cli * update doc * strict session_len limit condition * pass model args to api_server
-
- 25 Oct, 2023 2 commits
-
-
RunningLeon authored
* add * import fire in main * wrap to speed up fire cli * update * update docs * update docs * fix * resolve comments * resolve conflict and add test for cli
-
Lyu Han authored
* add build from docker section * update * install python package * update * update * update
-
- 23 Oct, 2023 1 commit
-
- 13 Oct, 2023 1 commit
-
-
del-zhenwu authored
* [doc] Update benchmark command in w4a16.md * Update w4a16.md * Update w4a16.md add pip install nvidia-ml-py * [doc] Update w4a16.md * fix lint error Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com> * [doc] update model_path & prompt_tokens Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com> --------- Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>
-
- 12 Oct, 2023 1 commit
-
-
AllentDan authored
-
- 11 Oct, 2023 2 commits
-
-
Shahrukh Khan authored
-
AllentDan authored
* make IPv6 compatible, safe run for coroutine interrupting * instance_id -> session_id and fix api_client.py * update doc * remove useless faq * safe ip mapping * update app.py * remove print * update doc
-
- 14 Sep, 2023 1 commit
-
-
nlp-pang authored
* fix the build step * Fix the build step
-
- 11 Sep, 2023 1 commit
-
-
Lyu Han authored
* tmp * add demo for codellama inference * update * update * update * update codellama.md * export rope_theta * update * update doc * fix client.py * define SamplingParam * rollback 'end' * rotary_emb_base to rotary_embedding_base * change to baichuan2-7b
-
- 05 Sep, 2023 1 commit
-
-
pppppM authored
* use conda install nccl, openmpi and rapidjson * update en doc
-
- 01 Sep, 2023 1 commit
-
-
AllentDan authored
* add incremental decoding for turbomind * update TIS * fix triton post processing * update doc * fix typo * SentencePieceTokenizer incremental decode, add qwen message prompt * docstring * update bot
-
- 30 Aug, 2023 1 commit
-
-
AllentDan authored
* update FAQ for restful api * refine
-
- 29 Aug, 2023 1 commit
-
-
tpoisonooo authored
* fix(kvint8): update doc * style(lmdeploy): format * style(kv_qparams.py): linting * fix lint * Update kv_int8.md * Update kv_int8.md --------- Co-authored-by: AllentDan <AllentDan@yeah.net>
-
- 24 Aug, 2023 2 commits
-
-
AllentDan authored
* app use async engine * add stop logic * app update cancel * app support restful-api * update doc and use the right model name * set doc url root * add comments * add an example * renew_session * update readme.md * resolve comments * Update restful_api.md * Update restful_api.md * Update restful_api.md --------- Co-authored-by: tpoisonooo <khj.application@aliyun.com>
-
pppppM authored
* fix llama2 70b * fix qwen quantization * remove pdb * add faq
-
- 22 Aug, 2023 1 commit
-
-
AllentDan authored
* add restful api * refine * add simple doc * lint * add uvicorn requirement * more args * add llama2 * docstring * update doc * save * refine * lint * better decode * add v1/embedding * add GenerateRequest * add llama2 chat template * correct profiling * update documents * add length judge * add faq * update doc and rename req_que to req_queue * fix md link, use get_logger, fix sequence_end bug * use another doc link for go to avoid lint error * add api_client.py * update doc * update doc * update function interface * update FAQ * resolve comments
-
- 21 Aug, 2023 3 commits
-
-
tpoisonooo authored
-
RunningLeon authored
* add readthedocs configs * update readme * fix link * update * remove turbomind in api * update * fix comment and remove api
-
Lyu Han authored
* Check-in FAQ * update * update
-
- 17 Aug, 2023 1 commit
-
-
tpoisonooo authored
* Update quantization.md * docs(quantization): update description * docs(README): rename quantization files
-
- 15 Aug, 2023 1 commit
-
-
Lyu Han authored
-
- 14 Aug, 2023 3 commits
-
-
Lyu Han authored
* tmp * update * update * update * update * update * remove * update * update
-
tpoisonooo authored
* feat(quantization): kv cache use asymmetric
-
Li Zhang authored
* add w4a16 * fix `deploy.py` * add doc * add w4a16 kernels * fuse w1/w3 & bugfixes * fix typo * python * guard sm75/80 features * add missing header * refactor * qkvo bias * update cost model * fix lint * update `deploy.py`
-
- 07 Aug, 2023 1 commit
-
-
WRH authored
* add some dist utils * add model utils * add termio and basicstreamer * typo * fix world size * refactor chat and tested llama1 * add internlm adapter and support stopping criteria * concat with id for internlm * update docstring * update and support llama2 * typo * move docs to docs * update docstring of session manager * update docstring * update docs * fix accel none in model * fix and add test for tensor broadcast * fix session using typing to check type * add docstrings and comprehensive condition test * unit test for dist * fix session * split unittests of utils * typo * update control flow of accel * move test model * remove main in unittest * remove some log * remove some comments
-
- 04 Aug, 2023 1 commit
-
-
AllentDan authored
* use local model for webui * local model for app.py * lint * remove print * add seed * comments * fixed session_id * support turbomind batch inference * update app.py * lint and docstring * move webui to serve/gradio * update doc * update doc * update docstring and remove print conversation * log * Update docs/zh_cn/build.md Co-authored-by: Chen Xin <xinchen.tju@gmail.com> * Update docs/en/build.md Co-authored-by: Chen Xin <xinchen.tju@gmail.com> * use latest gradio * fix * replace partial with InterFace * use host ip instead of coolie --------- Co-authored-by: Chen Xin <xinchen.tju@gmail.com>
-