- 16 Oct, 2023 1 commit
q.yao authored
* move tokenizer
* remove Tokenizer in init
* update deploy.py
- 26 Sep, 2023 1 commit
AllentDan authored
* fix benchmark serving being unable to use the Qwen tokenizer
* update benchmark README
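A frequent reason a benchmarking script cannot load the Qwen tokenizer is that Qwen ships its tokenizer as custom code inside the model repository, which Transformers only executes when `trust_remote_code=True` is passed. Whether that is what this commit changed is an assumption; the sketch below only illustrates the general workaround, and the model id is illustrative.

```python
from transformers import AutoTokenizer

# Qwen's tokenizer lives in the model repo itself, so Transformers must be
# allowed to run that remote code when loading it. The model id below is
# illustrative, not taken from this commit.
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    trust_remote_code=True,
)

token_ids = tokenizer.encode("Hello, how are you?")
print(len(token_ids), "tokens")
```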
- 04 Sep, 2023 1 commit
Lyu Han authored
* read data after starting processes
* fix hang
* fix exceptions when request_output_len is 0
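Handling a `request_output_len` of 0 usually comes down to treating it as "generate nothing" and returning before the decode loop runs. The sketch below shows that kind of guard under assumed names; it is not the repo's actual generation code.

```python
import random

def sample_next_token(context_ids):
    # Stand-in for real model sampling: returns a random token id.
    return random.randrange(32000)

def generate(prompt_ids, request_output_len):
    # Assumed guard: a request for zero output tokens returns an empty
    # result instead of raising inside the decode loop.
    if request_output_len <= 0:
        return []
    output_ids = []
    for _ in range(request_output_len):
        output_ids.append(sample_next_token(prompt_ids + output_ids))
    return output_ids

print(generate([1, 2, 3], 0))        # [] rather than an exception
print(len(generate([1, 2, 3], 4)))   # 4
```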
- 22 Aug, 2023 1 commit
AllentDan authored
* add restful api
* refine
* add simple doc
* lint
* add uvicorn requirement
* more args
* add llama2
* docstring
* update doc
* save
* refine
* lint
* better decode
* add v1/embedding
* add GenerateRequest
* add llama2 chat template
* correct profiling
* update documents
* add length judge
* add faq
* update doc and rename req_que to req_queue
* fix md link, use get_logger, fix sequence_end bug
* use another doc link for go to avoid lint error
* add api_client.py (a client sketch follows this list)
* update doc
* update doc
* update function interface
* update FAQ
* resolve comments
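The RESTful server added here is plain HTTP plus JSON, which is what `api_client.py` wraps. The sketch below drives such a server with `requests`; the address, route, payload fields, and streaming format are assumptions for illustration, not the exact schema introduced in this commit.

```python
import json

import requests

# Illustrative address and route; the real ones come from the api_server
# and api_client added in this commit.
API_URL = "http://0.0.0.0:23333/generate"

payload = {
    "prompt": "Hello, how are you?",  # assumed field name
    "request_output_len": 128,        # assumed field name
    "sequence_start": True,           # assumed field name
    "sequence_end": True,             # assumed field name
}

response = requests.post(API_URL, json=payload, stream=True)
response.raise_for_status()

# Assume the server streams newline-delimited JSON chunks and print the
# text of each chunk as it arrives.
for line in response.iter_lines(decode_unicode=True):
    if line:
        chunk = json.loads(line)
        print(chunk.get("text", ""), end="", flush=True)
```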
- 07 Aug, 2023 1 commit
lvhan028 authored
* change to incremental decoding
* update
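Incremental decoding means emitting only the text produced since the last step instead of re-printing the whole response every iteration. The sketch below shows one common way to do that with a Hugging Face tokenizer by diffing against the text already emitted; it is a generic illustration, not this repository's implementation.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative tokenizer

def stream_decode(token_ids):
    """Yield only the newly produced text after each generated token."""
    generated, emitted = [], ""
    for tid in token_ids:
        generated.append(tid)
        # Decode the ids generated so far and emit just the new suffix, so
        # tokens that only complete a word are not printed twice.
        text = tokenizer.decode(generated, skip_special_tokens=True)
        new_text = text[len(emitted):]
        if new_text:
            emitted = text
            yield new_text

ids = tokenizer.encode("Incremental decoding emits text piece by piece.")
for piece in stream_decode(ids):
    print(piece, end="", flush=True)
print()
```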
- 31 Jul, 2023 1 commit
del-zhenwu authored
- 23 Jul, 2023 1 commit
lvhan028 authored
* refactor model.py and support baichuan-7b
* remove model_name
* remove hard-coded session_len
* export tokenizer.py to target dir
* remove model_name from client
* remove model_name
* update
* correct throughput equation (see the sketch after this list)
* fix session.response
* update serving.md
* update README
* update according to review comments
* update
* update
* update
* update
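A serving benchmark's throughput figure is, at its core, tokens divided by wall-clock time; what a "correct throughput equation" change typically pins down is which tokens are counted and over which interval. The sketch below computes the generated-tokens-per-second form under assumed names; it is not the repo's profiling code.

```python
import time

def measure_throughput(run_batch, prompts):
    """Return tokens/s for one batch, counting generated tokens only.

    `run_batch` is a hypothetical callable that processes the prompts and
    returns the number of tokens generated for each one.
    """
    start = time.perf_counter()
    generated_counts = run_batch(prompts)
    elapsed = time.perf_counter() - start
    return sum(generated_counts) / elapsed

# Usage with a stand-in workload that pretends to generate 128 tokens each.
fake_run = lambda prompts: [128 for _ in prompts]
print(f"{measure_throughput(fake_run, ['a', 'b', 'c']):.1f} tokens/s")
```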
- 19 Jul, 2023 1 commit
rollroll90 authored
- 30 Jun, 2023 2 commits
- 25 Jun, 2023 1 commit
lvhan028 authored
* remove constraints on model name
* remove duplicate model converter
* add profile
* get eos and bos from server
* update stop_words
* update sequence_length when the last generated token is eos_id (see the sketch after this list)
* fix
* fix
* check-in models
* validate model_name
* make stop_words a property
* debug profiling
* better stats
* fix assistant response
* update profile serving
* update
* update
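Updating sequence_length when the last generated token is eos_id amounts to not counting the end-of-sequence token as visible output. The sketch below shows that kind of post-processing with assumed names and behaviour; it is not the repository's actual code.

```python
def finalize_output(output_ids, eos_id, gen_len):
    """Truncate at eos and report the effective generated length.

    Assumed behaviour: if the last generated token is eos_id, drop it from
    the visible output and reduce the reported sequence length by one.
    """
    if output_ids and output_ids[-1] == eos_id:
        return output_ids[:-1], gen_len - 1
    return output_ids, gen_len

ids, length = finalize_output([5, 17, 42, 2], eos_id=2, gen_len=4)
print(ids, length)  # [5, 17, 42] 3
```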