- 26 Sep, 2023 1 commit
AllentDan authored
* expose stop words
* support string
* fix
* remove eoa from chatbot
* remove eoa of turbomind
* fix ut
* suffix wheel and fix InternLM no system bug
- 11 Sep, 2023 1 commit
Lyu Han authored
* tmp
* add demo for codellama inference
* update
* update
* update
* update codellama.md
* export rope_theta
* update
* update doc
* fix client.py
* define SamplingParam
* rollback 'end'
* rotary_emb_base to rotary_embedding_base
* change to baichuan2-7b
- 04 Sep, 2023 1 commit
Lyu Han authored
* read data after start processes
* fix hang
* fix exceptions when request_output_len is 0
- 01 Sep, 2023 1 commit
AllentDan authored
* add incremental decoding for turbomind
* update TIS
* fix triton post processing
* update doc
* fix typo
* SentencePieceTokenizer incremental decode, add qwen message prompt
* docstring
* update bot
- 21 Aug, 2023 1 commit
AllentDan authored
* pass args like meta_prompt to model
* update chatbot
* update
* rollback
* update llama2 and qwen
* refine
- 14 Aug, 2023 1 commit
Lyu Han authored
* rollback
* rollback chatbot.py
- 07 Aug, 2023 2 commits
- 03 Aug, 2023 1 commit
lvhan028 authored
- 31 Jul, 2023 1 commit
q.yao authored
* works on internlm and vicuna
* support GQA
* remove comment
* update readme, add logger, default tp=1
* remove log
- 27 Jul, 2023 1 commit
MaxMatthew authored
- 23 Jul, 2023 1 commit
lvhan028 authored
* refactor model.py and support baichuan-7b
* remove model_name
* remove hard session_len
* export tokenizer.py to target dir
* remove model_name from client
* remove model_name
* update
* correct throughput equation
* fix session.response
* update serving.md
* update readme
* update according to review comments
* update
* update
* update
* update
- 21 Jul, 2023 1 commit
MaxMatthew authored
* Fix lmdeploy.serve.turbomind bug
* add __init__.py for turbomind
* add resume function
* fix the assignment for session.response
* Fix code style
- 19 Jul, 2023 1 commit
lvhan028 authored
- 14 Jul, 2023 1 commit
lvhan028 authored
* add puyu model for internal use
* get/set session
* update
* add docstring
- 12 Jul, 2023 1 commit
lvhan028 authored
* add docstring
* update
* update
* fix according to review results
- 05 Jul, 2023 1 commit
lvhan028 authored
* update internlm model
* update
* update
* update
* update
* update temperature, topk and top_p
* update
* update
* loosen log level
- 03 Jul, 2023 1 commit
lvhan028 authored
- 30 Jun, 2023 2 commits
- 29 Jun, 2023 2 commits
- 25 Jun, 2023 1 commit
lvhan028 authored
* remove constraints on model name
* remove duplicate model converter
* add profile
* get eos and bos from server
* update stop_words
* update sequence_length when the last generated token is eos_id
* fix
* fix
* check-in models
* validate model_name
* make stop_words as property
* debug profiling
* better stats
* fix assistant response
* update profile serving
* update
* update
- 20 Jun, 2023 1 commit
lvhan028 authored
* add scripts for deploying llama family models via fastertransformer
* fix
* fix
* set symlinks True when copying triton models templates
* pack model repository for triton inference server
* add exception
* fix
* update config.pbtxt and launching scripts
- 18 Jun, 2023 1 commit
lvhan028 authored