- 03 Jul, 2023 3 commits
-
-
vansin authored
* [Doc] add persistent batch inference * update * update * Update README.md * Update README_zh-CN.md
-
lvhan028 authored
-
tpoisonooo authored
* feat(template): remote diff * feat(cmake): use c++17
-
- 01 Jul, 2023 5 commits
-
-
lvhan028 authored
-
lvhan028 authored
* change target tritonfastertransformerbackend to tritonturbomindbackend * install targets to backends/turbomind * change model_dir
-
lvhan028 authored
* build turbomind * change namespace fastertransformer to turbomind * change logger name
-
lvhan028 authored
-
AllentDan authored
* temp * fix lint * csrc->src * remove clang-format * skip .rst * skip doc * clang-format version * mat_B
-
- 30 Jun, 2023 3 commits
- 29 Jun, 2023 4 commits
- 28 Jun, 2023 2 commits
-
-
tpoisonooo authored
* feat(src): add int8 and compile passed * feat(kernels): fix * feat(llama): update kernel * feat(src): add debug * fix(kernel): k_cache use int8_t pointer * style(llama): clean code * feat(deploy.py): revert to enable fmha * style(LlamaV2): clean code * feat(deploy.py): add default quant policy
-
Li Zhang authored
-
- 26 Jun, 2023 1 commit
-
-
Li Zhang authored
-
- 25 Jun, 2023 4 commits
-
-
tpoisonooo authored
-
tpoisonooo authored
-
tpoisonooo authored
-
lvhan028 authored
* remove constraints on model name * remove duplicate model converter * add profile * get eos and bos from server * update stop_words * update sequence_length when the last generated token is eos_id * fix * fix * check-in models * validate model_name * make stop_words as property * debug profiling * better stats * fix assistant response * update profile serving * update * update
-
- 24 Jun, 2023 1 commit
-
-
Li Zhang authored
* support attention bias * fix conflict
-
- 22 Jun, 2023 2 commits
-
-
lvhan028 authored
* remove constraints on model name * remove duplicate model converter
-
q.yao authored
* update arch * clang-format * remove comment --------- Co-authored-by: yaoqian <yaoqian@localhost.localdomain>
-
- 21 Jun, 2023 3 commits
- 20 Jun, 2023 4 commits
-
-
lvhan028 authored
* check-in dockerfile * check-in dockerfile
-
Li Zhang authored
* add ft code * gitignore * fix lint * revert fmha
-
lvhan028 authored
* add logo * update readme
-
lvhan028 authored
* add scripts for deploying llama family models via fastertransformer * fix * fix * set symlinks True when copying triton models templates * pack model repository for triton inference server * add exception * fix * update config.pbtxt and launching scripts
-
- 18 Jun, 2023 4 commits