Commits · 5d87c20fad2b816e885aeb3d4e3e2f2b368bf909 · OpenDAS / Lmdeploy

26 Sep, 2023 1 commit
- Fix memory leak (#488) · 5d87c20f
  Lyu Han authored Sep 26, 2023
```
* Fix memory leak

* modern c++
```
18 Sep, 2023 1 commit

[Fix] Support actual seqlen in flash-attention2 (#418) · abe9f7bd

q.yao authored Sep 18, 2023

* support actual seqlen

* fix lint

* update variable types

* lint

* update type

* fix lint

24 Aug, 2023 1 commit

Pad tok_embedding and output weights to make their shape divisible by TP (#285) · 4903d3cc

Lyu Han authored Aug 24, 2023

* Pad tok_embedding and output weights to make their shape divisible by TP

* update

* update

* update

* update

* update llamaBatch

14 Aug, 2023 1 commit

[Feature] Blazing fast W4A16 inference (#202) · c3290cad

Li Zhang authored Aug 14, 2023

* add w4a16

* fix `deploy.py`

* add doc

* add w4a16 kernels

* fuse w1/w3 & bugfixes

* fix typo

* python

* guard sm75/80 features

* add missing header

* refactor

* qkvo bias

* update cost model

* fix lint

* update `deploy.py`

21 Jul, 2023 1 commit

[Feature] Support Llama-2 with GQA (#147) · f07b697b

Li Zhang authored Jul 21, 2023

* add GQA for llama2

* fix model conversion

* fix lint & remove dev log

* update news

* minor

* fix allocation size

* fix split_dim for w_qkv.bias

01 Jul, 2023 3 commits
- build turbomind (#35) · 35d64462
  lvhan028 authored Jul 01, 2023
```
* build turbomind

* change namespace fastertransformer to turbomind

* change logger name
```
- rename src/fastertransformer to src/turbomind (#33) · 53d2e42c
  lvhan028 authored Jul 01, 2023
  
- Add lint action (#32) · fe46dac2
  AllentDan authored Jul 01, 2023
```
* temp

* fix lint

* csrc->src

* remove clang-format

* skip .rst

* skip doc

* clang-format

version

version

* mat_B
```
24 Jun, 2023 1 commit
- Support attention bias (#14) · 2700abb3
  Li Zhang authored Jun 24, 2023
```
* support attention bias

* fix conflict
```
20 Jun, 2023 1 commit
- check-in fastertransformer (#7) · 9efcac38
  Li Zhang authored Jun 20, 2023
```
* add ft code

* gitignore

* fix lint

* revert fmha
```