Commits · 3efa5bbbfd5868695da4d5d9ad23d81f48f1e5a8 · OpenDAS / text-generation-inference

30 Dec, 2022 2 commits

fix(router): Include special tokens when tokenizing (#14) · 3efa5bbb

Nick Hill authored Dec 30, 2022

There's currently a discrepancy in the tokenization between the router
and the python server code. The latter includes special tokens but the
former does not.

This results in a token count mismatch for seq2seq models such as mt0
where the tokenizer emits an EOS token at the end.

This in turn results in some unexpected/incorrect output, in particular
when batch concatenation is involved, because the python code uses the
input length passed from the router for each row.

As far as I can tell, it is better to include this token in the encoder
`input_ids`, so I guess it's best to just adjust on the router side.

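The mismatch can be reproduced with a toy model. The following is a minimal Python illustration, not the router or server code: the whitespace tokenizer, `VOCAB`, `EOS_ID`, `PAD_ID` and the helpers `encode`, `left_pad` and `attention_mask` are all hypothetical. It assumes a left-padded row whose attention mask is rebuilt only from the input length the router reports, so an undercount by one masks out the first real token. In the Hugging Face `tokenizers` library the corresponding switch is the `add_special_tokens` argument of `encode`.

```python
from typing import List

EOS_ID = 1  # hypothetical id of the EOS token the tokenizer appends
PAD_ID = 0  # hypothetical padding id

# Hypothetical toy vocabulary standing in for a trained tokenizer such as mt0's.
VOCAB = {"translate": 2, "to": 3, "french": 4, "hello": 5}


def encode(text: str, add_special_tokens: bool) -> List[int]:
    """Whitespace tokenizer that appends EOS when special tokens are enabled."""
    ids = [VOCAB[word] for word in text.split()]
    if add_special_tokens:
        ids.append(EOS_ID)
    return ids


def left_pad(ids: List[int], width: int) -> List[int]:
    """Left-pad a row to `width` with PAD_ID."""
    return [PAD_ID] * (width - len(ids)) + ids


def attention_mask(width: int, input_length: int) -> List[int]:
    """Mask for a left-padded row, built only from the reported input length."""
    return [0] * (width - input_length) + [1] * input_length


prompt = "translate to french hello"
model_ids = encode(prompt, add_special_tokens=True)  # what the model is fed
old_router_length = len(encode(prompt, add_special_tokens=False))  # 4
new_router_length = len(encode(prompt, add_special_tokens=True))  # 5

row = left_pad(model_ids, width=8)
print(row)  # [0, 0, 0, 2, 3, 4, 5, 1]
print(attention_mask(len(row), old_router_length))  # [0, 0, 0, 0, 1, 1, 1, 1]
print(attention_mask(len(row), new_router_length))  # [0, 0, 0, 1, 1, 1, 1, 1]
```

With the old count the mask starts one position too late and hides the first real token; counting the EOS token on the router side makes the reported length agree with the `input_ids` the model actually sees.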

fix(server): Check for device type correctly when determining initial padding (#16) · 686cc667

Nick Hill authored Dec 30, 2022
```
AFAIK there is no torch device type called "gpu".
```
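PyTorch reports CUDA GPUs with the device type `"cuda"` (for example `torch.device("cuda:0").type == "cuda"`), so a comparison against `"gpu"` never matches and GPU batches are never padded. The minimal sketch below contrasts the two checks using plain device-type strings so it runs without PyTorch. The function names are hypothetical, and it assumes the multiple-of-8 padding from 042180d8 ("Only pad to multiple of 8 on GPUs") is applied by rounding the batch length up.

```python
from typing import Optional


def buggy_pad_multiple(device_type: str) -> Optional[int]:
    # "gpu" is not a torch device type, so this never returns 8.
    return 8 if device_type == "gpu" else None


def pad_multiple(device_type: str) -> Optional[int]:
    # CUDA tensors report device.type == "cuda"; pad only there.
    return 8 if device_type == "cuda" else None


def padded_length(length: int, multiple: Optional[int]) -> int:
    """Round `length` up to the next multiple, or leave it unchanged."""
    if multiple is None:
        return length
    return -(-length // multiple) * multiple  # ceiling division


for device_type in ("cuda", "cpu"):
    print(
        device_type,
        padded_length(13, buggy_pad_multiple(device_type)),
        padded_length(13, pad_multiple(device_type)),
    )
# cuda 13 16
# cpu 13 13
```

Only the corrected check pads the 13-token batch to 16 on a CUDA device; on CPU both leave the length unchanged.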

16 Dec, 2022 2 commits
- fix(server): Fix stop sequences (#11) · 611e21cb
  OlivierDehaene authored Dec 16, 2022
- feat(launcher): Add integration tests (#9) · 3e2e6240
  OlivierDehaene authored Dec 16, 2022

15 Dec, 2022 1 commit
- feat: Return logprobs (#8) · 32a25306
  OlivierDehaene authored Dec 15, 2022

12 Dec, 2022 1 commit
- feat: Support stop sequences (#7) · 718096f6
  OlivierDehaene authored Dec 12, 2022

08 Dec, 2022 2 commits
- fix(server): Only pad to multiple of 8 on GPUs · 042180d8
  OlivierDehaene authored Dec 08, 2022
- feat(server): Add model tests (#6) · a2985036
  OlivierDehaene authored Dec 08, 2022

05 Dec, 2022 1 commit

fix(batching): Avoid theoretical hang in batcher loop (#5) · 31d76e23

Nick Hill authored Dec 05, 2022

- Avoid theoretical hang in batcher loop
- Avoid a couple of clones in the router generate method
- Keep attention mask tensors as integers
- Remove num_heads attribute

Co-authored-by: OlivierDehaene <Olivier.dehaene@gmail.com>

01 Dec, 2022 1 commit
- feat(server): Support Galactica (#4) · daa1d81d
  OlivierDehaene authored Dec 01, 2022

14 Nov, 2022 4 commits
- fix(router): Handle tokenizer errors · d6d5b12e
  OlivierDehaene authored Nov 14, 2022
- fix(readme): Typo · feb7806c
  OlivierDehaene authored Nov 14, 2022
- fix(router): Fix HTTP status codes · 91f5f862
  OlivierDehaene authored Nov 14, 2022
- feat(rust): Update to 1.65 · 6c781025
  OlivierDehaene authored Nov 14, 2022

09 Nov, 2022 1 commit
- feat(server): Clarify CausalLMBatch concatenate method · dccd5c2b
  OlivierDehaene authored Nov 09, 2022

08 Nov, 2022 1 commit
- fix(server): Fix Transformers fork version · fa43fb71
  OlivierDehaene authored Nov 08, 2022

07 Nov, 2022 1 commit
- feat(server): Improved doc · 4236e41b
  OlivierDehaene authored Nov 07, 2022

04 Nov, 2022 3 commits
- feat(launcher): Pass CUDA_VISIBLE_DEVICES to the shard · cea6051e
  OlivierDehaene authored Nov 04, 2022
- feat(server): Support AutoModelForSeq2SeqLM · 427d7cc4
  OlivierDehaene authored Nov 04, 2022
- feat(server): Support generic AutoModelForCausalLM · c5665f5c
  OlivierDehaene authored Nov 04, 2022

03 Nov, 2022 1 commit
- fix(models): Revert buggy support for AutoModel · 755fc0e4
  OlivierDehaene authored Nov 03, 2022

02 Nov, 2022 1 commit
- feat: Use json formatter by default in docker image · b3b7ea0d
  OlivierDehaene authored Nov 02, 2022

28 Oct, 2022 1 commit
- feat(server): Support all AutoModelForCausalLM on a best effort basis · 3cf6368c
  OlivierDehaene authored Oct 28, 2022

27 Oct, 2022 1 commit
- feat(server): Support bitsandbytes · 09674e6d
  OlivierDehaene authored Oct 27, 2022

22 Oct, 2022 3 commits
- feat(client): Simplify sharded logic · beb55212
  OlivierDehaene authored Oct 22, 2022
- feat(server): Use safetensors · c8ce9b25
  Nicolas Patry authored Oct 22, 2022
  ```
  Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
  ```
- Create LICENSE (#2 ) · be8827fe
  Thomas Wang authored Oct 22, 2022

21 Oct, 2022 2 commits
- feat(router): Add max_waiting_tokens · c8378933
  OlivierDehaene authored Oct 21, 2022
- fix(validation): Fix error messages · 895a341d
  OlivierDehaene authored Oct 21, 2022

20 Oct, 2022 1 commit
- v0.1.0 · f16f2f5a
  Olivier Dehaene authored Oct 18, 2022

17 Oct, 2022 3 commits
- feat: Add arguments to CLI · 92c1ecd0
  Olivier Dehaene authored Oct 17, 2022
- feat: Improve error handling · 5e5d8766
  Olivier Dehaene authored Oct 17, 2022
- Update aml deployment · 00e6ce44
  Olivier Dehaene authored Oct 17, 2022

15 Oct, 2022 1 commit
- feat: Add AML deployment · bcb53903
  Olivier Dehaene authored Oct 15, 2022

14 Oct, 2022 1 commit
- feat: Docker image · bf99afe9
  Olivier Dehaene authored Oct 14, 2022

11 Oct, 2022 4 commits
- Use axum · 39df4d99
  Olivier Dehaene authored Oct 11, 2022
- ValidationError was not correctly handled · e86ecbac
  Olivier Dehaene authored Oct 11, 2022
- Refactored gRPC interface · 4c693e65
  Olivier Dehaene authored Oct 11, 2022
  ```
  Added validation logic
  ```
- Add load testing · fa9a0884
  Olivier Dehaene authored Oct 11, 2022

08 Oct, 2022 1 commit
- fix: cleanup · 1d986983
  Olivier Dehaene authored Oct 08, 2022