Commits · c6e8b9442b1fcf7bbbe4be58fcd85047f69e4112 · OpenDAS / text-generation-inference

31 Jan, 2023 6 commits
- fix(server): fix quantization for sharded models (#45) · c6e8b944
  OlivierDehaene authored Jan 31, 2023
  
  c6e8b944
- feat: Add token streaming using ServerSideEvents support (#41) · 017a2a8c
  OlivierDehaene authored Jan 31, 2023
  
  017a2a8c
- fix(server): fix seeding with multiple shards (#44) · 54fec931
  OlivierDehaene authored Jan 31, 2023
  
  54fec931
- fix(server): fix seeding on gpu (#42) · 03bdf182
  OlivierDehaene authored Jan 31, 2023
  
  03bdf182
- Revert "feat: Add token streaming using ServerSideEvents support" (#40) · 4f9ac67c
  OlivierDehaene authored Jan 31, 2023
```
Reverts huggingface/text-generation-inference#36
```
  4f9ac67c
- feat: Add token streaming using ServerSideEvents support (#36) · 7fbfbb0d
  OlivierDehaene authored Jan 31, 2023
````
Add token streaming using ServerSideEvents (SSE).

The signature of the SSE events is:

```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```
````
  7fbfbb0d
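
As a rough illustration of how a client might consume the events described in #36, the sketch below folds a sequence of `StreamResponse` values into text. Several parts are assumptions made for this example and do not come from the commit message: the `Token` fields (`id`, `text`, `logprob`), the helpers `streamed_text`, `final_event` and `sample_events`, the `"length"` finish reason, and the idea that the final event is the one carrying `generated_text` (inferred only from the `Option` types).

```rust
// Hypothetical token layout; the commit message does not show this struct.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Token {
    id: u32,
    text: String,
    logprob: f32,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

/// Concatenates the text of every streamed token, in arrival order.
fn streamed_text(events: &[StreamResponse]) -> String {
    events.iter().map(|e| e.token.text.as_str()).collect()
}

/// Returns the first event that carries `generated_text`, if any.
/// Treating that event as the final one is an assumption of this sketch.
fn final_event(events: &[StreamResponse]) -> Option<&StreamResponse> {
    events.iter().find(|e| e.generated_text.is_some())
}

/// Builds a small hand-written stream for demonstration purposes.
fn sample_events() -> Vec<StreamResponse> {
    let token = |id: u32, text: &str| Token { id, text: text.to_string(), logprob: -0.1 };
    vec![
        StreamResponse { token: token(0, "Hello"), generated_text: None, details: None },
        StreamResponse {
            token: token(1, " world"),
            generated_text: Some("Hello world".to_string()),
            details: Some(Details {
                // "length" is a made-up finish reason for illustration.
                finish_reason: "length".to_string(),
                generated_tokens: 2,
                seed: None,
            }),
        },
    ]
}

fn main() {
    let events = sample_events();
    println!("streamed: {}", streamed_text(&events));
    if let Some(last) = final_event(&events) {
        println!("final: {:?}", last.generated_text);
        println!("details: {:?}", last.details);
    }
}
```

Keeping `generated_text` and `details` optional lets every intermediate event stay small, with the summary fields attached only once.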
30 Jan, 2023 1 commit
- feat: Support sampling seeding (#37) · cd298bc5
  OlivierDehaene authored Jan 30, 2023
```
Co-authored-by: Yannic Kilcher <yk@users.noreply.github.com>
```
  cd298bc5
26 Jan, 2023 2 commits
- feat(router): Remove second lock from batcher hot path (#27) · 1539d3cb
  OlivierDehaene authored Jan 26, 2023
```
@njhill
```
  1539d3cb
- feat(bloom): use torch.nn.Linear and torch.nn.GELU (#33) · ce960be0
  OlivierDehaene authored Jan 26, 2023
  
  ce960be0
24 Jan, 2023 1 commit
- fix(dockerfile): fix docker build (#32) · 13e7044a
  OlivierDehaene authored Jan 24, 2023
  
  13e7044a
23 Jan, 2023 3 commits
- fix(router): fix api-inference deployment (#31) · 5c01e254
  OlivierDehaene authored Jan 23, 2023
  
  5c01e254
- fix(docker): fix api-inference deployment (#30) · ab2ad91d
  OlivierDehaene authored Jan 23, 2023
  
  ab2ad91d
- feat(docker): Make the image compatible with api-inference (#29) · f9d0ec37
  OlivierDehaene authored Jan 23, 2023
  
  f9d0ec37
20 Jan, 2023 2 commits
- fix(server): Fix position ids (#28) · 1f570d18
  OlivierDehaene authored Jan 20, 2023
  
  1f570d18
- feat(server): Support SantaCoder (#26) · 15511edc
  OlivierDehaene authored Jan 20, 2023
  
  15511edc
17 Jan, 2023 2 commits
- fix(router): Obey max batch size (#23) · f7ac3949
  Nick Hill authored Jan 17, 2023
  
  f7ac3949
- fix(server): Minor refactorization using new_zeros (#24) · e6d3eb5d
  Nick Hill authored Jan 17, 2023
```
- Fix some type hints, in particular base tokenizer class
- Make use of `tensor.new_zeros`/`tensor.new_empty` methods
- Simplify env var string parsing in launcher
```
  e6d3eb5d
05 Jan, 2023 1 commit
- feat(launcher): Log server stdout (#19) · fcc2c5fc
  OlivierDehaene authored Jan 05, 2023
```
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
```
  fcc2c5fc
03 Jan, 2023 2 commits
- fix(server): Use cleanup_tokenization_spaces=False for lossless decoding (#13) · b94f3021
  Nicolas Patry authored Jan 03, 2023
```
Fixes #12 in the easiest way I could think of.
```
  b94f3021
- feat(router): Add const parameters to validation logic (#15) · 60472f9d
  Nick Hill authored Jan 03, 2023
```
I noticed some opportunity to collapse some of the logic, in case you
are interested.
```
  60472f9d
30 Dec, 2022 2 commits

- fix(router): Include special tokens when tokenizing (#14) · 3efa5bbb
  Nick Hill authored Dec 30, 2022

There's currently a discrepancy in the tokenization between the router
and the Python server code. The latter includes special tokens but the
former does not.

This results in a token count mismatch for seq2seq models such as mt0
where the tokenizer emits an EOS token at the end.

This in turn results in some unexpected/incorrect output, in particular
when batch concatenation is involved, because the python code uses the
input length passed from the router for each row.

As far as I can tell, it is better to include this token in the encoder
`input_ids`, so I guess it's best to just adjust on the router side.
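
The off-by-one described above can be reproduced with a toy example. This is only a sketch: `encode` is a hypothetical whitespace "tokenizer" with made-up ids, standing in for the real tokenizers used by the router and the server, and `EOS_ID` is an arbitrary value.

```rust
// Arbitrary id standing in for the end-of-sequence token.
const EOS_ID: u32 = 1;

/// Toy whitespace tokenizer (hypothetical): one fake id per word,
/// plus EOS at the end when `add_special_tokens` is true.
fn encode(text: &str, add_special_tokens: bool) -> Vec<u32> {
    let mut ids: Vec<u32> = text
        .split_whitespace()
        .enumerate()
        .map(|(i, _)| 100 + i as u32) // offset so ids never collide with EOS_ID
        .collect();
    if add_special_tokens {
        ids.push(EOS_ID);
    }
    ids
}

fn main() {
    let prompt = "translate this sentence";
    // Router side before the fix: special tokens left out.
    let router_len = encode(prompt, false).len();
    // Server side: special tokens included.
    let server_len = encode(prompt, true).len();
    println!("router counted {router_len}, server produced {server_len}");
    // The length passed by the router is one short of what the server sees.
    assert_eq!(server_len, router_len + 1);
}
```

Once both sides encode with special tokens enabled, the two lengths agree and per-row input lengths line up during batch concatenation.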

  3efa5bbb
- fix(server): Check for device type correctly when determining initial padding (#16) · 686cc667
  Nick Hill authored Dec 30, 2022
```
AFAIK there is no torch device type called "gpu".
```
  686cc667
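
To show why the device-type string matters, here is a minimal sketch of a padding rule keyed on it. The real server code is Python and is not shown in this log; `padded_length` is a hypothetical helper, and the multiple-of-8 rule is borrowed from the title of the 08 Dec, 2022 commit rather than from this one. PyTorch reports device types such as `"cpu"` and `"cuda"`, so a comparison against `"gpu"` never matches.

```rust
/// Hypothetical helper: rounds `len` up to the next multiple of 8,
/// but only when the device type is "cuda".
fn padded_length(len: usize, device_type: &str) -> usize {
    if device_type == "cuda" {
        (len + 7) / 8 * 8
    } else {
        len
    }
}

fn main() {
    // A check against "gpu" would silently skip padding on every device,
    // because no device ever reports that type.
    for device in ["cpu", "cuda", "gpu"] {
        println!("{device}: {}", padded_length(13, device));
    }
}
```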

16 Dec, 2022 2 commits
- fix(server): Fix stop sequences (#11) · 611e21cb
  OlivierDehaene authored Dec 16, 2022
  
  611e21cb
- feat(launcher): Add integration tests (#9) · 3e2e6240
  OlivierDehaene authored Dec 16, 2022
  
  3e2e6240
15 Dec, 2022 1 commit
- feat: Return logprobs (#8) · 32a25306
  OlivierDehaene authored Dec 15, 2022
  
  32a25306
12 Dec, 2022 1 commit
- feat: Support stop sequences (#7) · 718096f6
  OlivierDehaene authored Dec 12, 2022
  
  718096f6
08 Dec, 2022 2 commits
- fix(server): Only pad to multiple of 8 on GPUs · 042180d8
  OlivierDehaene authored Dec 08, 2022
  
  042180d8
- feat(server): Add model tests (#6) · a2985036
  OlivierDehaene authored Dec 08, 2022
  
  a2985036
05 Dec, 2022 1 commit

- fix(batching): Avoid theoretical hang in batcher loop (#5) · 31d76e23
  Nick Hill authored Dec 05, 2022

  - Avoid theoretical hang in batcher loop
  - Avoid a couple of clones in the router generate method
  - Keep attention mask tensors as integers
  - Remove num_heads attribute

  Co-authored-by: OlivierDehaene <Olivier.dehaene@gmail.com>

  31d76e23
01 Dec, 2022 1 commit
- feat(server): Support Galactica (#4) · daa1d81d
  OlivierDehaene authored Dec 01, 2022
  
  daa1d81d
14 Nov, 2022 4 commits
- fix(router): Handle tokenizer errors · d6d5b12e
  OlivierDehaene authored Nov 14, 2022
  
  d6d5b12e
- fix(readme): Typo · feb7806c
  OlivierDehaene authored Nov 14, 2022
  
  feb7806c
- fix(router): Fix HTTP status codes · 91f5f862
  OlivierDehaene authored Nov 14, 2022
  
  91f5f862
- feat(rust): Update to 1.65 · 6c781025
  OlivierDehaene authored Nov 14, 2022
  
  6c781025
09 Nov, 2022 1 commit
- feat(server): Clarify CausalLMBatch concatenate method · dccd5c2b
  OlivierDehaene authored Nov 09, 2022
  
  dccd5c2b
08 Nov, 2022 1 commit
- fix(server): Fix Transformers fork version · fa43fb71
  OlivierDehaene authored Nov 08, 2022
  
  fa43fb71
07 Nov, 2022 1 commit
- feat(server): Improved doc · 4236e41b
  OlivierDehaene authored Nov 07, 2022
  
  4236e41b
04 Nov, 2022 3 commits
- feat(launcher): Pass CUDA_VISIBLE_DEVICES to the shard · cea6051e
  OlivierDehaene authored Nov 04, 2022
  
  cea6051e
- feat(server): Support AutoModelForSeq2SeqLM · 427d7cc4
  OlivierDehaene authored Nov 04, 2022
  
  427d7cc4
- feat(server): Support generic AutoModelForCausalLM · c5665f5c
  OlivierDehaene authored Nov 04, 2022
  
  c5665f5c