1. 08 Feb, 2023 2 commits
  2. 07 Feb, 2023 3 commits
  3. 06 Feb, 2023 1 commit
  4. 03 Feb, 2023 2 commits
  5. 02 Feb, 2023 3 commits
  6. 01 Feb, 2023 3 commits
  7. 31 Jan, 2023 8 commits
  8. 30 Jan, 2023 1 commit
  9. 26 Jan, 2023 2 commits
  10. 24 Jan, 2023 1 commit
  11. 23 Jan, 2023 3 commits
  12. 20 Jan, 2023 2 commits
  13. 17 Jan, 2023 2 commits
  14. 05 Jan, 2023 1 commit
  15. 03 Jan, 2023 2 commits
  16. 30 Dec, 2022 2 commits
    • fix(router): Include special tokens when tokenizing (#14) · 3efa5bbb
      Nick Hill authored
      There's currently a discrepancy in the tokenization between the router
      and python server code: the latter includes special tokens but the
      former does not.
      
      This results in a token count mismatch for seq2seq models such as mt0
      where the tokenizer emits an EOS token at the end.
      
      This in turn results in some unexpected/incorrect output, in particular
      when batch concatenation is involved, because the python code uses the
      input length passed from the router for each row.
      
      As far as I can tell, it is better to include this token in the encoder
      `input_ids`, so I guess it's best to just adjust on the router side.
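      As a hedged illustration of the mismatch (not the actual router or server code), the Python sketch below shows how `add_special_tokens` changes the token count for an mt0-style tokenizer, whose tokenizer appends an EOS token; the checkpoint name is only an example:

      ```python
      from transformers import AutoTokenizer

      # Example seq2seq checkpoint whose tokenizer appends an EOS token.
      tokenizer = AutoTokenizer.from_pretrained("bigscience/mt0-small")

      text = "Translate to French: Hello"

      # What the router previously did: no special tokens.
      without_special = tokenizer(text, add_special_tokens=False)["input_ids"]

      # What the python server does: special tokens included, so the
      # sequence ends with EOS and is one token longer.
      with_special = tokenizer(text, add_special_tokens=True)["input_ids"]

      print(len(without_special), len(with_special))  # counts differ by the trailing EOS
      ```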
    • fix(server): Check for device type correctly when determining initial padding (#16) · 686cc667
      Nick Hill authored
      AFAIK there is no torch device type called "gpu".
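      As a small hedged sketch (not the server's actual padding logic), this shows why comparing against "gpu" can never match: torch device types are strings such as "cuda", "cpu", or "mps":

      ```python
      import torch

      device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

      # device.type is "cuda" or "cpu" here; there is no "gpu" type,
      # so a check like `device.type == "gpu"` is always False.
      on_gpu = device.type == "cuda"
      print(device.type, on_gpu)
      ```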
  17. 16 Dec, 2022 2 commits