Commits · ebc74d5666bf2a304181a5fe8cc7dfc6ddbe1d95 · OpenDAS / text-generation-inference

24 Apr, 2023 2 commits
- feat(router): use number of tokens in batch as input for dynamic batching (#226) · ebc74d56
  OlivierDehaene authored Apr 24, 2023
```
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
```
- feat(server): reduce memory requirement (#214) · 4a7dd408
  Nick Hill authored Apr 24, 2023
  
21 Apr, 2023 1 commit
- feat(router): add device and dtype info (#215) · 343437c7
  OlivierDehaene authored Apr 21, 2023
  
20 Apr, 2023 1 commit
- feat(router): drop requests when client closes the channel (#202) · 709d8936
  OlivierDehaene authored Apr 20, 2023
  
12 Apr, 2023 1 commit
- feat(server): optimize decode for sane tokenizers (#170) · 5fa8ae04
  OlivierDehaene authored Apr 12, 2023
  
11 Apr, 2023 1 commit
- feat(server): add flash attention llama (#144) · 299217c9
  OlivierDehaene authored Apr 11, 2023
  
09 Apr, 2023 1 commit
- feat(router): make router input validation optional (#164) · 99879600
  OlivierDehaene authored Apr 09, 2023
  
16 Mar, 2023 1 commit
- fix(server): use server tokenizer as gt (#128) · b49dbf2d
  OlivierDehaene authored Mar 16, 2023
  
15 Mar, 2023 1 commit
- fix(server): add position ids to neox (#126) · 8ad60b75
  OlivierDehaene authored Mar 15, 2023
  
08 Mar, 2023 1 commit
- fix(server): fix index out of range for watermarking (#110) · 941cd42e
  OlivierDehaene authored Mar 08, 2023
  
07 Mar, 2023 1 commit
- feat(clients): Python client (#103) · 3fef90d5
  OlivierDehaene authored Mar 07, 2023
  
06 Mar, 2023 1 commit
- fix(server): fix generate_stream by forcing tokens to be decoded correctly (#100) · 9b205d33
  OlivierDehaene authored Mar 06, 2023
  
02 Mar, 2023 1 commit
- feat(server): add logits watermark (#90) · 9b8ea6a6
  OlivierDehaene authored Mar 02, 2023
  
24 Feb, 2023 3 commits
- fix(server): fix token_is_special (#87) · 65e2f162
  OlivierDehaene authored Feb 24, 2023
  
- feat(server): add special token bool (#85) · 0ac184ce
  OlivierDehaene authored Feb 24, 2023
  
- feat(server): pre-allocate max attention mask (#75) · 44ce098c
  OlivierDehaene authored Feb 24, 2023
  
13 Feb, 2023 1 commit
- feat: add distributed tracing (#62) · 9af45414
  OlivierDehaene authored Feb 13, 2023
  
07 Feb, 2023 1 commit
- fix(server): better handling of inference mode (#57) · 4acc42a6
  OlivierDehaene authored Feb 07, 2023
  
03 Feb, 2023 1 commit
- feat(router): refactor API and add openAPI schemas (#53) · 20c3c594
  OlivierDehaene authored Feb 03, 2023
  
01 Feb, 2023 1 commit
- feat(server): support repetition penalty (#47) · 313194f6
  OlivierDehaene authored Feb 01, 2023
  
31 Jan, 2023 5 commits
- feat(server): Support GPT-Neox (#39) · f830706b
  OlivierDehaene authored Jan 31, 2023
  
- feat: Add token streaming using ServerSideEvents support (#41) · 017a2a8c
  OlivierDehaene authored Jan 31, 2023
  
- fix(server): fix seeding on gpu (#42) · 03bdf182
  OlivierDehaene authored Jan 31, 2023
  
- Revert "feat: Add token streaming using ServerSideEvents support" (#40) · 4f9ac67c
  OlivierDehaene authored Jan 31, 2023
```
Reverts huggingface/text-generation-inference#36
```
- feat: Add token streaming using ServerSideEvents support (#36) · 7fbfbb0d
  OlivierDehaene authored Jan 31, 2023
````
Add token streaming using Server-Sent Events (SSE).

The signature of the SSE events is:

```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```
````
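For context on the commit message above, here is a minimal sketch of how one such event could be framed on the wire. It follows the general Server-Sent Events format, where each payload line goes in its own `data:` field and a blank line ends the event. The helper name `sse_frame` and the JSON encoding of the payload are assumptions for illustration, not taken from the commit.

```rust
/// Frame an already-serialized payload as one Server-Sent Events message.
/// Hypothetical helper: each payload line becomes its own `data:` field,
/// and an empty line terminates the event.
fn sse_frame(payload: &str) -> String {
    let mut out = String::new();
    for line in payload.split('\n') {
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    // The blank line tells the client that the event is complete.
    out.push('\n');
    out
}

fn main() {
    // Assumed JSON rendering of an `ErrorResponse`; the commit message only
    // gives the struct fields, not the wire encoding.
    let frame = sse_frame(r#"{"error":"example"}"#);
    print!("{}", frame);
}
```

Splitting on newlines keeps multi-line payloads valid, since a raw newline inside a single `data:` field would otherwise end the field early.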
30 Jan, 2023 1 commit
- feat: Support sampling seeding (#37) · cd298bc5
  OlivierDehaene authored Jan 30, 2023
```
Co-authored-by: Yannic Kilcher <yk@users.noreply.github.com>
```
20 Jan, 2023 2 commits
- fix(server): Fix position ids (#28) · 1f570d18
  OlivierDehaene authored Jan 20, 2023
  
- feat(server): Support SantaCoder (#26) · 15511edc
  OlivierDehaene authored Jan 20, 2023
  
17 Jan, 2023 1 commit
- fix(server): Minor refactorization using new_zeros (#24) · e6d3eb5d
  Nick Hill authored Jan 17, 2023
  - Fix some type hints, in particular the base tokenizer class
  - Make use of the `tensor.new_zeros` and `tensor.new_empty` methods
  - Simplify env var string parsing in launcher

30 Dec, 2022 1 commit
- fix(server): Check for device type correctly when determining initial padding (#16) · 686cc667
  Nick Hill authored Dec 30, 2022
```
AFAIK there is no torch device type called "gpu".
```
16 Dec, 2022 1 commit
- fix(server): Fix stop sequences (#11) · 611e21cb
  OlivierDehaene authored Dec 16, 2022
  
15 Dec, 2022 1 commit
- feat: Return logprobs (#8) · 32a25306
  OlivierDehaene authored Dec 15, 2022
  
12 Dec, 2022 1 commit
- feat: Support stop sequences (#7) · 718096f6
  OlivierDehaene authored Dec 12, 2022
  
08 Dec, 2022 2 commits
- fix(server): Only pad to multiple of 8 on GPUs · 042180d8
  OlivierDehaene authored Dec 08, 2022
  
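As an illustration of the rule named in the commit title above, here is a minimal sketch that rounds a sequence length up to the next multiple of 8 only when running on a GPU. The names `pad_to_multiple` and `padded_len` are hypothetical, and the server's actual implementation may differ.

```rust
/// Round `len` up to the next multiple of `multiple` (which must be non-zero).
fn pad_to_multiple(len: usize, multiple: usize) -> usize {
    ((len + multiple - 1) / multiple) * multiple
}

/// Hypothetical helper: pad to a multiple of 8 on GPUs only;
/// on other devices the length is returned unchanged.
fn padded_len(len: usize, on_gpu: bool) -> usize {
    if on_gpu {
        pad_to_multiple(len, 8)
    } else {
        len
    }
}

fn main() {
    println!("{}", padded_len(13, true)); // 16
    println!("{}", padded_len(13, false)); // 13
}
```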
- feat(server): Add model tests (#6) · a2985036
  OlivierDehaene authored Dec 08, 2022
  
05 Dec, 2022 1 commit
- fix(batching): Avoid theoretical hang in batcher loop (#5) · 31d76e23
  Nick Hill authored Dec 05, 2022
  - Avoid theoretical hang in batcher loop
  - Avoid a couple of clones in the router generate method
  - Keep attention mask tensors as integers
  - Remove num_heads attribute

  Co-authored-by: OlivierDehaene <Olivier.dehaene@gmail.com>

07 Nov, 2022 1 commit
- feat(server): Improved doc · 4236e41b
  OlivierDehaene authored Nov 07, 2022
  
04 Nov, 2022 1 commit
- feat(server): Support AutoModelForSeq2SeqLM · 427d7cc4
  OlivierDehaene authored Nov 04, 2022
  