Commits · 9987960062e40de2deae030ab7e4ad6f57de0b20 · OpenDAS / text-generation-inference

09 Apr, 2023 2 commits
- feat(router): make router input validation optional (#164) · 99879600
  OlivierDehaene authored Apr 09, 2023
  
  99879600
- fix(router): use buckets for metrics histograms (#163) · 7dec65a2
  OlivierDehaene authored Apr 09, 2023
  
  7dec65a2
30 Mar, 2023 2 commits
- v0.4.3 (#152) · fef1a1c3
  OlivierDehaene authored Mar 30, 2023
  
  fef1a1c3
- feat(benchmark): tui based benchmarking tool (#149) · 610bb1f9
  OlivierDehaene authored Mar 30, 2023
  
  610bb1f9
29 Mar, 2023 1 commit

- feat: aws sagemaker compatible image (#147) · d503e8f0
  OlivierDehaene authored Mar 29, 2023

  The only difference is that it now pushes to
  registry.internal.huggingface.tech/api-inference/community/text-generation-inference/sagemaker:...
  instead of
  registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sagemaker-...

  ---------
  Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>

  d503e8f0

28 Mar, 2023 1 commit
- feat(server): clear cache on error (#143) · f0000689
  OlivierDehaene authored Mar 28, 2023
  
  f0000689
16 Mar, 2023 1 commit
- fix(server): use server tokenizer as gt (#128) · b49dbf2d
  OlivierDehaene authored Mar 16, 2023
  
  b49dbf2d
13 Mar, 2023 1 commit
- fix(server): revert gpt-neox optims (#123) · cbd36aa4
  OlivierDehaene authored Mar 13, 2023
  
  cbd36aa4
09 Mar, 2023 3 commits
- feat(router): add best_of parameter (#117) · 55bd4fed
  OlivierDehaene authored Mar 09, 2023
  
  55bd4fed
- feat(router): support left truncation (#115) · e8bfe199
  OlivierDehaene authored Mar 09, 2023
```
closes #111
```
  e8bfe199
- feat: support typical sampling (#114) · 1a2d6825
  OlivierDehaene authored Mar 09, 2023
```
closes #112
```
  1a2d6825
07 Mar, 2023 1 commit
- feat(clients): Python client (#103) · 3fef90d5
  OlivierDehaene authored Mar 07, 2023
  
  3fef90d5
06 Mar, 2023 1 commit
- feat: allow local models (#101) · cd5961b5
  OlivierDehaene authored Mar 06, 2023
```
closes #99
```
  cd5961b5
02 Mar, 2023 2 commits
- feat(server): add logits watermark (#90) · 9b8ea6a6
  OlivierDehaene authored Mar 02, 2023
  
  9b8ea6a6
- feat(router): add api-inference headers (#91) · f874c478
  OlivierDehaene authored Mar 02, 2023
  
  f874c478
28 Feb, 2023 1 commit
- feat(router): ask hf.co for pipelinetag to decide on compat_return_full_text (#89) · 4e685d90
  OlivierDehaene authored Feb 28, 2023
  
  4e685d90
27 Feb, 2023 1 commit
- feat(router): add legacy route for api-inference support (#88) · 21340f24
  OlivierDehaene authored Feb 27, 2023
  
  21340f24
24 Feb, 2023 1 commit
- feat(server): add special token bool (#85) · 0ac184ce
  OlivierDehaene authored Feb 24, 2023
  
  0ac184ce
17 Feb, 2023 1 commit
- feat(router): add cors allow origin options (#73) · 6796d38c
  OlivierDehaene authored Feb 17, 2023
  
  6796d38c
16 Feb, 2023 1 commit
- feat(router): add prometheus metrics scrape endpoint (#71) · 439fcaf8
  OlivierDehaene authored Feb 16, 2023
  
  439fcaf8
15 Feb, 2023 1 commit
- feat(router): add max_total_tokens and empty_input validation (#68) · 5437d49b
  OlivierDehaene authored Feb 15, 2023
```
closes #65
```
  5437d49b
13 Feb, 2023 1 commit
- feat: add distributed tracing (#62) · 9af45414
  OlivierDehaene authored Feb 13, 2023
  
  9af45414
08 Feb, 2023 1 commit
- fixed SSE naming (#61) · e520d5b3
  Yannic Kilcher authored Feb 08, 2023
```
https://en.wikipedia.org/wiki/Server-sent_events
```
  e520d5b3
03 Feb, 2023 1 commit
- feat(router): refactor API and add openAPI schemas (#53) · 20c3c594
  OlivierDehaene authored Feb 03, 2023
  
  20c3c594
02 Feb, 2023 2 commits

- breaking(router): modify /generate API to only return generated text (#50) · b1482d90
  OlivierDehaene authored Feb 02, 2023

  @njhill, @yk FYI

  generated_text was concatenated to the user prompt for legacy reasons. We
  want to remove this behaviour as we don't think it is useful, and it is
  even detrimental to usability.

  We also remove the unused Vec.

  b1482d90

- feat(router): use background task to manage request queue (#52) · 7b870e1e
  OlivierDehaene authored Feb 02, 2023
```
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
```
  7b870e1e

01 Feb, 2023 1 commit
- feat(server): support repetition penalty (#47) · 313194f6
  OlivierDehaene authored Feb 01, 2023
  
  313194f6
31 Jan, 2023 4 commits
- feat: Add token streaming using ServerSideEvents support (#41) · 017a2a8c
  OlivierDehaene authored Jan 31, 2023
  
  017a2a8c
- fix(server): fix seeding with multiple shards (#44) · 54fec931
  OlivierDehaene authored Jan 31, 2023
  
  54fec931
- Revert "feat: Add token streaming using ServerSideEvents support" (#40) · 4f9ac67c
  OlivierDehaene authored Jan 31, 2023
```
Reverts huggingface/text-generation-inference#36
```
  4f9ac67c
- feat: Add token streaming using ServerSideEvents support (#36) · 7fbfbb0d
  OlivierDehaene authored Jan 31, 2023
````
Add token streaming using ServerSideEvents (SSE).

The signature of the SSE events is:

```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```
````
  7fbfbb0d
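
As an illustration of how a client might consume such a stream, here is a minimal sketch that splits a raw Server-Sent Events text buffer into the `data:` payload of each complete event. Each payload would then be deserialized as a `StreamResponse` or `ErrorResponse`. The helper name `sse_data_payloads` is hypothetical and not part of the repository, and JSON decoding is left out so that the example needs only the standard library.

```rust
/// Split a raw SSE text buffer into the `data` payload of each complete event.
/// Hypothetical helper for illustration; not part of text-generation-inference.
fn sse_data_payloads(buffer: &str) -> Vec<String> {
    let mut payloads = Vec::new();
    let mut data_lines: Vec<&str> = Vec::new();

    // `lines()` accepts both "\n" and "\r\n" line endings.
    for line in buffer.lines() {
        if line.is_empty() {
            // A blank line terminates the current event.
            if !data_lines.is_empty() {
                payloads.push(data_lines.join("\n"));
                data_lines.clear();
            }
        } else if let Some(rest) = line.strip_prefix("data:") {
            // The SSE format allows one optional space after the colon.
            data_lines.push(rest.strip_prefix(' ').unwrap_or(rest));
        }
        // Comment lines (starting with ':') and other fields are ignored
        // in this sketch.
    }

    // An event that is not yet terminated by a blank line is dropped here;
    // a real client would keep it buffered until more bytes arrive.
    payloads
}
```

Calling `sse_data_payloads` on a buffer that holds two terminated events followed by a partial one returns two payload strings. Multiple `data:` lines within one event are joined with a newline, as the SSE format prescribes.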
30 Jan, 2023 1 commit
- feat: Support sampling seeding (#37) · cd298bc5
  OlivierDehaene authored Jan 30, 2023
```
Co-authored-by: Yannic Kilcher <yk@users.noreply.github.com>
```
  cd298bc5
26 Jan, 2023 1 commit
- feat(router): Remove second lock from batcher hot path (#27) · 1539d3cb
  OlivierDehaene authored Jan 26, 2023
```
@njhill
```
  1539d3cb
23 Jan, 2023 2 commits
- fix(router): fix api-inference deployment (#31) · 5c01e254
  OlivierDehaene authored Jan 23, 2023
  
  5c01e254
- feat(docker): Make the image compatible with api-inference (#29) · f9d0ec37
  OlivierDehaene authored Jan 23, 2023
  
  f9d0ec37
20 Jan, 2023 1 commit
- feat(server): Support SantaCoder (#26) · 15511edc
  OlivierDehaene authored Jan 20, 2023
  
  15511edc
17 Jan, 2023 2 commits
- fix(router): Obey max batch size (#23) · f7ac3949
  Nick Hill authored Jan 17, 2023
  
  f7ac3949
- fix(server): Minor refactorization using new_zeros (#24) · e6d3eb5d
  Nick Hill authored Jan 17, 2023
```
- Fix some type hints, in particular base tokenizer class
- Make use of `tensor.new_zeros`/`new_empty` methods
- Simplify env var string parsing in launcher
```
  e6d3eb5d
03 Jan, 2023 1 commit
- feat(router): Add const parameters to validation logic (#15) · 60472f9d
  Nick Hill authored Jan 03, 2023
```
I noticed some opportunity to collapse some of the logic, in case you
are interested.
```
  60472f9d
30 Dec, 2022 1 commit

- fix(router): Include special tokens when tokenizing (#14) · 3efa5bbb
  Nick Hill authored Dec 30, 2022

  There's currently a discrepancy in tokenization between the router
  and the Python server code. The latter includes special tokens but the
  former does not.

  This results in a token count mismatch for seq2seq models such as mt0,
  where the tokenizer emits an EOS token at the end.

  This in turn results in some unexpected/incorrect output, in particular
  when batch concatenation is involved, because the Python code uses the
  input length passed from the router for each row.

  As far as I can tell, it is better to include this token in the encoder
  `input_ids`, so I guess it's best to just adjust on the router side.

  3efa5bbb
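
The off-by-one described in this commit message can be illustrated with a toy example. This is a minimal sketch, not the router's or the server's actual code: `toy_encode`, `length_mismatch`, the whitespace "tokenizer", and the `EOS_ID` value are all hypothetical stand-ins for a real seq2seq tokenizer that appends an EOS token.

```rust
/// Hypothetical EOS token id; real tokenizers define their own.
const EOS_ID: u32 = 1;

/// Toy whitespace "tokenizer": one id per word, plus EOS when requested.
fn toy_encode(text: &str, add_special_tokens: bool) -> Vec<u32> {
    let mut ids: Vec<u32> = text
        .split_whitespace()
        .enumerate()
        .map(|(i, _)| 100 + i as u32)
        .collect();
    if add_special_tokens {
        ids.push(EOS_ID);
    }
    ids
}

/// Difference between the number of ids the server builds (always with
/// special tokens) and the input length the router would report.
fn length_mismatch(text: &str, router_adds_special: bool) -> usize {
    let router_len = toy_encode(text, router_adds_special).len();
    let server_len = toy_encode(text, true).len();
    server_len - router_len
}
```

With `router_adds_special` set to `false`, the reported length is one short of what the server builds for every input, which is the kind of per-row discrepancy the commit message attributes to batch concatenation problems. Setting it to `true` removes the mismatch.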