Commits · 4acc42a6050ed6c9f9c7c9db28a0f244a45990e7 · OpenDAS / text-generation-inference

07 Feb, 2023 1 commit
- fix(server): better handling of inference mode (#57) · 4acc42a6
  OlivierDehaene authored Feb 07, 2023
  
  4acc42a6
03 Feb, 2023 1 commit
- feat(router): refactor API and add openAPI schemas (#53) · 20c3c594
  OlivierDehaene authored Feb 03, 2023
  
  20c3c594
02 Feb, 2023 2 commits
- breaking(router): modify /generate API to only return generated text (#50) · b1482d90
  OlivierDehaene authored Feb 02, 2023
```
@njhill, @yk FYI

generated_text was concatenated to the user prompt for legacy reasons. We
want to remove this behaviour as we don't think it is useful, and it is even
detrimental to usability.

We also remove the unused Vec.
```
  b1482d90
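  A client that relied on the old behaviour described in b1482d90 can rebuild it locally. The sketch below is only an illustration: `with_prompt` is a hypothetical helper name, and it assumes the legacy output was the prompt followed directly by the generated text with no separator, which the commit message does not spell out.

```rust
/// Hypothetical client-side helper: rebuild the legacy-style output
/// (prompt followed by the generated text) from the new response,
/// which carries only the generated text.
/// Assumption: no separator was inserted between the two parts.
fn with_prompt(prompt: &str, generated_text: &str) -> String {
    let mut full = String::with_capacity(prompt.len() + generated_text.len());
    full.push_str(prompt);
    full.push_str(generated_text);
    full
}

fn main() {
    let prompt = "Once upon a time";
    // Placeholder completion, not real model output.
    let generated_text = ", there was a server.";
    println!("{}", with_prompt(prompt, generated_text));
}
```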
- fix(server): allow greedy repetition penalty (#51) · df227ac2
  OlivierDehaene authored Feb 02, 2023
  
  df227ac2
01 Feb, 2023 3 commits
- feat(server): allow the server to use a local weight cache (#49) · 775115e3
  OlivierDehaene authored Feb 01, 2023
  
  775115e3
- feat(server): support repetition penalty (#47) · 313194f6
  OlivierDehaene authored Feb 01, 2023
  
  313194f6
- feat(server): allow gpt-neox models with odd vocab sizes to be sharded (#48) · 2ad895a6
  OlivierDehaene authored Feb 01, 2023
  
  2ad895a6
31 Jan, 2023 7 commits
- feat(server): Support GPT-Neox (#39) · f830706b
  OlivierDehaene authored Jan 31, 2023
  
  f830706b
- fix(server): fix quantization for sharded models (#45) · c6e8b944
  OlivierDehaene authored Jan 31, 2023
  
  c6e8b944
- feat: Add token streaming using ServerSideEvents support (#41) · 017a2a8c
  OlivierDehaene authored Jan 31, 2023
  
  017a2a8c
- fix(server): fix seeding with multiple shards (#44) · 54fec931
  OlivierDehaene authored Jan 31, 2023
  
  54fec931
- fix(server): fix seeding on gpu (#42) · 03bdf182
  OlivierDehaene authored Jan 31, 2023
  
  03bdf182
- Revert "feat: Add token streaming using ServerSideEvents support" (#40) · 4f9ac67c
  OlivierDehaene authored Jan 31, 2023
```
Reverts huggingface/text-generation-inference#36
```
  4f9ac67c
- feat: Add token streaming using ServerSideEvents support (#36) · 7fbfbb0d
  OlivierDehaene authored Jan 31, 2023
````
Add token streaming using ServerSideEvents (SSE).

The signature of the SSE events is:

```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```
````
  7fbfbb0d
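  To make the streaming format in 7fbfbb0d more concrete, here is a minimal sketch of how one event could be framed on the wire, assuming the standard Server-Sent Events text format (one `data:` field per payload line, with a blank line ending the event). `sse_frame` is a hypothetical helper, not a function from the router, and the JSON payload is a hand-written placeholder shaped like `ErrorResponse`, not captured output.

```rust
/// Hypothetical helper: frame a payload as one Server-Sent Events message.
/// Each line of the payload gets its own `data: ` field, and an empty
/// line terminates the event.
fn sse_frame(payload: &str) -> String {
    let mut frame = String::new();
    for line in payload.lines() {
        frame.push_str("data: ");
        frame.push_str(line);
        frame.push('\n');
    }
    // The blank line marks the end of the event.
    frame.push('\n');
    frame
}

fn main() {
    // Placeholder JSON shaped like `ErrorResponse`; written by hand for illustration.
    let payload = r#"{"error":"example"}"#;
    print!("{}", sse_frame(payload));
}
```

  Splitting on lines matters because a raw newline inside a single `data:` field would end that field early; compact single-line JSON avoids the issue entirely.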
30 Jan, 2023 1 commit
- feat: Support sampling seeding (#37) · cd298bc5
  OlivierDehaene authored Jan 30, 2023
```
Co-authored-by: Yannic Kilcher <yk@users.noreply.github.com>
```
  cd298bc5
26 Jan, 2023 1 commit
- feat(bloom): use torch.nn.Linear and torch.nn.GELU (#33) · ce960be0
  OlivierDehaene authored Jan 26, 2023
  
  ce960be0
24 Jan, 2023 1 commit
- fix(dockerfile): fix docker build (#32) · 13e7044a
  OlivierDehaene authored Jan 24, 2023
  
  13e7044a
20 Jan, 2023 2 commits
- fix(server): Fix position ids (#28) · 1f570d18
  OlivierDehaene authored Jan 20, 2023
  
  1f570d18
- feat(server): Support SantaCoder (#26) · 15511edc
  OlivierDehaene authored Jan 20, 2023
  
  15511edc
17 Jan, 2023 1 commit
- fix(server): Minor refactorization using new_zeros (#24) · e6d3eb5d
  Nick Hill authored Jan 17, 2023
```
- Fix some type hints, in particular base tokenizer class
- Make use of `tensor.new_zeros`/`tensor.new_empty` methods
- Simplify env var string parsing in launcher
```
  e6d3eb5d
05 Jan, 2023 1 commit
- feat(launcher): Log server stdout (#19) · fcc2c5fc
  OlivierDehaene authored Jan 05, 2023
```
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
```
  fcc2c5fc
03 Jan, 2023 1 commit
- fix(server): Use cleanup_tokenization_spaces=False for lossless decoding (#13) · b94f3021
  Nicolas Patry authored Jan 03, 2023
```
Fixes #12 in the easiest way I could think of.
```
  b94f3021
30 Dec, 2022 1 commit
- fix(server): Check for device type correctly when determining initial padding (#16) · 686cc667
  Nick Hill authored Dec 30, 2022
```
AFAIK there is no torch device type called "gpu".
```
  686cc667
16 Dec, 2022 1 commit
- fix(server): Fix stop sequences (#11) · 611e21cb
  OlivierDehaene authored Dec 16, 2022
  
  611e21cb
15 Dec, 2022 1 commit
- feat: Return logprobs (#8) · 32a25306
  OlivierDehaene authored Dec 15, 2022
  
  32a25306
12 Dec, 2022 1 commit
- feat: Support stop sequences (#7) · 718096f6
  OlivierDehaene authored Dec 12, 2022
  
  718096f6
08 Dec, 2022 2 commits
- fix(server): Only pad to multiple of 8 on GPUs · 042180d8
  OlivierDehaene authored Dec 08, 2022
  
  042180d8
- feat(server): Add model tests (#6) · a2985036
  OlivierDehaene authored Dec 08, 2022
  
  a2985036
05 Dec, 2022 1 commit
- fix(batching): Avoid theoretical hang in batcher loop (#5) · 31d76e23
  Nick Hill authored Dec 05, 2022
```
- Avoid theoretical hang in batcher loop
- Avoid a couple of clones in the router generate method
- Keep attention mask tensors as integers
- Remove num_heads attribute

Co-authored-by: OlivierDehaene <Olivier.dehaene@gmail.com>
```
  31d76e23
01 Dec, 2022 1 commit
- feat(server): Support Galactica (#4) · daa1d81d
  OlivierDehaene authored Dec 01, 2022
  
  daa1d81d
09 Nov, 2022 1 commit
- feat(server): Clarify CausalLMBatch concatenate method · dccd5c2b
  OlivierDehaene authored Nov 09, 2022
  
  dccd5c2b
08 Nov, 2022 1 commit
- fix(server): Fix Transformers fork version · fa43fb71
  OlivierDehaene authored Nov 08, 2022
  
  fa43fb71
07 Nov, 2022 1 commit
- feat(server): Improved doc · 4236e41b
  OlivierDehaene authored Nov 07, 2022
  
  4236e41b
04 Nov, 2022 2 commits
- feat(server): Support AutoModelForSeq2SeqLM · 427d7cc4
  OlivierDehaene authored Nov 04, 2022
  
  427d7cc4
- feat(server): Support generic AutoModelForCausalLM · c5665f5c
  OlivierDehaene authored Nov 04, 2022
  
  c5665f5c
03 Nov, 2022 1 commit
- fix(models): Revert buggy support for AutoModel · 755fc0e4
  OlivierDehaene authored Nov 03, 2022
  
  755fc0e4
02 Nov, 2022 1 commit
- feat: Use json formatter by default in docker image · b3b7ea0d
  OlivierDehaene authored Nov 02, 2022
  
  b3b7ea0d
28 Oct, 2022 1 commit
- feat(server): Support all AutoModelForCausalLM on a best effort basis · 3cf6368c
  OlivierDehaene authored Oct 28, 2022
  
  3cf6368c
27 Oct, 2022 1 commit
- feat(server): Support bitsandbytes · 09674e6d
  OlivierDehaene authored Oct 27, 2022
  
  09674e6d
22 Oct, 2022 1 commit
- feat(server): Use safetensors · c8ce9b25
  Nicolas Patry authored Oct 22, 2022
```
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
```
  c8ce9b25