Commits · 757223b352896a9b2b9df46c95c0afcaa0cdf9d4 · OpenDAS / text-generation-inference

04 Jun, 2024 1 commit

feat: add SchedulerV3 (#1996) · 757223b3

OlivierDehaene authored Jun 04, 2024

- Refactor code to support multiple versions of generate.proto at the
same time
- Add v3/generate.proto (identical to generate.proto for now, but allows
future changes without impacting v2 backends)
- Add Schedule trait to abstract the queuing and batching mechanisms,
which will differ in the future
- Add SchedulerV2/V3 impl
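
As a rough illustration of the trait-based abstraction described above, here is a minimal sketch. It is not the code from this commit: `Request`, `Scheduler`, `FifoScheduler`, `TokenBudgetScheduler`, `drain_all`, and all method names are hypothetical, chosen only to show how two batching policies could sit behind one interface.

```rust
use std::collections::VecDeque;

/// Hypothetical request type; a real router request carries far more state.
#[derive(Debug, Clone, PartialEq)]
pub struct Request {
    pub id: u64,
    pub input_tokens: usize,
}

/// Hypothetical trait hiding how requests are queued and grouped into batches.
pub trait Scheduler {
    /// Accept a request for later batching.
    fn schedule(&mut self, request: Request);
    /// Pop the next batch, or `None` when nothing is waiting.
    fn next_batch(&mut self) -> Option<Vec<Request>>;
}

/// Policy 1 (assumed): FIFO, bounded by a number of requests per batch.
pub struct FifoScheduler {
    queue: VecDeque<Request>,
    max_batch_size: usize,
}

impl FifoScheduler {
    pub fn new(max_batch_size: usize) -> Self {
        assert!(max_batch_size > 0, "a batch must hold at least one request");
        Self { queue: VecDeque::new(), max_batch_size }
    }
}

impl Scheduler for FifoScheduler {
    fn schedule(&mut self, request: Request) {
        self.queue.push_back(request);
    }

    fn next_batch(&mut self) -> Option<Vec<Request>> {
        if self.queue.is_empty() {
            return None;
        }
        let n = self.max_batch_size.min(self.queue.len());
        Some(self.queue.drain(..n).collect())
    }
}

/// Policy 2 (assumed): FIFO, bounded by a total input-token budget per batch.
pub struct TokenBudgetScheduler {
    queue: VecDeque<Request>,
    max_batch_tokens: usize,
}

impl TokenBudgetScheduler {
    pub fn new(max_batch_tokens: usize) -> Self {
        Self { queue: VecDeque::new(), max_batch_tokens }
    }
}

impl Scheduler for TokenBudgetScheduler {
    fn schedule(&mut self, request: Request) {
        self.queue.push_back(request);
    }

    fn next_batch(&mut self) -> Option<Vec<Request>> {
        let mut batch = Vec::new();
        let mut used = 0;
        while let Some(front) = self.queue.front() {
            // Always admit the first request so an oversized one cannot stall the queue.
            if !batch.is_empty() && used + front.input_tokens > self.max_batch_tokens {
                break;
            }
            used += front.input_tokens;
            batch.push(self.queue.pop_front().unwrap());
        }
        if batch.is_empty() { None } else { Some(batch) }
    }
}

/// Caller code depends only on the trait, not on a concrete policy.
pub fn drain_all(scheduler: &mut dyn Scheduler) -> Vec<Vec<Request>> {
    let mut batches = Vec::new();
    while let Some(batch) = scheduler.next_batch() {
        batches.push(batch);
    }
    batches
}
```

Because `drain_all` takes `&mut dyn Scheduler`, swapping one policy for another does not touch the calling code, which is the kind of isolation the trait is meant to provide.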


03 Jun, 2024 1 commit

router: send the input as chunks to the backend · df71aafd

Daniël de Kok authored Jun 03, 2024

Before this change, the generation input was sent to the backend as a
single string, encoding images as Base64 and packing them in
Markdown-style links.

This change adds a new chunked input representation that separates text
chunks from image chunks. Image chunks contain binary data (for smaller
message sizes) and the image's MIME type.

The stringly-typed inputs are still sent to support backends that do not
support chunked inputs yet.
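
To make the two representations concrete, here is a minimal sketch. It is not the router's actual code: `InputChunk`, `base64_encode`, and `to_legacy_string` are hypothetical names, and the exact legacy link format (`![](data:<mime>;base64,<data>)`) is an assumption based on the description above.

```rust
/// Hypothetical chunk type: text stays text, images stay raw bytes plus a MIME type.
#[derive(Debug, Clone, PartialEq)]
pub enum InputChunk {
    Text(String),
    Image { data: Vec<u8>, mimetype: String },
}

const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard Base64 with `=` padding, written out to keep the sketch dependency-free.
pub fn base64_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for group in bytes.chunks(3) {
        let b0 = group[0] as u32;
        let b1 = *group.get(1).unwrap_or(&0) as u32;
        let b2 = *group.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[((n >> 18) & 63) as usize] as char);
        out.push(B64[((n >> 12) & 63) as usize] as char);
        out.push(if group.len() > 1 { B64[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if group.len() > 2 { B64[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Rebuild the single-string form for backends that do not read chunks.
/// The Markdown data-URI layout here is an assumed format, not a confirmed one.
pub fn to_legacy_string(chunks: &[InputChunk]) -> String {
    let mut out = String::new();
    for chunk in chunks {
        match chunk {
            InputChunk::Text(text) => out.push_str(text),
            InputChunk::Image { data, mimetype } => {
                out.push_str("![](data:");
                out.push_str(mimetype);
                out.push_str(";base64,");
                out.push_str(&base64_encode(data));
                out.push(')');
            }
        }
    }
    out
}
```

Base64 turns every 3 bytes into 4 characters, so carrying the raw bytes in the chunk avoids roughly a one-third size increase per image.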


15 Feb, 2024 1 commit

Outlines guided generation (#1539) · cef0553d

drbh authored Feb 15, 2024

This WIP PR starts to add grammar support via outlines. Currently it
supports only very simple regex grammars and does not yet precompile or
cache grammar FSMs.

todo:
- [X] add simple outlines guidance to `NextTokenChooser`
- [X] update protos for grammar
- [X] update generation params API
- [X] constrain simple grammar
- [ ] support parsing more complex grammars into FSMs
- [ ] support all grammar types supported by outlines
- [ ] explore optimizations to avoid recompiling grammars

guided request
```bash
curl -s 'http://localhost:3000/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
    "inputs": "make an email for david: \n",
    "parameters": {
        "max_new_tokens": 6,
        "grammar": "[\\w-]+@([\\w-]+\\.)+[\\w-]+"
    }
}' | jq
```
response
```json
{
  "generated_text": "david@example.com"
}
```

unguided request
```bash
curl -s 'http://localhost:3000/generate' \
--header 'Content-Type: application/json' \
--data '{
    "inputs": "make an email for david: \n",
    "parameters": {
        "max_new_tokens": 6
    }
}' | jq
```
response
```json
{
  "generated_text": "    email = 'david"
}
```
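
The difference between the two responses comes from masking tokens that the grammar's state machine cannot accept. The sketch below illustrates that idea only: it is not outlines and not the `NextTokenChooser` code. The DFA is hand-written for the toy pattern `[a-z]+@[a-z]+` rather than compiled from a regex, and `State`, `step`, `advance`, `is_accepting`, and `choose` are hypothetical names.

```rust
/// States of a hand-written DFA for the toy pattern `[a-z]+@[a-z]+`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Start,
    Local,
    At,
    Domain,
}

/// One character transition; `None` means the pattern can no longer match.
fn step(state: State, c: char) -> Option<State> {
    match (state, c) {
        (State::Start | State::Local, 'a'..='z') => Some(State::Local),
        (State::Local, '@') => Some(State::At),
        (State::At | State::Domain, 'a'..='z') => Some(State::Domain),
        _ => None,
    }
}

/// Walk a whole (multi-character) token through the DFA.
pub fn advance(state: State, token: &str) -> Option<State> {
    token.chars().try_fold(state, step)
}

/// The match is complete only once at least one domain character was seen.
pub fn is_accepting(state: State) -> bool {
    state == State::Domain
}

/// Pick the highest-scoring candidate that keeps the DFA alive.
/// `candidates` pairs each token's text with its (assumed) model score.
pub fn choose<'a>(state: State, candidates: &[(&'a str, f32)]) -> Option<(&'a str, State)> {
    let mut best: Option<(&'a str, State, f32)> = None;
    for &(token, score) in candidates {
        if token.is_empty() {
            continue; // an empty token would never make progress
        }
        if let Some(next) = advance(state, token) {
            if best.map_or(true, |(_, _, s)| score > s) {
                best = Some((token, next, score));
            }
        }
    }
    best.map(|(token, next, _)| (token, next))
}
```

With candidates such as `"    email"` (high score) and `"david"` (lower score), the unguided choice would be the first, while `choose` starting from `State::Start` rejects it because a space has no transition and returns `"david"` instead.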


11 Dec, 2023 1 commit
- Speculative (#1308) · 9ecfa16b
  Nicolas Patry authored Dec 11, 2023
  
24 May, 2023 1 commit
- feat: decrease IPC proto size (#367) · 218c9ada
  OlivierDehaene authored May 24, 2023
  Closes #307 #308
10 May, 2023 1 commit
- feat(server): shard token decode (#303) · 68e9d6ab
  OlivierDehaene authored May 10, 2023
  
26 Apr, 2023 1 commit

feat(router): new healthcheck that skips the queue (#244) · db2b4e07

Nicolas Patry authored Apr 26, 2023


Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>


21 Apr, 2023 1 commit
- feat(router): add device and dtype info (#215) · 343437c7
  OlivierDehaene authored Apr 21, 2023
  
13 Feb, 2023 1 commit
- feat: add distributed tracing (#62) · 9af45414
  OlivierDehaene authored Feb 13, 2023
  
03 Feb, 2023 1 commit
- feat(router): refactor API and add openAPI schemas (#53) · 20c3c594
  OlivierDehaene authored Feb 03, 2023
  
31 Jan, 2023 3 commits

feat: Add token streaming using ServerSideEvents support (#41) · 017a2a8c

OlivierDehaene authored Jan 31, 2023

Revert "feat: Add token streaming using ServerSideEvents support" (#40) · 4f9ac67c

OlivierDehaene authored Jan 31, 2023

Reverts huggingface/text-generation-inference#36

feat: Add token streaming using ServerSideEvents support (#36) · 7fbfbb0d

OlivierDehaene authored Jan 31, 2023

Add token streaming using Server-Sent Events (SSE).

The signature of the SSE events is:

```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```
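
For context on how such events travel over the wire, here is a minimal sketch of SSE framing. It is not the router's code: `sse_frame` is a hypothetical helper, and it assumes the payload is an already-serialized JSON string that uses only `\n` as a line separator.

```rust
/// Wrap a serialized payload in one Server-Sent Events frame.
/// Each payload line gets a `data: ` prefix; a blank line ends the event.
pub fn sse_frame(payload: &str) -> String {
    let mut frame = String::new();
    for line in payload.split('\n') {
        frame.push_str("data: ");
        frame.push_str(line);
        frame.push('\n');
    }
    // The empty line tells the client to dispatch the accumulated data.
    frame.push('\n');
    frame
}
```

A serialized `ErrorResponse` such as `{"error":"overloaded"}` would then be sent as `data: {"error":"overloaded"}` followed by an empty line.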


15 Dec, 2022 1 commit
- feat: Return logprobs (#8) · 32a25306
  OlivierDehaene authored Dec 15, 2022
  
12 Dec, 2022 1 commit
- feat: Support stop sequences (#7) · 718096f6
  OlivierDehaene authored Dec 12, 2022
  
20 Oct, 2022 1 commit
- v0.1.0 · f16f2f5a
  Olivier Dehaene authored Oct 18, 2022
  
17 Oct, 2022 1 commit
- feat: Improve error handling · 5e5d8766
  Olivier Dehaene authored Oct 17, 2022
  
11 Oct, 2022 1 commit
- Refactored gRPC interface · 4c693e65
  Olivier Dehaene authored Oct 11, 2022
  Added validation logic
08 Oct, 2022 1 commit
- Init · 295831a4
  Olivier Dehaene authored Oct 08, 2022
  