- 25 Apr, 2023 1 commit
Nicolas Patry authored

- 24 Apr, 2023 1 commit
OlivierDehaene authored
Co-authored-by: Nick Hill <nickhill@us.ibm.com>

- 17 Apr, 2023 1 commit
OlivierDehaene authored
closes #189

- 09 Apr, 2023 1 commit
OlivierDehaene authored

- 30 Mar, 2023 1 commit
OlivierDehaene authored

- 16 Mar, 2023 1 commit
OlivierDehaene authored

- 09 Mar, 2023 3 commits
OlivierDehaene authored

OlivierDehaene authored
closes #111

OlivierDehaene authored
closes #112

- 07 Mar, 2023 1 commit
OlivierDehaene authored

- 02 Mar, 2023 1 commit
OlivierDehaene authored

- 16 Feb, 2023 1 commit
OlivierDehaene authored

- 15 Feb, 2023 1 commit
OlivierDehaene authored
closes #65

- 13 Feb, 2023 1 commit
OlivierDehaene authored

- 03 Feb, 2023 1 commit
OlivierDehaene authored

- 02 Feb, 2023 1 commit
OlivierDehaene authored
Co-authored-by: Nick Hill <nickhill@us.ibm.com>

- 01 Feb, 2023 1 commit
OlivierDehaene authored

- 31 Jan, 2023 4 commits
OlivierDehaene authored

OlivierDehaene authored

OlivierDehaene authored
Reverts huggingface/text-generation-inference#36

OlivierDehaene authored
Add token streaming using Server-Sent Events (SSE). The signature of the SSE events is:
```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```
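The structs above define the payloads; the sketch below shows roughly how one streamed token could be framed as an SSE `data:` event. The `Token` fields (`id`, `text`) and the use of serde/serde_json are assumptions for illustration, not taken from the commit; `Details` and `StreamResponse` are repeated from the commit message so the example is self-contained.

```rust
use serde::Serialize;

// Hypothetical token payload; the real `Token` fields are not shown in the commit message.
#[derive(Serialize)]
struct Token {
    id: u32,
    text: String,
}

#[derive(Serialize)]
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

#[derive(Serialize)]
struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

// An SSE frame is a `data: <payload>` line followed by a blank line.
fn sse_frame(event: &StreamResponse) -> String {
    format!("data: {}\n\n", serde_json::to_string(event).unwrap())
}

fn main() {
    let event = StreamResponse {
        token: Token { id: 42, text: "hello".into() },
        generated_text: None,
        details: None,
    };
    print!("{}", sse_frame(&event));
}
```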
- 20 Jan, 2023 1 commit
OlivierDehaene authored

- 03 Jan, 2023 1 commit
Nick Hill authored
I noticed some opportunity to collapse some of the logic, in case you are interested.

- 30 Dec, 2022 1 commit
Nick Hill authored
There's currently a discrepancy in the tokenization between the router and python server code. The latter includes special tokens but the former does not. This results in a token count mismatch for seq2seq models such as mt0, where the tokenizer emits an EOS token at the end. This in turn results in some unexpected/incorrect output, in particular when batch concatenation is involved, because the python code uses the input length passed from the router for each row. As far as I can tell, it is better to include this token in the encoder `input_ids`, so it seems best to just adjust on the router side.
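A minimal sketch of the mismatch being described, assuming the Rust `tokenizers` crate; the `tokenizer.json` path and the input string are placeholders:

```rust
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    // Load a tokenizer for a seq2seq model such as mt0 (path is illustrative).
    let tokenizer = Tokenizer::from_file("tokenizer.json")?;

    // Router-style count: no special tokens added.
    let without_special = tokenizer.encode("Hello world", false)?;
    // Server-style count: special tokens added, so mt0-like tokenizers append an EOS.
    let with_special = tokenizer.encode("Hello world", true)?;

    // For such models the second count is one higher, which is the
    // router/server discrepancy described above.
    println!("without special tokens: {}", without_special.get_ids().len());
    println!("with special tokens:    {}", with_special.get_ids().len());
    Ok(())
}
```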
- 12 Dec, 2022 1 commit
OlivierDehaene authored

- 05 Dec, 2022 1 commit
Nick Hill authored
  - Avoid theoretical hang in batcher loop
  - Avoid a couple of clones in the router generate method
  - Keep attention mask tensors as integers
  - Remove num_heads attribute
Co-authored-by: OlivierDehaene <Olivier.dehaene@gmail.com>

- 14 Nov, 2022 2 commits
OlivierDehaene authored

OlivierDehaene authored

- 27 Oct, 2022 1 commit
OlivierDehaene authored

- 21 Oct, 2022 2 commits
OlivierDehaene authored

OlivierDehaene authored

- 20 Oct, 2022 1 commit
Olivier Dehaene authored

- 17 Oct, 2022 2 commits
Olivier Dehaene authored

Olivier Dehaene authored

- 11 Oct, 2022 1 commit
Olivier Dehaene authored
Added validation logic