- 09 Mar, 2023 1 commit
  - OlivierDehaene authored
    closes #112
- 07 Mar, 2023 1 commit
  - OlivierDehaene authored
- 06 Mar, 2023 1 commit
  - OlivierDehaene authored
    closes #99
- 03 Mar, 2023 1 commit
  - OlivierDehaene authored
- 02 Mar, 2023 2 commits
  - OlivierDehaene authored
  - OlivierDehaene authored
- 28 Feb, 2023 1 commit
  - OlivierDehaene authored
- 27 Feb, 2023 1 commit
  - OlivierDehaene authored
- 24 Feb, 2023 2 commits
  - OlivierDehaene authored
  - OlivierDehaene authored
- 17 Feb, 2023 1 commit
  - OlivierDehaene authored
- 16 Feb, 2023 2 commits
  - OlivierDehaene authored
  - OlivierDehaene authored
- 15 Feb, 2023 1 commit
  - OlivierDehaene authored
    closes #65
- 13 Feb, 2023 1 commit
  - OlivierDehaene authored
- 08 Feb, 2023 1 commit
- 07 Feb, 2023 1 commit
  - OlivierDehaene authored
- 03 Feb, 2023 1 commit
  - OlivierDehaene authored
- 02 Feb, 2023 2 commits
  - OlivierDehaene authored
    @njhill, @yk FYI: generated_text was concatenated to the user prompt for legacy reasons. We want to remove this behaviour, as we don't think it is useful and it is even detrimental to usability. We also remove the unused Vec.
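    A minimal sketch of the behaviour change, in Rust; the function and variable names are illustrative assumptions, not the project's actual API:
    ```rust
    // Sketch only: contrasts the legacy response text with the new one.
    fn response_text(prompt: &str, generated: &str, legacy_concat: bool) -> String {
        if legacy_concat {
            // Legacy behaviour: the prompt was echoed back in generated_text.
            format!("{prompt}{generated}")
        } else {
            // New behaviour: generated_text contains only the newly generated text.
            generated.to_string()
        }
    }

    fn main() {
        let prompt = "Once upon a time";
        let generated = ", a llama appeared.";
        assert_eq!(response_text(prompt, generated, true), "Once upon a time, a llama appeared.");
        assert_eq!(response_text(prompt, generated, false), generated);
    }
    ```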
  - OlivierDehaene authored
    Co-authored-by: Nick Hill <nickhill@us.ibm.com>
- 01 Feb, 2023 1 commit
  - OlivierDehaene authored
- 31 Jan, 2023 4 commits
  - OlivierDehaene authored
  - OlivierDehaene authored
  - OlivierDehaene authored
    Reverts huggingface/text-generation-inference#36
  - OlivierDehaene authored
    Add token streaming using Server-Sent Events (SSE). The signature of the SSE events is:
    ```rust
    struct Details {
        finish_reason: String,
        generated_tokens: u32,
        seed: Option<u64>,
    }

    struct StreamResponse {
        token: Token,
        generated_text: Option<String>,
        details: Option<Details>,
    }

    struct ErrorResponse {
        error: String,
    }
    ```
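    As a rough illustration of what one streamed event could look like on the wire, here is a sketch that serializes a `StreamResponse` with serde; the fields of `Token` are an assumption, since the commit message does not define them:
    ```rust
    // Sketch only: prints the JSON payload of one SSE `data:` line.
    use serde::Serialize;

    #[derive(Serialize)]
    struct Token {
        // Assumed fields; not taken from the commit message.
        id: u32,
        text: String,
        logprob: f32,
    }

    #[derive(Serialize)]
    struct Details {
        finish_reason: String,
        generated_tokens: u32,
        seed: Option<u64>,
    }

    #[derive(Serialize)]
    struct StreamResponse {
        token: Token,
        generated_text: Option<String>,
        details: Option<Details>,
    }

    fn main() -> serde_json::Result<()> {
        // Intermediate events carry only the token; a final event would also
        // set `generated_text` and `details`.
        let event = StreamResponse {
            token: Token { id: 42, text: " world".into(), logprob: -0.5 },
            generated_text: None,
            details: None,
        };
        println!("data: {}", serde_json::to_string(&event)?);
        Ok(())
    }
    ```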
- 30 Jan, 2023 1 commit
  - OlivierDehaene authored
    Co-authored-by: Yannic Kilcher <yk@users.noreply.github.com>
- 26 Jan, 2023 1 commit
  - OlivierDehaene authored
    @njhill
- 23 Jan, 2023 2 commits
  - OlivierDehaene authored
  - OlivierDehaene authored
- 20 Jan, 2023 1 commit
  - OlivierDehaene authored
- 17 Jan, 2023 2 commits
- 03 Jan, 2023 1 commit
  - Nick Hill authored
    I noticed some opportunities to collapse some of the logic, in case you are interested.
- 30 Dec, 2022 1 commit
  - Nick Hill authored
    There is currently a discrepancy in tokenization between the router and the Python server code: the latter includes special tokens but the former does not. This results in a token-count mismatch for seq2seq models such as mt0, where the tokenizer emits an EOS token at the end. That in turn produces unexpected or incorrect output, in particular when batch concatenation is involved, because the Python code uses the input length passed from the router for each row. As far as I can tell, it is better to include this token in the encoder `input_ids`, so I guess it's best to just adjust on the router side.
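    A minimal sketch of the mismatch using the `tokenizers` crate (the `from_pretrained` helper assumes the crate's `http` feature; the model name is just an example):
    ```rust
    // Sketch only: the second argument to `encode` controls whether special
    // tokens (e.g. a trailing EOS for seq2seq models) are added.
    use tokenizers::Tokenizer;

    fn main() -> tokenizers::Result<()> {
        let tokenizer = Tokenizer::from_pretrained("bigscience/mt0-base", None)?;

        let router_side = tokenizer.encode("Hello world", false)?; // no special tokens
        let server_side = tokenizer.encode("Hello world", true)?;  // with special tokens

        // The lengths differ by the special tokens (here the EOS), which is
        // exactly the input-length mismatch described above.
        println!(
            "router: {} tokens, server: {} tokens",
            router_side.get_ids().len(),
            server_side.get_ids().len()
        );
        Ok(())
    }
    ```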
- 15 Dec, 2022 1 commit
  - OlivierDehaene authored
- 12 Dec, 2022 1 commit
  - OlivierDehaene authored
- 08 Dec, 2022 1 commit
  - OlivierDehaene authored
- 05 Dec, 2022 1 commit
  - Nick Hill authored
    - Avoid theoretical hang in batcher loop
    - Avoid a couple of clones in the router generate method
    - Keep attention mask tensors as integers
    - Remove num_heads attribute

    Co-authored-by: OlivierDehaene <Olivier.dehaene@gmail.com>
- 14 Nov, 2022 2 commits
  - OlivierDehaene authored
  - OlivierDehaene authored