- 07 Jun, 2024 1 commit
-
-
Daniël de Kok authored
The router will now send the input as chunks in addition to a single string. This change modifies the server to process the chunked input rather than the string, which also allows us to remove the image extraction code from the server.
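A minimal sketch of what processing chunked input could look like on the server side, assuming each chunk is either a piece of text or an image. The `TextChunk`, `ImageChunk` and `split_chunks` names are hypothetical and are not taken from the actual code; the real wire format is not shown in the commit message.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class TextChunk:
    text: str


@dataclass
class ImageChunk:
    data: bytes
    mimetype: str


# Hypothetical chunk type, used only for illustration.
Chunk = Union[TextChunk, ImageChunk]


def split_chunks(chunks: List[Chunk]) -> Tuple[str, List[ImageChunk]]:
    """Concatenate the text chunks and collect the image chunks in order."""
    text_parts = []
    images = []
    for chunk in chunks:
        if isinstance(chunk, TextChunk):
            text_parts.append(chunk.text)
        elif isinstance(chunk, ImageChunk):
            images.append(chunk)
        else:
            raise TypeError(f"unsupported chunk type: {type(chunk).__name__}")
    return "".join(text_parts), images
```

Because images arrive as their own chunks, nothing in this sketch has to scan the prompt string for embedded images, which is the kind of extraction code the commit message says can be removed.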
-
- 14 Dec, 2023 1 commit
-
-
OlivierDehaene authored
-
- 11 Dec, 2023 2 commits
-
-
OlivierDehaene authored
-
Nicolas Patry authored
-
- 08 Jun, 2023 1 commit
-
-
Nicolas Patry authored
# What does this PR do?

Reworked the loading logic. The idea is to use cleaner loading code:

- Remove the need for `no_init_weights`.
- Remove all the weird `bnb_linear`, `load_weights` and `post_load_weights` code.

New code layout:

- New class `Weights`, in charge of loading the weights from multiple files into the appropriate tensors (potentially sharded).
- TP layers are now "shells": they contain the code that knows what kind of sharding is needed, plus an eventual `all_reduce`. They do not inherit from linear; they contain some kind of Linear instead.
- The contained linear can be either FastLinear, BnbLinear or (next) GPTQ Linear.
- All modeling code is explicitly written for sharding; the process group is just a no-op for non-sharded code (removes a lot of test cases).

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-41-161.taildb5d.ts.net>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-41-161.ec2.internal>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
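The loader idea from the PR description above can be sketched with plain Python lists standing in for tensors. This is a toy illustration under stated assumptions, not the actual `Weights` class: `ToyWeights`, `get_tensor` and `get_sharded_rows` are hypothetical names, the "files" are in-memory dictionaries, and sharding is done over rows only.

```python
from typing import Dict, List

Matrix = List[List[float]]


class ToyWeights:
    """Toy stand-in for a weight loader that looks tensors up across several files."""

    def __init__(self, files: List[Dict[str, Matrix]], rank: int, world_size: int):
        self.files = files
        self.rank = rank
        self.world_size = world_size
        # Map each tensor name to the index of the file that holds it.
        self.routing: Dict[str, int] = {}
        for index, tensors in enumerate(files):
            for name in tensors:
                if name in self.routing:
                    raise ValueError(f"duplicate tensor name: {name}")
                self.routing[name] = index

    def get_tensor(self, name: str) -> Matrix:
        """Return the full tensor, whichever file it lives in."""
        return self.files[self.routing[name]][name]

    def get_sharded_rows(self, name: str) -> Matrix:
        """Return this rank's contiguous block of rows."""
        tensor = self.get_tensor(name)
        rows = len(tensor)
        if rows % self.world_size != 0:
            raise ValueError(f"{name}: {rows} rows not divisible by {self.world_size}")
        block = rows // self.world_size
        start = self.rank * block
        return tensor[start:start + block]
```

With `world_size=1` the sharded getter returns the whole tensor, which mirrors the point that non-sharded code can go through the same path as sharded code.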
-
- 02 Jun, 2023 1 commit
-
-
OlivierDehaene authored
Close #288
-
- 26 May, 2023 1 commit
-
-
OlivierDehaene authored
Co-authored-by: Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com>
-
- 24 May, 2023 1 commit
-
-
OlivierDehaene authored
Closes #307 #308
-
- 16 May, 2023 1 commit
-
-
OlivierDehaene authored
Fixes #333

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
-
- 24 Apr, 2023 2 commits
-
-
OlivierDehaene authored
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
-
Nick Hill authored
-
- 20 Apr, 2023 1 commit
-
-
OlivierDehaene authored
-
- 11 Apr, 2023 1 commit
-
-
OlivierDehaene authored
-
- 09 Apr, 2023 1 commit
-
-
OlivierDehaene authored
-
- 16 Mar, 2023 1 commit
-
-
OlivierDehaene authored
-
- 07 Mar, 2023 1 commit
-
-
OlivierDehaene authored
-
- 06 Mar, 2023 1 commit
-
-
OlivierDehaene authored
-
- 24 Feb, 2023 1 commit
-
-
OlivierDehaene authored
-
- 03 Feb, 2023 1 commit
-
-
OlivierDehaene authored
-
- 02 Feb, 2023 1 commit
-
-
OlivierDehaene authored
@njhill, @yk FYI: generated_text was concatenated to the user prompt for legacy reasons. We want to remove this behaviour as we don't think it is useful, and it is even detrimental to usability. We also remove the unused Vec.
-
- 31 Jan, 2023 3 commits
-
-
OlivierDehaene authored
-
OlivierDehaene authored
Reverts huggingface/text-generation-inference#36
-
OlivierDehaene authored
Add token streaming using Server-Sent Events (SSE). The signature of the SSE events is:

```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```
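A client consuming these events might look roughly like the sketch below. It assumes each event is a single `data:` line carrying one JSON object shaped like `StreamResponse` or `ErrorResponse`, and that `generated_text` is only set on the last event. The layout of `Token` is not given in the commit message, so the `text` field read here is an assumption, and `iter_sse_events` and `collect_stream` are hypothetical helper names.

```python
import json
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield the decoded JSON payload of each `data:` line in an SSE stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments and other SSE fields
        yield json.loads(line[len("data:"):])


def collect_stream(lines: Iterable[str]) -> Tuple[List[str], str, Optional[Dict]]:
    """Gather token texts until the final event, which carries generated_text."""
    tokens = []
    for event in iter_sse_events(lines):
        if "error" in event:
            # ErrorResponse shape: {"error": "..."}
            raise RuntimeError(event["error"])
        # Assumption: Token exposes a `text` field.
        tokens.append(event["token"]["text"])
        if event.get("generated_text") is not None:
            return tokens, event["generated_text"], event.get("details")
    raise RuntimeError("stream ended without a final event")
```

The helper takes any iterable of lines, so it can be fed from a file, a list in a test, or a line iterator over a streaming HTTP response.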
-
- 20 Jan, 2023 2 commits
-
-
OlivierDehaene authored
-
OlivierDehaene authored
-
- 15 Dec, 2022 1 commit
-
-
OlivierDehaene authored
-
- 12 Dec, 2022 1 commit
-
-
OlivierDehaene authored
-
- 08 Dec, 2022 1 commit
-
-
OlivierDehaene authored
-