- 23 Jun, 2023 3 commits
  - OlivierDehaene authored
  - Nicolas Patry authored
  - Nicolas Patry authored
    Should be more robust to shared tensors (this is fine when using `from_pretrained`), but it forces us to add new checks in our loading code, since the key chosen to keep might differ from the one `transformers` keeps.
    Co-authored-by: Ubuntu <ubuntu@ip-172-31-41-161.ec2.internal>
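The commit above says the loader needs extra checks because, for tensors that share storage (for example tied weights), the key kept in the checkpoint may not be the one `transformers` keeps. A minimal sketch of one such check, assuming a checkpoint is a plain name-to-tensor mapping; the helper `get_tensor_with_aliases` and the example key names are hypothetical, not the project's actual API:

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for a checkpoint: tensor name -> flat list of values.
Checkpoint = Dict[str, List[float]]


def get_tensor_with_aliases(
    checkpoint: Checkpoint, name: str, aliases: Iterable[str] = ()
) -> List[float]:
    """Return the tensor stored under `name`, or under the first alias present.

    Hypothetical helper: when two parameters share storage, only one of their
    keys may have been saved, and it need not be the key `transformers` keeps.
    """
    candidates = [name, *aliases]
    for key in candidates:
        if key in checkpoint:
            return checkpoint[key]
    raise KeyError(f"none of {candidates} found in checkpoint")


# Tied weights saved only under the embedding key (key names are illustrative).
checkpoint = {"model.embed_tokens.weight": [0.1, 0.2, 0.3]}
lm_head = get_tensor_with_aliases(
    checkpoint, "lm_head.weight", aliases=["model.embed_tokens.weight"]
)
print(lm_head)  # [0.1, 0.2, 0.3]
```

Trying the canonical name first keeps the common case unchanged; the aliases only matter when a shared tensor was saved under its other name.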
- 20 Jun, 2023 2 commits
  - Nicolas Patry authored
  - OlivierDehaene authored
    Closes #471
- 19 Jun, 2023 1 commit
  - OlivierDehaene authored
    @lewtun, is this enough? Closes #458, closes #456
- 16 Jun, 2023 1 commit
  - OlivierDehaene authored
- 12 Jun, 2023 3 commits
  - OlivierDehaene authored
  - sayf eddine hammemi authored
    # What does this PR do?
    This PR fixes:
    - The use of a non-POSIX comparison, which may fail depending on the shell used (`=` always works, `==` only with bash)
    - A typo in the env variable name displayed in the error message: `BUILD_EXTENSION` instead of `BUILD_EXTENSIONS`

    Fixes #422
  - A.J authored
    # What does this PR do?
    It fixes a typo in the comments that reference the environment variable `CUDA_VISIBLE_DEVICES`. No misspelled references to this variable were found in code logic, so the typo could not cause undefined behaviour or bugs. This PR does not modify any code logic.
- 09 Jun, 2023 1 commit
  - OlivierDehaene authored
- 08 Jun, 2023 1 commit
  - Nicolas Patry authored
    # What does this PR do?
    Reworked the loading logic. The idea is to use cleaner loading code:
    - Remove the need for `no_init_weights`
    - Remove all the ad hoc `bnb_linear`, `load_weights` and `post_load_weights` logic

    New code layout:
    - A new class `Weights` is in charge of loading the weights from multiple files into the appropriate tensors (potentially sharded).
    - TP layers are now "shells": they contain the code that knows what kind of sharding is needed, plus any required `all_reduce`. They do not inherit from a linear layer; instead they contain one, which can be a FastLinear, a BnbLinear, or (coming next) a GPTQ linear.
    - All modeling code is explicitly written for sharding; the process group is simply a no-op for non-sharded code (this removes a lot of test cases).

    Co-authored-by: Ubuntu <ubuntu@ip-172-31-41-161.taildb5d.ts.net>
    Co-authored-by: Ubuntu <ubuntu@ip-172-31-41-161.ec2.internal>
    Co-authored-by: OlivierDehaene <olivier@huggingface.co>
    Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
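The layout described in the 08 Jun commit can be illustrated with a small, dependency-free sketch. It is a simplification under stated assumptions, not the project's real code: tensors are plain nested lists, `NoOpProcessGroup`, `Weights.get_sharded_columns` and `TensorParallelRowLinear` are hypothetical names, and only the row-parallel case (the one that needs an `all_reduce`) is shown. It demonstrates the two ideas from the commit: a TP layer is a shell that wraps an inner linear instead of inheriting from it, and for non-sharded code the process group turns `all_reduce` into a no-op.

```python
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


class NoOpProcessGroup:
    """Hypothetical single-process group: rank 0 of 1, all_reduce does nothing."""

    def rank(self) -> int:
        return 0

    def size(self) -> int:
        return 1

    def all_reduce(self, values: Vector) -> Vector:
        # Only one process holds a shard, so its partial result is already the total.
        return values


class Weights:
    """Hypothetical loader: merges tensors from several files, slices them per rank."""

    def __init__(self, files: List[Dict[str, Matrix]], process_group):
        self.tensors: Dict[str, Matrix] = {}
        for f in files:
            for name, tensor in f.items():
                if name in self.tensors:
                    raise ValueError(f"duplicate tensor {name!r}")
                self.tensors[name] = tensor
        self.process_group = process_group

    def get_sharded_columns(self, name: str) -> Matrix:
        # Split the input dimension (columns) evenly across ranks.
        full = self.tensors[name]
        size = self.process_group.size()
        rank = self.process_group.rank()
        block = len(full[0]) // size
        return [row[rank * block:(rank + 1) * block] for row in full]


class FastLinear:
    """Plain y = W x without bias; stands in for any inner linear implementation."""

    def __init__(self, weight: Matrix):
        self.weight = weight

    def forward(self, x: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight]


class TensorParallelRowLinear:
    """A "shell": knows how to shard and reduce, delegates the math to `linear`."""

    def __init__(self, linear, process_group):
        self.linear = linear
        self.process_group = process_group

    @classmethod
    def load(cls, weights: Weights, name: str) -> "TensorParallelRowLinear":
        return cls(FastLinear(weights.get_sharded_columns(name)), weights.process_group)

    def forward(self, x_shard: Vector) -> Vector:
        # Each rank computes a partial output from its slice; all_reduce sums them.
        partial = self.linear.forward(x_shard)
        return self.process_group.all_reduce(partial)


pg = NoOpProcessGroup()
weights = Weights([{"dense.weight": [[1.0, 2.0], [3.0, 4.0]]}, {"other.weight": [[5.0]]}], pg)
layer = TensorParallelRowLinear.load(weights, "dense.weight")
print(layer.forward([1.0, 1.0]))  # [3.0, 7.0]
```

With a real multi-process group, each rank would pass only its slice of the input and `all_reduce` would sum the partial outputs; running the same code path with a single process is what lets non-sharded models reuse it unchanged.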
- 05 Jun, 2023 2 commits
  - OlivierDehaene authored
  - OlivierDehaene authored
- 02 Jun, 2023 3 commits
  - OlivierDehaene authored
    Close #288
  - OlivierDehaene authored
  - OlivierDehaene authored
- 01 Jun, 2023 4 commits
  - OlivierDehaene authored
  - OlivierDehaene authored
    Fix #366
  - OlivierDehaene authored
  - OlivierDehaene authored
    Fix #389
- 31 May, 2023 4 commits
  - OlivierDehaene authored
  - OlivierDehaene authored
  - OlivierDehaene authored
  - OlivierDehaene authored
- 30 May, 2023 5 commits
  - OlivierDehaene authored
  - OlivierDehaene authored
  - OlivierDehaene authored
  - OlivierDehaene authored
  - OlivierDehaene authored
- 26 May, 2023 2 commits
  - CL-Shang authored
  - OlivierDehaene authored
    Co-authored-by: Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com>
- 25 May, 2023 1 commit
  - OlivierDehaene authored
- 24 May, 2023 1 commit
  - OlivierDehaene authored
    Closes #307 #308
- 23 May, 2023 6 commits
  - OlivierDehaene authored
  - OlivierDehaene authored
    @njhill FYI
  - OlivierDehaene authored
  - OlivierDehaene authored
  - OlivierDehaene authored
    Fixes #349
  - OlivierDehaene authored
    Fixes #338