1. 12 Jun, 2023 2 commits
    • fix(makefile): Fix typo and use POSIX comparison in the makefile (#443) · ca650e5b
      sayf eddine hammemi authored
      # What does this PR do?
      
      This PR fixes:
      - The use of a non-POSIX comparison, which may fail depending on the
      shell used (`=` always works, `==` only works with bash)
      - A typo in the environment variable name displayed in the error message:
      `BUILD_EXTENSION` instead of `BUILD_EXTENSIONS`
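Make runs recipe lines with `/bin/sh` by default, and on some systems (Debian and Ubuntu, for example) that shell is dash rather than bash, so only the POSIX `=` string comparison is guaranteed to work. A minimal sketch, assuming a POSIX `sh` is on the PATH: the hypothetical helper `posix_equals` runs the portable `[ "$1" = "$2" ]` test from Python and reports whether it succeeded. It only exercises the shell side, not the project's actual Makefile.

```python
import subprocess


def posix_equals(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings with the POSIX `test` operator
    `=` by running it through the `sh` found on PATH (which may be dash, not bash)."""
    # The operands are passed as positional parameters ($1, $2) instead of being
    # pasted into the command string, so the shell never re-parses them.
    result = subprocess.run(["sh", "-c", '[ "$1" = "$2" ]', "sh", a, b], check=False)
    # `[` exits with status 0 when the strings are equal and 1 when they differ.
    return result.returncode == 0


print(posix_equals("True", "True"))   # True
print(posix_equals("True", "False"))  # False
```

Writing `==` inside the same brackets relies on a bash extension, which is exactly what the change above avoids.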
      
      Fixes #422 
    • docs(launcher): fix CUDA_VISIBLE_DEVICES helper comment (#441) · d4eb60f4
      A.J authored
      # What does this PR do?
      It fixes a typo in the comments referencing the environment variable
      `CUDA_VISIBLE_DEVICES`. No misspelled references to this variable that
      could lead to undefined behaviour or bugs were found in the code logic.
      This PR does not modify any code logic.
  2. 09 Jun, 2023 1 commit
  3. 08 Jun, 2023 1 commit
    • feat(server): Rework model loading (#344) · abd58ff8
      Nicolas Patry authored
      # What does this PR do?
      
      Reworked the loading logic. The idea is to use cleaner loading code:
      
      - Remove the need for `no_init_weights`
      - Remove all the weird `bnb_linear`, `load_weights` and
      `post_load_weights` logic
      
      New code layout:
      
      - New class `Weights` in charge of loading the weights from multiple
      files into the appropriate tensors (potentially sharded)
      - TP layers are now "shells": they contain the code that knows what kind
      of sharding is needed, plus the `all_reduce` when one is required. They
      no longer inherit from Linear; instead they contain some kind of Linear
      - The contained linear can be either FastLinear or BnbLinear, with GPTQ
      Linear to come next
      - All modeling code is explicitly written for sharding; the process group
      is just a no-op for non-sharded code (which removes a lot of test cases)
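The layout above can be sketched in plain Python. This is not the text-generation-inference code: the methods `get_tensor` and `get_sharded`, the `RowParallelLinear` shell, the `NoOpGroup` process group and the in-memory "checkpoint files" (plain dicts instead of real weight files) are all hypothetical stand-ins. The sketch shows a `Weights` object routing tensor names to files and slicing out one rank's shard, a TP shell that only decides how to shard and when to reduce while delegating the matmul to a contained `FastLinear`, and how a no-op process group lets the same code run unsharded.

```python
from typing import Dict, List

# Hypothetical stand-in for a tensor: a row-major matrix (rows = output features).
Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply matrix w by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class Weights:
    """Sketch of a loader that knows which file holds each tensor and can
    return the slice belonging to one tensor-parallel rank."""

    def __init__(self, files: List[Dict[str, Matrix]], rank: int, world_size: int):
        # Map every tensor name to the "file" (here a dict) that contains it.
        self.routing: Dict[str, Dict[str, Matrix]] = {}
        for f in files:
            for name in f:
                if name in self.routing:
                    raise ValueError(f"{name} is stored in more than one file")
                self.routing[name] = f
        self.rank = rank
        self.world_size = world_size

    def get_tensor(self, name: str) -> Matrix:
        # Hypothetical method name: fetch the full, unsharded tensor.
        return self.routing[name][name]

    def get_sharded(self, name: str, dim: int) -> Matrix:
        # Hypothetical method name: keep only this rank's block along `dim`.
        t = self.get_tensor(name)
        size = len(t) if dim == 0 else len(t[0])
        if size % self.world_size != 0:
            raise ValueError(f"dim {dim} of {name} is not divisible by the world size")
        block = size // self.world_size
        start, stop = self.rank * block, (self.rank + 1) * block
        if dim == 0:
            return [row[:] for row in t[start:stop]]
        return [row[start:stop] for row in t]


class FastLinear:
    """The contained linear: holds a (possibly sharded) weight and applies it."""

    def __init__(self, weight: Matrix):
        self.weight = weight

    def __call__(self, x: List[float]) -> List[float]:
        return matvec(self.weight, x)


class NoOpGroup:
    """Hypothetical process group for unsharded runs: all_reduce does nothing."""

    def all_reduce(self, y: List[float]) -> List[float]:
        return y


class RowParallelLinear:
    """Hypothetical TP "shell": decides how to shard (along the input features)
    and when to reduce, but delegates the actual matmul to a contained linear."""

    def __init__(self, prefix: str, weights: Weights, group):
        self.linear = FastLinear(weights.get_sharded(f"{prefix}.weight", dim=1))
        self.group = group

    def __call__(self, x_shard: List[float]) -> List[float]:
        return self.group.all_reduce(self.linear(x_shard))


# Two hypothetical checkpoint "files"; each tensor lives in exactly one of them.
files = [
    {"proj.weight": [[1, 2, 3, 4], [5, 6, 7, 8]]},
    {"lm_head.weight": [[1, 0], [0, 1]]},
]
x = [1.0, 1.0, 2.0, 0.5]

# Unsharded: world_size=1, so the shard is the whole weight and all_reduce is a no-op.
full = RowParallelLinear("proj", Weights(files, rank=0, world_size=1), NoOpGroup())(x)

# Sharded over 2 ranks: each rank sees half of the input features. We call the
# contained linear directly and sum the partial outputs by hand, standing in
# for a real all_reduce across devices.
partials = []
for rank in range(2):
    layer = RowParallelLinear("proj", Weights(files, rank=rank, world_size=2), NoOpGroup())
    partials.append(layer.linear(x[rank * 2:(rank + 1) * 2]))
reduced = [sum(vals) for vals in zip(*partials)]

print(full)     # [11.0, 29.0]
print(reduced)  # [11.0, 29.0]
```

Summing the per-rank partial outputs reproduces the unsharded result, which is the property a row-parallel layer relies on when it all-reduces; with a single rank the reduction has nothing to do, so the same modeling code serves both cases.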
      
      ![Screenshot from 2023-05-19 23-19-59](https://github.com/huggingface/text-generation-inference/assets/204321/9a802654-74a3-488c-87a8-073743a6143f)
      
      ---------
      
      Co-authored-by: Ubuntu <ubuntu@ip-1...
  4. 05 Jun, 2023 2 commits
  5. 02 Jun, 2023 3 commits
  6. 01 Jun, 2023 4 commits
  7. 31 May, 2023 4 commits
  8. 30 May, 2023 5 commits
  9. 26 May, 2023 2 commits
  10. 25 May, 2023 1 commit
  11. 24 May, 2023 1 commit
  12. 23 May, 2023 9 commits
  13. 22 May, 2023 3 commits
  14. 16 May, 2023 2 commits