1. 19 Jun, 2023 1 commit
  2. 16 Jun, 2023 1 commit
  3. 12 Jun, 2023 3 commits
  4. 09 Jun, 2023 1 commit
  5. 08 Jun, 2023 1 commit
    • feat(server): Rework model loading (#344) · abd58ff8
      Nicolas Patry authored
      # What does this PR do?
      
      Reworks the model loading logic. The idea is to use cleaner loading code:
      
      - Remove the need for `no_init_weights`
      - Remove all the weird `bnb_linear`, `load_weights` and
      `post_load_weights` code.
      
      New code layout:
      
      - New class `Weights`, in charge of loading the weights from
      multiple files into the appropriate tensors (potentially sharded)
      - TP layers are now "shells": they contain the code that knows what kind
      of sharding is needed, plus a possible `all_reduce`. They do not inherit
      from linear; they contain some kind of Linear instead
      - The contained linear can be either FastLinear or BnbLinear, with a GPTQ
      Linear coming next.
      - All modeling code is explicitly written for sharding; for non-sharded
      code the process group is just a no-op (this removes a lot of test cases)
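
The layout above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it uses plain Python lists instead of tensors, and `LocalGroup`, `PlainLinear`, `RowParallelShell`, `get_sharded_cols` and the file and tensor names are all hypothetical names chosen for the example. Only the idea is taken from the description: a `Weights` object finds tensors across several files and hands out shards, and a TP "shell" wraps a linear and applies `all_reduce`, which is a no-op when nothing is sharded.

```python
from typing import Dict, List

Matrix = List[List[int]]


class LocalGroup:
    """Hypothetical stand-in for a process group (assumption, not the PR's class)."""

    def __init__(self, rank: int = 0, size: int = 1) -> None:
        self._rank, self._size = rank, size

    def rank(self) -> int:
        return self._rank

    def size(self) -> int:
        return self._size

    def all_reduce(self, values: List[int]) -> List[int]:
        # A real group would sum `values` across ranks; in this sketch it is a no-op.
        return values


class Weights:
    """Finds a tensor by name across several files and returns (sharded) views."""

    def __init__(self, files: Dict[str, Dict[str, Matrix]], group: LocalGroup) -> None:
        self.files = files
        self.group = group
        # Map each tensor name to the file that holds it.
        self.routing = {name: fname for fname, tensors in files.items() for name in tensors}

    def get_tensor(self, name: str) -> Matrix:
        return self.files[self.routing[name]][name]

    def get_sharded_cols(self, name: str) -> Matrix:
        # Slice the input dimension (columns) evenly across ranks.
        tensor = self.get_tensor(name)
        size, rank = self.group.size(), self.group.rank()
        if len(tensor[0]) % size != 0:
            raise ValueError(f"{name} cannot be split evenly across {size} ranks")
        block = len(tensor[0]) // size
        return [row[rank * block:(rank + 1) * block] for row in tensor]


class PlainLinear:
    """The contained linear: computes y = W x for a weight of shape [out][in]."""

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight

    def __call__(self, x: List[int]) -> List[int]:
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight]


class RowParallelShell:
    """A "shell": knows how to shard, owns a linear, reduces the partial output."""

    def __init__(self, linear: PlainLinear, group: LocalGroup) -> None:
        self.linear = linear
        self.group = group

    @classmethod
    def load(cls, weights: Weights, name: str) -> "RowParallelShell":
        return cls(PlainLinear(weights.get_sharded_cols(name)), weights.group)

    def __call__(self, x_shard: List[int]) -> List[int]:
        return self.group.all_reduce(self.linear(x_shard))


files = {
    "part-a": {"proj.weight": [[1, 2, 3, 4], [5, 6, 7, 8]]},
    "part-b": {"other.weight": [[1]]},
}
x = [1, 1, 1, 1]

# Non-sharded: the same shell code runs, the group does nothing.
full = RowParallelShell.load(Weights(files, LocalGroup()), "proj.weight")(x)

# Two simulated ranks: each holds half the columns; the sum stands in for all_reduce.
partials = []
for rank in range(2):
    layer = RowParallelShell.load(Weights(files, LocalGroup(rank, 2)), "proj.weight")
    partials.append(layer.linear(x[rank * 2:(rank + 1) * 2]))
combined = [sum(vals) for vals in zip(*partials)]
```

In this sketch the sharded and non-sharded paths share one code path, which is the property the last bullet describes: the single-process case only swaps in a group whose collectives do nothing.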
      
      ![Screenshot from 2023-05-19
      23-19-59](https://github.com/huggingface/text-generation-inference/assets/204321/9a802654-74a3-488c-87a8-073743a6143f)
      
      ---------
      
      Co-authored-by: Ubuntu <ubuntu@ip-1...
  6. 05 Jun, 2023 2 commits
  7. 02 Jun, 2023 3 commits
  8. 01 Jun, 2023 4 commits
  9. 31 May, 2023 4 commits
  10. 30 May, 2023 5 commits
  11. 26 May, 2023 2 commits
  12. 25 May, 2023 1 commit
  13. 24 May, 2023 1 commit
  14. 23 May, 2023 9 commits
  15. 22 May, 2023 2 commits