  1. 08 Jun, 2023 1 commit
      feat(server): Rework model loading (#344) · abd58ff8
      Nicolas Patry authored
      # What does this PR do?
      
      Reworked the loading logic. The idea is to use cleaner loading code:
      
      - Remove need for `no_init_weights`
      - Remove all the odd `bnb_linear`, `load_weights`, and
      `post_load_weights` code.
      
      New code layout:
      
      - New class `Weights`, in charge of loading the weights from
      multiple files into the appropriate tensors (potentially sharded).
      - TP (tensor-parallel) layers are now "shells": they contain the code that
      determines what kind of sharding is needed, plus any required `all_reduce`.
      They do not inherit from Linear; instead they contain some kind of Linear.
      - The contained linear can be either FastLinear or BnbLinear, with a GPTQ
      Linear coming next.
      - All modeling code is explicitly written for sharding; for non-sharded
      code the process group is simply a no-op (this removes a lot of test cases).
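To make the `Weights` idea concrete, here is a minimal pure-Python sketch under stated assumptions: checkpoint files are modeled as dicts mapping tensor names to matrices (lists of rows), and "sharded" means each rank takes one contiguous, equal-sized block along a chosen dimension. `ToyWeights`, `get_tensor`, `get_sharded`, and `ToyLinear` are hypothetical names for illustration only, not the actual API introduced by this PR.

```python
from typing import Dict, List

Matrix = List[List[float]]  # a 2-D "tensor" as a list of rows


class ToyWeights:
    """Hypothetical, simplified stand-in for a `Weights` loader.

    Each "file" is a dict of tensor name -> matrix; real code would read
    checkpoint files (e.g. safetensors) instead.
    """

    def __init__(self, files: List[Dict[str, Matrix]], rank: int = 0, world_size: int = 1):
        # Map each tensor name to the file that holds it, so callers never
        # need to know how the checkpoint was split across files.
        self.routing: Dict[str, Dict[str, Matrix]] = {}
        for f in files:
            for name in f:
                if name in self.routing:
                    raise ValueError(f"tensor {name!r} appears in several files")
                self.routing[name] = f
        self.rank = rank
        self.world_size = world_size

    def get_tensor(self, name: str) -> Matrix:
        # The full, unsharded tensor.
        return self.routing[name][name]

    def get_sharded(self, name: str, dim: int) -> Matrix:
        # Only this rank's contiguous slice along `dim` (0 = rows, 1 = columns).
        full = self.get_tensor(name)
        size = len(full) if dim == 0 else len(full[0])
        if size % self.world_size != 0:
            raise ValueError(f"{name!r}: size {size} not divisible by {self.world_size}")
        block = size // self.world_size
        start, stop = self.rank * block, (self.rank + 1) * block
        if dim == 0:
            return [row[:] for row in full[start:stop]]
        return [row[start:stop] for row in full]


class ToyLinear:
    """Linear layer built straight from already-loaded weights: nothing is
    randomly initialized first and then overwritten."""

    def __init__(self, weight: Matrix):
        self.weight = weight  # shape [out_features, in_features]

    def __call__(self, x: List[float]) -> List[float]:
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]


# Two "files", seen from rank 1 of a 2-way tensor-parallel setup.
files = [
    {"q_proj.weight": [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]},
    {"o_proj.weight": [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]},
]
weights = ToyWeights(files, rank=1, world_size=2)
q_shard = weights.get_sharded("q_proj.weight", dim=0)  # [[5.0, 6.0], [7.0, 8.0]]
o_shard = weights.get_sharded("o_proj.weight", dim=1)  # [[3.0, 4.0], [7.0, 8.0]]
print(ToyLinear(q_shard)([1.0, 1.0]))  # [11.0, 15.0]
```

Constructing the layer directly from a loaded (and possibly sharded) weight, instead of allocating a randomly initialized module and copying weights into it afterwards, is the kind of construction that makes a `no_init_weights` context unnecessary.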
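The "shell" layers and the no-op process group described above can be illustrated with a row-parallel linear layer: each rank keeps a column slice of W, multiplies it by the matching slice of the input, and an all-reduce sums the partial results. This is a simulation inside a single process under stated assumptions; `LocalGroup`, `RowParallelShell`, and `row_parallel_forward` are hypothetical names, and `LocalGroup.all_reduce` only mimics the result a SUM all-reduce (such as `torch.distributed.all_reduce`) would leave on every rank.

```python
from typing import List

Matrix = List[List[float]]


def matvec(weight: Matrix, x: List[float]) -> List[float]:
    # Plain y = W @ x for a weight of shape [out_features, in_features].
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


class LocalGroup:
    """Hypothetical process group simulated in one process.

    all_reduce receives every rank's partial result and returns their
    element-wise sum. With world_size == 1 it is a no-op, so the same
    sharded code path also serves the unsharded case.
    """

    def __init__(self, world_size: int = 1):
        self.world_size = world_size

    def size(self) -> int:
        return self.world_size

    def all_reduce(self, partials: List[List[float]]) -> List[float]:
        if self.world_size == 1:
            return partials[0]  # no-op: a single rank has nothing to combine
        return [sum(vals) for vals in zip(*partials)]


class RowParallelShell:
    """Hypothetical TP "shell": it only knows how W is split (along the input
    dimension); the math lives in the contained weight, and the partial
    outputs of all ranks must be summed with an all_reduce."""

    def __init__(self, full_weight: Matrix, rank: int, group: LocalGroup):
        in_features = len(full_weight[0])
        if in_features % group.size() != 0:
            raise ValueError("in_features must be divisible by world_size")
        block = in_features // group.size()
        self.start, self.stop = rank * block, (rank + 1) * block
        self.inner_weight = [row[self.start:self.stop] for row in full_weight]

    def partial(self, x: List[float]) -> List[float]:
        # This rank's slice of W times the matching slice of x.
        return matvec(self.inner_weight, x[self.start:self.stop])


def row_parallel_forward(full_weight: Matrix, x: List[float], world_size: int) -> List[float]:
    group = LocalGroup(world_size)
    shells = [RowParallelShell(full_weight, r, group) for r in range(world_size)]
    return group.all_reduce([s.partial(x) for s in shells])


W = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
x = [1.0, 1.0, 1.0, 1.0]
print(row_parallel_forward(W, x, world_size=1))  # [10.0, 26.0]
print(row_parallel_forward(W, x, world_size=2))  # [10.0, 26.0]
```

Because the size-1 group's `all_reduce` simply returns its only input, the unsharded run goes through exactly the same code as the sharded one, which is why a separate non-sharded code path (and its test cases) is no longer needed.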
      
      ![Screenshot from 2023-05-19
      23-19-59](https://github.com/huggingface/text-generation-inference/assets/204321/9a802654-74a3-488c-87a8-073743a6143f)
      
      ---------
      
      Co-authored-by: Ubuntu <ubuntu@ip-1...
  2. 16 May, 2023 1 commit
  3. 20 Apr, 2023 1 commit
  4. 19 Apr, 2023 1 commit
  5. 16 Apr, 2023 1 commit
  6. 09 Apr, 2023 1 commit
  7. 27 Mar, 2023 1 commit
  8. 24 Mar, 2023 1 commit
  9. 15 Mar, 2023 1 commit
  10. 13 Mar, 2023 1 commit
  11. 07 Mar, 2023 1 commit
  12. 03 Mar, 2023 2 commits
  13. 13 Feb, 2023 1 commit
  14. 24 Jan, 2023 1 commit
  15. 08 Dec, 2022 1 commit
  16. 01 Dec, 2022 1 commit
  17. 08 Nov, 2022 1 commit
  18. 07 Nov, 2022 1 commit
  19. 03 Nov, 2022 1 commit
  20. 28 Oct, 2022 1 commit
  21. 22 Oct, 2022 1 commit
  22. 20 Oct, 2022 1 commit
  23. 08 Oct, 2022 1 commit