1. 08 Jun, 2023 1 commit
    • feat(server): Rework model loading (#344) · abd58ff8
      Nicolas Patry authored
      # What does this PR do?
      
      Reworks the model loading logic. The idea is to use cleaner loading code:
      
      - Remove the need for `no_init_weights`
      - Remove the awkward `bnb_linear`, `load_weights`, and
      `post_load_weights` code.
      
      New code layout:
      
      - New class `Weights`, in charge of loading the weights from
      multiple files into the appropriate tensors (potentially sharded)
      - TP layers are now "shells": they contain the code that knows what kind of
      sharding is needed, plus a possible `all_reduce`. They do not inherit from
      linear; they contain some kind of Linear instead
      - The contained linear can be either FastLinear, BnbLinear, or (next) a
      GPTQ Linear.
      - All modeling code is explicitly written for sharding; the process group is
      just a no-op for non-sharded code (removes a lot of test cases)
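
The layout above can be illustrated with a minimal, dependency-free sketch. It is not the code from this PR: `LocalGroup`, `PlainLinear`, `RowParallelLinear`, `get_sharded`, and the dict-based "files" are hypothetical stand-ins, and nested lists take the place of real tensors. It only shows the shape of the idea: a `Weights` object resolves a tensor name across several files and returns the local shard, while the TP layer is a shell that contains a linear and applies an `all_reduce` afterwards.

```python
from typing import Dict, List

Matrix = List[List[float]]  # row-major: [out_features][in_features]


class LocalGroup:
    """Hypothetical process-group stand-in. Nothing here communicates,
    so all_reduce is a no-op (the non-sharded case from the list above)."""

    def __init__(self, rank: int = 0, world_size: int = 1) -> None:
        self.rank = rank
        self.world_size = world_size

    def all_reduce(self, values: List[float]) -> List[float]:
        # A real group would sum the partial results of every rank.
        return values


class Weights:
    """Hypothetical loader: finds a tensor by name across several 'files'
    (plain dicts here) and returns only this rank's shard."""

    def __init__(self, files: List[Dict[str, Matrix]], group: LocalGroup) -> None:
        # Map each tensor name to the file that holds it.
        self.routing = {name: f for f in files for name in f}
        self.group = group

    def get_sharded(self, name: str, dim: int) -> Matrix:
        tensor = self.routing[name][name]
        world, rank = self.group.world_size, self.group.rank
        size = len(tensor) if dim == 0 else len(tensor[0])
        if size % world != 0:
            raise ValueError(f"{name}: size {size} not divisible by {world}")
        block = size // world
        start, stop = rank * block, (rank + 1) * block
        if dim == 0:
            return tensor[start:stop]
        return [row[start:stop] for row in tensor]


class PlainLinear:
    """Stand-in for the contained linear (FastLinear, BnbLinear, ...)."""

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight

    def __call__(self, x: List[float]) -> List[float]:
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight]


class RowParallelLinear:
    """A 'shell': it contains a linear instead of inheriting from one."""

    def __init__(self, linear: PlainLinear, group: LocalGroup) -> None:
        self.linear = linear
        self.group = group

    @classmethod
    def load(cls, name: str, weights: Weights) -> "RowParallelLinear":
        # Row parallelism shards the input dimension (dim=1 in this layout).
        return cls(PlainLinear(weights.get_sharded(name, dim=1)), weights.group)

    def __call__(self, x: List[float]) -> List[float]:
        # x is this rank's slice of the input; partial outputs are reduced.
        return self.group.all_reduce(self.linear(x))


files = [{"proj.weight": [[1.0, 2.0], [3.0, 4.0]]}, {"other.weight": [[5.0]]}]
layer = RowParallelLinear.load("proj.weight", Weights(files, LocalGroup()))
print(layer([1.0, 1.0]))  # prints [3.0, 7.0]
```

With a single process the shard is the whole tensor and the reduce does nothing, so the same modeling code serves both the sharded and non-sharded paths, which is the test-case reduction mentioned above.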
      
      ![Screenshot from 2023-05-19
      23-19-59](https://github.com/huggingface/text-generation-inference/assets/204321/9a802654-74a3-488c-87a8-073743a6143f)
      
      ---------
      
      Co-authored-by: Ubuntu <ubuntu@ip-1...
  2. 12 May, 2023 3 commits
  3. 10 May, 2023 2 commits
  4. 09 May, 2023 4 commits
  5. 02 May, 2023 1 commit
  6. 27 Apr, 2023 1 commit
  7. 24 Apr, 2023 1 commit
  8. 21 Apr, 2023 1 commit
  9. 19 Apr, 2023 3 commits
  10. 16 Apr, 2023 1 commit
  11. 14 Apr, 2023 4 commits
  12. 09 Apr, 2023 1 commit
  13. 29 Mar, 2023 1 commit
  14. 24 Mar, 2023 1 commit
  15. 03 Mar, 2023 1 commit
  16. 18 Feb, 2023 1 commit
  17. 13 Feb, 2023 1 commit
  18. 08 Feb, 2023 1 commit
  19. 03 Feb, 2023 1 commit
  20. 24 Jan, 2023 1 commit
  21. 23 Jan, 2023 2 commits
  22. 14 Nov, 2022 1 commit
  23. 08 Nov, 2022 1 commit
  24. 07 Nov, 2022 1 commit
  25. 02 Nov, 2022 1 commit
  26. 28 Oct, 2022 1 commit
  27. 27 Oct, 2022 1 commit
  28. 22 Oct, 2022 1 commit