1. 24 May, 2023 3 commits
    • Fix sagemaker DP/MP (#23681) · 75bbf20b
      Zachary Mueller authored
      * Check for use_sagemaker_dp
      
      * Add a check for is_sagemaker_mp when setting _n_gpu again. Should be last broken thing
      
      * Try explicit check?
      
      * Quality
      75bbf20b
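      The fix above concerns how the Trainer picks the number of GPUs when running on SageMaker. Below is a minimal sketch of the kind of guard described, not the actual diff: `is_sagemaker_dp_enabled` and `is_sagemaker_mp_enabled` are the helpers transformers.utils exposes, while `pick_n_gpu` is a hypothetical stand-in for the Trainer's internal `_n_gpu` setup.

        # Illustrative sketch only, not the actual change in #23681.
        import torch
        from transformers.utils import is_sagemaker_dp_enabled, is_sagemaker_mp_enabled

        def pick_n_gpu() -> int:
            # Under SageMaker model parallelism each process drives a single
            # model partition, so the GPU count must not be used as-is.
            if is_sagemaker_mp_enabled():
                return 1
            # SageMaker data parallelism also runs one device per process.
            if is_sagemaker_dp_enabled():
                return 1
            return torch.cuda.device_count()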
    • Paged Optimizer + Lion Optimizer for Trainer (#23217) · 796162c5
      Tim Dettmers authored
      
      
      * Added lion and paged optimizers and made original tests pass.
      
      * Added tests for paged and lion optimizers.
      
      * Added and fixed optimizer tests.
      
      * Style and quality checks.
      
      ---------
      Co-authored-by: younesbelkada <younesbelkada@gmail.com>
      796162c5
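      The commit above wires the bitsandbytes paged and Lion optimizers into the Trainer. A minimal usage sketch follows, assuming the optimizer string names registered by this PR follow the `paged_adamw_32bit` / `lion_32bit` pattern and that bitsandbytes is installed.

        from transformers import TrainingArguments

        # Selecting one of the new optimizers is just a string on TrainingArguments.
        args = TrainingArguments(
            output_dir="out",                # illustrative output directory
            optim="paged_adamw_32bit",       # or e.g. "lion_32bit", "paged_lion_8bit"
            per_device_train_batch_size=8,
        )
        # Trainer(model=model, args=args, train_dataset=...).train() then uses the
        # paged optimizer, which pages optimizer state to CPU memory when GPU memory runs short.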
    • 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) (#23479) · 9d73b922
      Tim Dettmers authored
      
      
      * Added lion and paged optimizers and made original tests pass.
      
      * Added tests for paged and lion optimizers.
      
      * Added and fixed optimizer tests.
      
      * Style and quality checks.
      
      * Initial draft. Some tests fail.
      
      * Fixed dtype bug.
      
      * Fixed bug caused by torch_dtype='auto'.
      
      * All tests green for 8-bit and 4-bit layers.
      
      * Added fix for fp32 layer norms and bf16 compute in LLaMA.
      
      * Fixing issues for PR #23479.
      
      * Reverted variable name change.
      
      * Added missing tests.
      
      * Fixup changes.
      
      * Added fixup changes.
      
      * Missed some variables to rename.
      
      * revert trainer tests
      
      * revert test trainer
      
      * another revert
      
      * fix tests and safety checkers
      
      * protect import
      
      * simplify a bit
      
      * Update src/transformers/trainer.py
      
      * few fixes
      
      * add warning
      
      * replace with `load_in_kbit = load_in_4bit or load_in_8bit`
      
      * fix test
      
      * fix tests
      
      * this time fix tests
      
      * safety checker
      
      * add docs
      
      * revert torch_dtype
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * multiple fixes
      
      * update docs
      
      * version checks and multiple fixes
      
      * replace `is_loaded_in_kbit`
      
      * replace `load_in_kbit`
      
      * change methods names
      
      * better checks
      
      * oops
      
      * oops
      
      * address final comments
      
      ---------
      Co-authored-by: younesbelkada <younesbelkada@gmail.com>
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      9d73b922
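      The commit above adds the 4-bit loading path that QLoRA builds on. A minimal sketch of the workflow it enables is shown below, assuming the `bnb_4bit_*` fields exposed by the 4-bit integration: load the base model in 4-bit via `BitsAndBytesConfig`, then attach LoRA adapters with peft. The checkpoint name and LoRA hyperparameters are illustrative, not taken from the PR.

        import torch
        from transformers import AutoModelForCausalLM, BitsAndBytesConfig
        from peft import LoraConfig, get_peft_model

        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",               # NF4 quantization for the base weights
            bnb_4bit_use_double_quant=True,          # quantize the quantization constants too
            bnb_4bit_compute_dtype=torch.bfloat16,   # bf16 compute; layer norms stay in fp32
        )

        model = AutoModelForCausalLM.from_pretrained(
            "huggyllama/llama-7b",                   # illustrative checkpoint
            quantization_config=bnb_config,
            device_map="auto",
        )

        lora_config = LoraConfig(
            r=16, lora_alpha=32, lora_dropout=0.05,  # illustrative LoRA settings
            target_modules=["q_proj", "v_proj"],
            task_type="CAUSAL_LM",
        )
        model = get_peft_model(model, lora_config)   # only the LoRA weights are trainable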
  2. 23 May, 2023 1 commit
  3. 17 May, 2023 1 commit
  4. 16 May, 2023 1 commit
  5. 09 May, 2023 1 commit
  6. 04 May, 2023 1 commit
  7. 02 May, 2023 1 commit
  8. 28 Apr, 2023 2 commits
  9. 21 Apr, 2023 1 commit
  10. 19 Apr, 2023 1 commit
  11. 17 Apr, 2023 1 commit
  12. 07 Apr, 2023 1 commit
  13. 06 Apr, 2023 2 commits
  14. 05 Apr, 2023 1 commit
    • Add thousands separator in training summary (#22583) · 4861c258
      Quentin Meeus authored
      The logger prints a summary at the beginning of training that displays some info such as the number of examples, number of parameters, total number of steps, etc. Those numbers can be quite large and difficult to read. I added a thousands separator to improve readability for the following:
      - num_examples
      - num_train_epochs
      - per_device_train_batch_size
      - total_train_batch_size
      - max_steps
      - num_trainable_params
      4861c258
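      For reference, the thousands separator in these log lines is just Python's `,` format specifier; a minimal sketch of the formatting described above (the values are made up):

        num_examples = 1_250_000
        total_train_batch_size = 2_048
        print(f"  Num examples = {num_examples:,}")                      # Num examples = 1,250,000
        print(f"  Total train batch size = {total_train_batch_size:,}")  # Total train batch size = 2,048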
  15. 04 Apr, 2023 1 commit
  16. 03 Apr, 2023 3 commits
  17. 29 Mar, 2023 1 commit
  18. 23 Mar, 2023 2 commits
  19. 22 Mar, 2023 1 commit
  20. 21 Mar, 2023 1 commit
  21. 20 Mar, 2023 2 commits
  22. 17 Mar, 2023 1 commit
  23. 14 Mar, 2023 3 commits
  24. 13 Mar, 2023 3 commits
  25. 09 Mar, 2023 1 commit
    • Return analysis for hyperparameter_search with Ray backend (#22040) · 04bfac83
      anruijian authored
      * return analysis for hyperparameter_search with ray backend
      
      * Revert "return analysis for hyperparameter_search with ray backend"
      
      This reverts commit cd5179070930e03020d96d98eb51dec3eb21ef75.
      
      * add run_summary attribute to BestRun and return analysis for ray backend
      
      * fix typo
      
      * add doc for run_summary for ray backend
      04bfac83
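      The commit above attaches the Ray Tune analysis to the returned BestRun as run_summary. A minimal usage sketch, assuming a `trainer` built with `model_init=` and Ray Tune installed; the search space and trial count below are illustrative.

        from ray import tune

        def ray_hp_space(trial):
            # Illustrative search space for the Ray backend.
            return {"learning_rate": tune.loguniform(1e-5, 1e-3)}

        best_run = trainer.hyperparameter_search(
            hp_space=ray_hp_space,
            backend="ray",
            n_trials=4,
            direction="minimize",
        )
        print(best_run.run_id, best_run.objective, best_run.hyperparameters)
        analysis = best_run.run_summary   # Ray analysis object returned by this PR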
  26. 08 Mar, 2023 1 commit
  27. 06 Mar, 2023 1 commit
  28. 02 Mar, 2023 1 commit