1. 04 Feb, 2021 2 commits
  2. 03 Feb, 2021 1 commit
  3. 02 Feb, 2021 1 commit
  4. 29 Jan, 2021 1 commit
  5. 28 Jan, 2021 4 commits
  6. 27 Jan, 2021 2 commits
  7. 26 Jan, 2021 1 commit
    • Smdistributed trainer (#9798) · 0d0efd3a
      Sylvain Gugger authored
      * Add a debug print
      
      * Adapt Trainer to use smdistributed if available
      
      * Forgotten parenthesis
      
      * Real check for sagemaker
      
* Don't forget to define device...
      
* Woopsie, local_rank is defined differently
      
      * Update since local_rank has the proper value
      
      * Remove debug statement
      
      * More robust check for smdistributed
      
      * Quality
      
      * Deal with key not present error
      0d0efd3a
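For context, here is a minimal sketch of the kind of availability check this commit describes. The helper name and the exact import path are illustrative assumptions, not the code from the PR:

```
import importlib.util

def is_sagemaker_distributed_available():
    # smdistributed only exists inside SageMaker's distributed training
    # containers, so an import check decides whether the Trainer can use it.
    return importlib.util.find_spec("smdistributed") is not None

if is_sagemaker_distributed_available():
    # SageMaker's data-parallel backend exposes a torch.distributed-compatible module.
    import smdistributed.dataparallel.torch.distributed as sm_dist  # noqa: F401
```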
  8. 25 Jan, 2021 2 commits
  9. 22 Jan, 2021 1 commit
  10. 21 Jan, 2021 2 commits
  11. 20 Jan, 2021 1 commit
  12. 15 Jan, 2021 1 commit
  13. 14 Jan, 2021 1 commit
  14. 13 Jan, 2021 1 commit
    • [trainer] deepspeed integration (#9211) · 2df34f4a
      Stas Bekman authored
      
      
      * deepspeed integration
      
      * style
      
      * add test
      
      * ds wants to do its own backward
      
      * fp16 assert
      
      * Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * style
      
      * for clarity extract what args are being passed to deepspeed
      
      * introduce the concept of self.wrapped_model
      
      * s/self.wrapped_model/self.model_wrapped/
      
      * complete transition to self.wrapped_model / self.model
      
      * fix
      
      * doc
      
      * give ds its own init
      
      * add custom overrides, handle bs correctly
      
      * fix test
      
      * clean up model_init logic, fix small bug
      
      * complete fix
      
      * collapse --deepspeed_config into --deepspeed
      
      * style
      
      * start adding doc notes
      
      * style
      
      * implement hf2ds optimizer and scheduler configuration remapping
      
      * oops
      
* call get_num_training_steps only when absolutely needed
      
      * workaround broken auto-formatter
      
      * deepspeed_config arg is no longer needed - fixed in deepspeed master
      
      * use hf's fp16 args in config
      
      * clean
      
      * start on the docs
      
      * rebase cleanup
      
      * finish up --fp16
      
      * clarify the supported stages
      
      * big refactor thanks to discovering deepspeed.init_distributed
      
      * cleanup
      
      * revert fp16 part
      
      * add checkpoint-support
      
* move ds init into integrations
      
      * extend docs
      
      * cleanup
      
      * unfix docs
      
      * clean up old code
      
      * imports
      
      * move docs
      
      * fix logic
      
      * make it clear which file it's referring to
      
      * document nodes/gpus
      
      * style
      
      * wrong format
      
      * style
      
      * deepspeed handles gradient clipping
      
      * easier to read
      
      * major doc rewrite
      
      * Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * docs
      
      * switch to AdamW optimizer
      
      * style
      
      * Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * clarify doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      2df34f4a
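As a rough illustration of what the integration above enables (the --deepspeed flag and HF's own --fp16 handling), here is a hedged sketch of turning it on from TrainingArguments. The file name ds_config.json and the commented launch line are assumptions for the example, not the exact usage documented in the PR:

```
from transformers import TrainingArguments

# ds_config.json is a DeepSpeed configuration file (illustrative name).
training_args = TrainingArguments(
    output_dir="output",
    fp16=True,                   # HF's fp16 args are mapped into the DeepSpeed config
    deepspeed="ds_config.json",  # --deepspeed_config was collapsed into --deepspeed
)
# Training is then launched with the deepspeed launcher, e.g.:
#   deepspeed your_script.py --deepspeed ds_config.json --fp16
```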
  15. 11 Jan, 2021 2 commits
  16. 06 Jan, 2021 2 commits
  17. 05 Jan, 2021 2 commits
    • [trainer] --model_parallel hasn't been implemented for most models (#9347) · 748006c0
      Stas Bekman authored
      * --model_parallel hasn't been implemented for most models
      
      * make the help clear as well
      
      * implement is_parallelizable; use it
      
      * oops
      
      * remove property
      748006c0
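A hedged sketch of the is_parallelizable guard mentioned above; the attribute name comes from the commit, while the helper function and error message are illustrative:

```
def check_model_parallel_support(model, model_parallel: bool):
    # Only a handful of architectures implement model parallelism, so refuse
    # --model_parallel for models that don't advertise support.
    if model_parallel and not getattr(model, "is_parallelizable", False):
        raise ValueError(
            f"{model.__class__.__name__} does not support --model_parallel yet."
        )
```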
    • feat(wandb): save model as artifact (#8119) · 30fa0b78
      Boris Dayma authored
      * feat(wandb): log artifacts
      
      * fix: typo
      
      * feat(wandb): ensure name is allowed
      
      * feat(wandb): log artifact
      
      * feat(wandb): saving logic
      
      * style: improve formatting
      
      * fix: unrelated typo
      
* feat: use a fake trainer
      
* fix: simplify
      
      * feat(wandb): log model files as artifact
      
      * style: fix style
      
      * docs(wandb): correct description
      
* feat: unpack model + allow env truthy values
      
      * feat: TrainerCallback can access tokenizer
      
* style: fix style
      
      * feat(wandb): log more interesting metadata
      
      * feat: unpack tokenizer
      
      * feat(wandb): metadata with load_best_model_at_end
      
      * feat(wandb): more robust metadata
      
      * style(wandb): fix formatting
      30fa0b78
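A minimal sketch of logging saved model files as a W&B artifact, the core of the commit above. wandb.Artifact, add_dir, and log_artifact are standard wandb APIs; the helper name and artifact naming scheme are assumptions:

```
import wandb

def log_model_as_artifact(run, model_dir, metadata=None):
    # Bundle everything the Trainer saved (config, weights, tokenizer files)
    # into a single versioned artifact attached to the current run.
    artifact = wandb.Artifact(name=f"model-{run.id}", type="model", metadata=metadata or {})
    artifact.add_dir(model_dir)
    run.log_artifact(artifact)
```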
  18. 04 Jan, 2021 1 commit
  19. 22 Dec, 2020 1 commit
  20. 21 Dec, 2020 1 commit
  21. 18 Dec, 2020 2 commits
  22. 17 Dec, 2020 1 commit
  23. 16 Dec, 2020 1 commit
  24. 15 Dec, 2020 2 commits
  25. 09 Dec, 2020 1 commit
  26. 02 Dec, 2020 1 commit
    • [trainer] improve code readability (#8903) · 7e1cb00c
      Stas Bekman authored
      * [trainer] improve code
      
      This PR:
- removes redundant code, since
      ```
      self.model = model if model is not None else None
      ```
      and
      ```
      self.model = model
      ```
      are the same.
      
* separate attribute assignment from code logic, which simplifies things further.
      
      * whitespace
      7e1cb00c
  27. 01 Dec, 2020 1 commit
  28. 30 Nov, 2020 1 commit
    • Use model.from_pretrained for DataParallel also (#8795) · 77384941
      Shai Erera authored
      * Use model.from_pretrained for DataParallel also
      
When training on multiple GPUs, the code wraps the model with torch.nn.DataParallel. However, if the model has custom from_pretrained logic, it does not get applied during load_best_model_at_end.
      
      This commit uses the underlying model during load_best_model_at_end, and re-wraps the loaded model with DataParallel.
      
If you choose to reject this change, then could you please move this logic to a function, e.g. def load_best_model_checkpoint(best_model_checkpoint) or something, so that it can be overridden?
      
      * Fix silly bug
      
      * Address review comments
      
Thanks for the feedback. I made the change that you proposed, but I also think we should update L811 to check if `self.model` is an instance of `PreTrainedModel`, otherwise we would still not get into that `if` section, right?
      77384941
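A hedged sketch of the idea described above, using the function name suggested in the commit message; the actual Trainer code differs, and PreTrainedModel plus the checkpoint file name are assumptions for the example:

```
import torch
from transformers import PreTrainedModel

def load_best_model_checkpoint(model, best_model_checkpoint):
    # Unwrap DataParallel so any custom from_pretrained logic actually runs.
    underlying = model.module if isinstance(model, torch.nn.DataParallel) else model
    if isinstance(underlying, PreTrainedModel):
        underlying = underlying.from_pretrained(best_model_checkpoint)
    else:
        state_dict = torch.load(f"{best_model_checkpoint}/pytorch_model.bin", map_location="cpu")
        underlying.load_state_dict(state_dict)
    # Re-wrap so training keeps using all available GPUs.
    return torch.nn.DataParallel(underlying) if isinstance(model, torch.nn.DataParallel) else underlying
```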