1. 13 Apr, 2023 1 commit
  2. 12 Apr, 2023 1 commit
  3. 04 Apr, 2023 1 commit
  4. 24 Mar, 2023 1 commit
  5. 20 Mar, 2023 2 commits
  6. 14 Mar, 2023 1 commit
  7. 13 Mar, 2023 1 commit
  8. 09 Mar, 2023 2 commits
  9. 22 Feb, 2023 2 commits
  10. 20 Feb, 2023 1 commit
    • Enable PyTorch/XLA Fully Sharded Data Parallel (FSDP) (#21406) · 7735e040
      AlexWertheim authored
      
      
      * Reinserted import statement accidentally removed during rebasing.
      
      * Added auto_wrap functionality, restructured XLA FSDP logic to more closely match PyTorch FSDP logic.
      
      * Fixed flag descriptions; changed several instances of fsdp_ to xla_fsdp_; pass in auto_wrap_policy and auto_wrapper_callable directly to avoid lambda saving.
      
      * Moved XLA FSDP logic to be adjacent to Fairscale FSDP logic in trainer.
      
      * Formatted changes in accordance with HF style requirements.
      
      * Added back in warning which was accidentally removed.
      
      * Merged XLA FSDP training arguments into `fsdp_config`
      - Added `xla` boolean flag to `fsdp_config` to specify XLA FSDP wrapping
      - Merged XLA FSDP wrapping logic into FSDP wrapping logic within the trainer class
      
      * Cleaned up errors, moved argument to fsdp_config
      
      - Set `xla` and `xla_fsdp_grad_ckpt` flags by default in fsdp_config
      - Added missing colons following conditionals
      - Moved `fsdp_transformer_layer_cls_to_wrap` to `fsdp_config`
      - Modified `fsdp_transformer_layer_cls_to_wrap` to be list of strings,
        not just one string
      - Changed Fairscale FSDP logic to allow for set of layer classes to wrap
      - Removed unnecessary checks for `xla_fsdp`
      
      * Corrected small errors, improved layer class flag
      
      - Correctly set default values for `xla` and `xla_fsdp_grad_ckpt`
        arguments
      - Made `fsdp_transformer_layer_cls_to_wrap` a list of strings instead of
        a single string
      - Added processing to ensure that `fsdp_transformer_layer_cls_to_wrap`
        works as expected if passed as a single string
      - Updated PyTorch FSDP logic to accept a list of layers to wrap, as done
        with XLA FSDP
      - Replaced instances of `getattr()` with `.get()` for dictionary
        retrievals with default values, including when setting
        `fsdp_min_num_params`
      - Corrected `self.fsdp is not None` to `len(self.fsdp) > 0`
      - Removed extraneous `xla_fsdp` argument descriptions from outside
        `fsdp_config`
      
      * Changed xla-fsdp-settings to be a dictionary
      
      - Modified xla-fsdp-settings to be entered directly as a dictionary
        instead of being loaded through a JSON file
      - Made small style corrections
      
      * Reverted unintentional local_rank TPU check
      
      * Do not block XLA FSDP if local rank is -1
      
      * Rebased and applied automatic formatting
      
      - Rebased
      - Applied automatic formatting changes via `make style`
      
      * Applied automatic formatting with latest version of black
      
      * Replaced  expression with
      
      * Reran `black examples tests src utils`, `ruff examples tests src utils --fix`, and `make autogenerate_code` after additional formatting changes
      
      * Additional automatic formatting changes
      
      * Remove unnecessary whitespace characters from src/transformers/training_args.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      ---------
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      7735e040
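
      A minimal sketch of how the flags merged in this PR might be used, assuming the `fsdp_config` keys named in the commit message above (`xla`, `xla_fsdp_grad_ckpt`, `fsdp_transformer_layer_cls_to_wrap`) and a TPU host with torch_xla installed; the key names and the wrapped layer class are illustrative, not verified against a specific release.

      ```python
      # Hedged sketch: enabling PyTorch/XLA FSDP through TrainingArguments.
      # The fsdp_config keys follow this commit message; exact names may differ between releases.
      from transformers import TrainingArguments

      training_args = TrainingArguments(
          output_dir="xla_fsdp_run",
          per_device_train_batch_size=8,
          fsdp="full_shard",  # turn on FSDP sharding
          fsdp_config={
              "xla": True,                 # route wrapping through torch_xla's FSDP implementation
              "xla_fsdp_grad_ckpt": True,  # gradient checkpointing inside the XLA FSDP wrapper
              # a list of strings (a single string is also handled, per the commit)
              "fsdp_transformer_layer_cls_to_wrap": ["GPT2Block"],
          },
      )
      # A Trainer built with these arguments would then wrap the model with XLA FSDP;
      # auto_wrap_policy / auto_wrapper_callable are derived from the config above.
      ```
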
  11. 07 Feb, 2023 1 commit
  12. 06 Feb, 2023 1 commit
    • Update quality tooling for formatting (#21480) · 6f79d264
      Sylvain Gugger authored
      * Result of black 23.1
      
      * Update target to Python 3.7
      
      * Switch flake8 to ruff
      
      * Configure isort
      
      * Configure isort
      
      * Apply isort with line limit
      
      * Put the right black version
      
      * adapt black in check copies
      
      * Fix copies
      6f79d264
  13. 31 Jan, 2023 1 commit
  14. 24 Jan, 2023 1 commit
  15. 18 Jan, 2023 1 commit
    • Add AWS Neuron torchrun support (#20806) · c59d71b2
      jeffhataws authored
      * Add XLA torchrun support
      
      * Clarify that DDP doesn't yet work with the torch.distributed XLA backend
      
      * Enable DDP with torchrun and XLA (now available in PT-XLA 1.13)
      
      * Add check for AWS Neuron availability and AWS Neuron specific compiler flag
      
      * Change the new test's name to TestTrainerDistributedNeuronCore
      
      * Remove "assert" and replace raised exception
      
      * Remove compiler flag as it is optional. If needed, will be another PR.
      
      * Use TORCHELASTIC_RUN_ID to determine whether torchrun is used
      c59d71b2
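
      A rough sketch of the detection idea in the last bullet, assuming nothing beyond a standard environment-variable lookup: torchrun (torch.distributed.elastic) exports TORCHELASTIC_RUN_ID to every worker it spawns, so its presence is treated as "launched by torchrun". The backend initialization shown is illustrative of how DDP over XLA is typically set up with PT-XLA 1.13, not a quote of the merged code.

      ```python
      import os


      def launched_with_torchrun() -> bool:
          # torchrun exports TORCHELASTIC_RUN_ID for every worker it spawns,
          # so its presence is used as the "running under torchrun" signal.
          return os.environ.get("TORCHELASTIC_RUN_ID") is not None


      if launched_with_torchrun():
          # Illustrative: with PT-XLA >= 1.13, importing xla_backend registers the
          # "xla" backend so DDP can run over torchrun-launched XLA workers
          # (e.g. AWS Neuron cores).
          import torch.distributed as dist
          import torch_xla.distributed.xla_backend  # noqa: F401

          dist.init_process_group(backend="xla")
      ```
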
  16. 29 Dec, 2022 1 commit
  17. 14 Dec, 2022 1 commit
  18. 08 Dec, 2022 2 commits
  19. 30 Nov, 2022 2 commits
  20. 28 Nov, 2022 2 commits
  21. 18 Nov, 2022 1 commit
    • Add AnyPrecisionAdamW optimizer (#18961) · 84c9cc6d
      atturaioe authored
      * Add AnyPrecisionAdamW optimizer
      
      * Add optim_args argument to TrainingArgs
      
      * Add tests for AnyPrecisionOptimizer
      
      * Change AnyPrecisionAdam default params to float32
      
      * Move default_anyprecision_kwargs in trainer test
      
      * Rename AnyPrecisionAdamW
      84c9cc6d
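
      An illustrative snippet of how the new `optim_args` string might be combined with the optimizer added here; the optimizer name `adamw_anyprecision` and the key/value pairs are assumptions based on torchdistx's AnyPrecisionAdamW signature, not taken from this commit message.

      ```python
      from transformers import TrainingArguments

      # Hedged sketch: selecting AnyPrecisionAdamW and tuning it via optim_args.
      # "adamw_anyprecision" and the keys below are assumptions; torchdistx must be installed.
      training_args = TrainingArguments(
          output_dir="anyprecision_run",
          optim="adamw_anyprecision",
          # optim_args is parsed as comma-separated key=value pairs
          optim_args="use_kahan_summation=True,momentum_dtype=bfloat16,variance_dtype=bfloat16",
      )
      ```
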
  22. 15 Nov, 2022 1 commit
  23. 14 Oct, 2022 1 commit
  24. 29 Sep, 2022 1 commit
  25. 22 Sep, 2022 1 commit
  26. 21 Sep, 2022 1 commit
  27. 09 Sep, 2022 1 commit
  28. 07 Sep, 2022 1 commit
  29. 01 Sep, 2022 1 commit
    • Adds timeout argument to training_args to avoid socket timeouts in DDP (#18562) · fe58929a
      Gustavo de Rosa authored
      * chore(training_args): Adds support for timeout argument.
      
      * fix(training_args): Passes make style through changes.
      
      * fix(training_args): Removes wrong docstring sentence.
      
      * fix(training_args): Fixes timeout not being JSON serializable.
      
      * fix(training_args_sm): Also updates timeout to timeout_delta.
      
      * fix(training_args): Fixes PR according to suggestions.
      fe58929a
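
      A short sketch of the resulting usage, assuming the argument landed as an integer number of seconds (named `ddp_timeout` here) that is converted internally to a `datetime.timedelta`, which matches the JSON-serializability fix mentioned above.

      ```python
      from transformers import TrainingArguments

      # Hedged sketch: raise the process-group timeout so long preprocessing or
      # checkpoint loading on rank 0 does not trip the default DDP socket timeout.
      training_args = TrainingArguments(
          output_dir="ddp_run",
          ddp_timeout=7200,  # seconds, forwarded to torch.distributed.init_process_group(timeout=...)
      )
      ```
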
  30. 31 Aug, 2022 1 commit
  31. 16 Aug, 2022 1 commit
  32. 10 Aug, 2022 1 commit
    • TF Examples Rewrite (#18451) · 6eb51450
      Matt authored
      
      
      * Finished QA example
      
      * Dodge a merge conflict
      
      * Update text classification and LM examples
      
      * Update NER example
      
      * New Keras metrics WIP, fix NER example
      
      * Update NER example
      
      * Update MC, summarization and translation examples
      
      * Add XLA warnings when shapes are variable
      
      * Make sure batch_size is consistently scaled by num_replicas
      
      * Add PushToHubCallback to all models
      
      * Add docs links for KerasMetricCallback
      
      * Add docs links for prepare_tf_dataset and jit_compile
      
      * Correct inferred model names
      
      * Don't assume the dataset has 'lang'
      
      * Don't assume the dataset has 'lang'
      
      * Write metrics in text classification
      
      * Add 'framework' to TrainingArguments and TFTrainingArguments
      
      * Export metrics in all examples and add tests
      
      * Fix training args for Flax
      
      * Update command line args for translation test
      
      * make fixup
      
      * Fix accidentally running other tests in fp16
      
      * Remove do_train/do_eval from run_clm.py
      
      * Remove do_train/do_eval from run_mlm.py
      
      * Add tensorflow tests to circleci
      
      * Fix circleci
      
      * Update examples/tensorflow/language-modeling/run_mlm.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update examples/tensorflow/test_tensorflow_examples.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update examples/tensorflow/translation/run_translation.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update examples/tensorflow/token-classification/run_ner.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Fix save path for tests
      
      * Fix some model card kwargs
      
      * Explain the magical -1000
      
      * Actually enable tests this time
      
      * Skip text classification PR until we fix shape inference
      
      * make fixup
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      6eb51450
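
      A condensed, hypothetical sketch of the pattern the rewritten TF examples share (prepare_tf_dataset, jit_compile and the Keras callbacks mentioned above); the checkpoint, dataset slice and hyperparameters are placeholders, not the values used in the examples themselves.

      ```python
      import tensorflow as tf
      from datasets import load_dataset
      from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

      # Placeholder checkpoint and data; the real examples cover QA, LM, NER, MC,
      # summarization and translation with their own datasets.
      checkpoint = "distilbert-base-uncased"
      tokenizer = AutoTokenizer.from_pretrained(checkpoint)
      model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

      data = load_dataset("glue", "sst2", split="train[:1%]")
      data = data.map(lambda batch: tokenizer(batch["sentence"], truncation=True), batched=True)

      # prepare_tf_dataset pads, collates and batches the dataset for Keras training.
      tf_train = model.prepare_tf_dataset(data, batch_size=8, shuffle=True, tokenizer=tokenizer)

      # jit_compile=True enables XLA; the examples warn when input shapes are variable.
      model.compile(optimizer=tf.keras.optimizers.Adam(5e-5), jit_compile=True)
      model.fit(tf_train, epochs=1)
      ```
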
  33. 27 Jul, 2022 1 commit
  34. 26 Jul, 2022 1 commit