1. 13 Apr, 2023 1 commit
  2. 12 Apr, 2023 1 commit
  3. 04 Apr, 2023 1 commit
  4. 24 Mar, 2023 1 commit
  5. 20 Mar, 2023 2 commits
  6. 14 Mar, 2023 1 commit
  7. 13 Mar, 2023 1 commit
  8. 09 Mar, 2023 2 commits
  9. 22 Feb, 2023 2 commits
  10. 20 Feb, 2023 1 commit
    • Enable PyTorch/XLA Fully Sharded Data Parallel (FSDP) (#21406) · 7735e040
      AlexWertheim authored
      
      
      * Reinserted import statement accidentally removed during rebasing.
      
      * Added auto_wrap functionality, restructured XLA FSDP logic to more closely match PyTorch FSDP logic.
      
      * Fixed flag descriptions; changed several instances of fsdp_ to xla_fsdp_; pass in auto_wrap_policy and auto_wrapper_callable directly to avoid lambda saving.
      
      * Moved XLA FSDP logic to be adjacent to Fairscale FSDP logic in trainer.
      
      * Formatted changes in accordance with HF style requirements.
      
      * Added back in warning which was accidentally removed.
      
      * Merged XLA FSDP training arguments into `fsdp_config`
      - Added `xla` boolean flag to `fsdp_config` to specify XLA FSDP wrapping
      - Merged XLA FSDP wrapping logic into FSDP wrapping logic within the trainer class
      
      * Cleaned up errors, moved argument to fsdp_config
      
      - Set `xla` and `xla_fsdp_grad_ckpt` flags by default in fsdp_config
      - Added missing colons following conditionals
      - Moved `fsdp_transformer_layer_cls_to_wrap` to `fsdp_config`
      - Modified `fsdp_transformer_layer_cls_to_wrap` to be list of strings,
        not just one string
      - Changed Fairscale FSDP logic to allow for set of layer classes to wrap
      - Removed unnecessary checks for `xla_fsdp`
      
      * Corrected small errors, improved layer class flag
      
      - Correctly set default values for `xla` and `xla_fsdp_grad_ckpt`
        arguments
      - Made `fsdp_transformer_layer_cls_to_wrap` a list of strings instead of
        a single string
      - Added processing to ensure that `fsdp_transformer_layer_cls_to_wrap`
        works as expected if passed as a single string
      - Updated PyTorch FSDP logic to accept a list of layers to wrap, as done
        with XLA FSDP
      - Replaced instances of `getattr()` with `.get()` for dictionary
        retrievals with default values, including when setting
        `fsdp_min_num_params`
      - Corrected `self.fsdp is not None` to `len(self.fsdp) > 0`
      - Removed extraneous `xla_fsdp` argument descriptions from outside
        `fsdp_config`
      
      * Changed xla-fsdp-settings to be a dictionary
      
      - Modified xla-fsdp-settings to be entered directly as a dictionary
        instead of being loaded through a JSON file
      - Made small style corrections
      
      * Reverted unintentional local_rank TPU check
      
      * Do not block XLA FSDP if local rank is -1
      
      * Rebased and applied automatic formatting
      
      - Rebased
      - Applied automatic formatting changes via `make style`
      
      * Applied automatic formatting with latest version of black
      
      * Replaced  expression with
      
      * Reran `black examples tests src utils`, `ruff examples tests src utils --fix`, and `make autogenerate_code` after additional formatting changes
      
      * Additional automatic formatting changes
      
      * Remove unnecessary whitespace characters from src/transformers/training_args.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      ---------
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      7735e040
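
      A minimal sketch of how the flags merged in this PR might be used, assuming the `fsdp_config` keys named in the commit message above (`xla`, `xla_fsdp_grad_ckpt`, `fsdp_transformer_layer_cls_to_wrap`) and a TPU host with torch_xla installed; the key names and the wrapped layer class are illustrative, not verified against a specific release.

      ```python
      # Hedged sketch: enabling PyTorch/XLA FSDP through TrainingArguments.
      # The fsdp_config keys follow this commit message; exact names may differ between releases.
      from transformers import TrainingArguments

      training_args = TrainingArguments(
          output_dir="xla_fsdp_run",
          per_device_train_batch_size=8,
          fsdp="full_shard",  # turn on FSDP sharding
          fsdp_config={
              "xla": True,                 # route wrapping through torch_xla's FSDP implementation
              "xla_fsdp_grad_ckpt": True,  # gradient checkpointing inside the XLA FSDP wrapper
              # a list of strings (a single string is also handled, per the commit)
              "fsdp_transformer_layer_cls_to_wrap": ["GPT2Block"],
          },
      )
      # A Trainer built with these arguments would then wrap the model with XLA FSDP;
      # auto_wrap_policy / auto_wrapper_callable are derived from the config above.
      ```
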
  11. 07 Feb, 2023 1 commit
  12. 06 Feb, 2023 1 commit
    • Update quality tooling for formatting (#21480) · 6f79d264
      Sylvain Gugger authored
      * Result of black 23.1
      
      * Update target to Python 3.7
      
      * Switch flake8 to ruff
      
      * Configure isort
      
      * Configure isort
      
      * Apply isort with line limit
      
      * Put the right black version
      
      * adapt black in check copies
      
      * Fix copies
      6f79d264
  13. 31 Jan, 2023 1 commit
  14. 24 Jan, 2023 1 commit
  15. 18 Jan, 2023 1 commit
    • Add AWS Neuron torchrun support (#20806) · c59d71b2
      jeffhataws authored
      * Add XLA torchrun support
      
      * Clarify that DDP doesn't yet work with the torch.distributed XLA backend
      
      * Enable DDP with torchrun and XLA (now available in PT-XLA 1.13)
      
      * Add check for AWS Neuron availability and AWS Neuron specific compiler flag
      
      * Change the new test's name to TestTrainerDistributedNeuronCore
      
      * Remove "assert" and replace raised exception
      
      * Remove compiler flag as it is optional. If needed, will be another PR.
      
      * Use TORCHELASTIC_RUN_ID to determine whether torchrun is used
      c59d71b2
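
      A rough sketch of the detection idea in the last bullet, assuming nothing beyond a standard environment-variable lookup: torchrun (torch.distributed.elastic) exports TORCHELASTIC_RUN_ID to every worker it spawns, so its presence is treated as "launched by torchrun". The backend initialization shown is illustrative of how DDP over XLA is typically set up with PT-XLA 1.13, not a quote of the merged code.

      ```python
      import os


      def launched_with_torchrun() -> bool:
          # torchrun exports TORCHELASTIC_RUN_ID for every worker it spawns,
          # so its presence is used as the "running under torchrun" signal.
          return os.environ.get("TORCHELASTIC_RUN_ID") is not None


      if launched_with_torchrun():
          # Illustrative: with PT-XLA >= 1.13, importing xla_backend registers the
          # "xla" backend so DDP can run over torchrun-launched XLA workers
          # (e.g. AWS Neuron cores).
          import torch.distributed as dist
          import torch_xla.distributed.xla_backend  # noqa: F401

          dist.init_process_group(backend="xla")
      ```
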
  16. 29 Dec, 2022 1 commit
  17. 14 Dec, 2022 1 commit
  18. 08 Dec, 2022 2 commits
  19. 30 Nov, 2022 2 commits
  20. 28 Nov, 2022 2 commits
  21. 18 Nov, 2022 1 commit
    • Add AnyPrecisionAdamW optimizer (#18961) · 84c9cc6d
      atturaioe authored
      * Add AnyPrecisionAdamW optimizer
      
      * Add optim_args argument to TrainingArgs
      
      * Add tests for AnyPrecisionOptimizer
      
      * Change AnyPrecisionAdam default params to float32
      
      * Move default_anyprecision_kwargs in trainer test
      
      * Rename AnyPrecisionAdamW
      84c9cc6d
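
      An illustrative snippet of how the new `optim_args` string might be combined with the optimizer added here; the optimizer name `adamw_anyprecision` and the key/value pairs are assumptions based on torchdistx's AnyPrecisionAdamW signature, not taken from this commit message.

      ```python
      from transformers import TrainingArguments

      # Hedged sketch: selecting AnyPrecisionAdamW and tuning it via optim_args.
      # "adamw_anyprecision" and the keys below are assumptions; torchdistx must be installed.
      training_args = TrainingArguments(
          output_dir="anyprecision_run",
          optim="adamw_anyprecision",
          # optim_args is parsed as comma-separated key=value pairs
          optim_args="use_kahan_summation=True,momentum_dtype=bfloat16,variance_dtype=bfloat16",
      )
      ```
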
  22. 15 Nov, 2022 1 commit
  23. 14 Oct, 2022 1 commit
  24. 29 Sep, 2022 1 commit
  25. 22 Sep, 2022 1 commit
  26. 21 Sep, 2022 1 commit
  27. 09 Sep, 2022 1 commit
  28. 07 Sep, 2022 1 commit
  29. 01 Sep, 2022 1 commit
    • Adds timeout argument to training_args to avoid socket timeouts in DDP (#18562) · fe58929a
      Gustavo de Rosa authored
      * chore(training_args): Adds support for timeout argument.
      
      * fix(training_args): Passes make style through changes.
      
      * fix(training_args): Removes wrong docstring sentence.
      
      * fix(training_args): Fixes timeout not being JSON serializable.
      
      * fix(training_args_sm): Also updates timeout to timeout_delta.
      
      * fix(training_args): Fixes PR according to suggestions.
      fe58929a
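
      A short sketch of the resulting usage, assuming the argument landed as an integer number of seconds (named `ddp_timeout` here) that is converted internally to a `datetime.timedelta`, which matches the JSON-serializability fix mentioned above.

      ```python
      from transformers import TrainingArguments

      # Hedged sketch: raise the process-group timeout so long preprocessing or
      # checkpoint loading on rank 0 does not trip the default DDP socket timeout.
      training_args = TrainingArguments(
          output_dir="ddp_run",
          ddp_timeout=7200,  # seconds, forwarded to torch.distributed.init_process_group(timeout=...)
      )
      ```
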
  30. 31 Aug, 2022 1 commit
  31. 16 Aug, 2022 1 commit
  32. 10 Aug, 2022 1 commit
    • TF Examples Rewrite (#18451) · 6eb51450
      Matt authored
      
      
      * Finished QA example
      
      * Dodge a merge conflict
      
      * Update text classification and LM examples
      
      * Update NER example
      
      * New Keras metrics WIP, fix NER example
      
      * Update NER example
      
      * Update MC, summarization and translation examples
      
      * Add XLA warnings when shapes are variable
      
      * Make sure batch_size is consistently scaled by num_replicas
      
      * Add PushToHubCallback to all models
      
      * Add docs links for KerasMetricCallback
      
      * Add docs links for prepare_tf_dataset and jit_compile
      
      * Correct inferred model names
      
      * Don't assume the dataset has 'lang'
      
      * Don't assume the dataset has 'lang'
      
      * Write metrics in text classification
      
      * Add 'framework' to TrainingArguments and TFTrainingArguments
      
      * Export metrics in all examples and add tests
      
      * Fix training args for Flax
      
      * Update command line args for translation test
      
      * make fixup
      
      * Fix accidentally running other tests in fp16
      
      * Remove do_train/do_eval from run_clm.py
      
      * Remove do_train/do_eval from run_mlm.py
      
      * Add tensorflow tests to circleci
      
      * Fix circleci
      
      * Update examples/tensorflow/language-modeling/run_mlm.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update examples/tensorflow/test_tensorflow_examples.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update examples/tensorflow/translation/run_translation.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update examples/tensorflow/token-classification/run_ner.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Fix save path for tests
      
      * Fix some model card kwargs
      
      * Explain the magical -1000
      
      * Actually enable tests this time
      
      * Skip text classification PR until we fix shape inference
      
      * make fixup
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      6eb51450
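
      A condensed, hypothetical sketch of the pattern the rewritten TF examples share (prepare_tf_dataset, jit_compile and the Keras callbacks mentioned above); the checkpoint, dataset slice and hyperparameters are placeholders, not the values used in the examples themselves.

      ```python
      import tensorflow as tf
      from datasets import load_dataset
      from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

      # Placeholder checkpoint and data; the real examples cover QA, LM, NER, MC,
      # summarization and translation with their own datasets.
      checkpoint = "distilbert-base-uncased"
      tokenizer = AutoTokenizer.from_pretrained(checkpoint)
      model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

      data = load_dataset("glue", "sst2", split="train[:1%]")
      data = data.map(lambda batch: tokenizer(batch["sentence"], truncation=True), batched=True)

      # prepare_tf_dataset pads, collates and batches the dataset for Keras training.
      tf_train = model.prepare_tf_dataset(data, batch_size=8, shuffle=True, tokenizer=tokenizer)

      # jit_compile=True enables XLA; the examples warn when input shapes are variable.
      model.compile(optimizer=tf.keras.optimizers.Adam(5e-5), jit_compile=True)
      model.fit(tf_train, epochs=1)
      ```
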
  33. 27 Jul, 2022 1 commit
  34. 26 Jul, 2022 1 commit