"docs/source/ko/tasks/monocular_depth_estimation.md" did not exist on "1fe1e3caa44617047f149bcc0c0b566343b714a7"
- 01 May, 2023 1 commit
-
-
Zachary Mueller authored
* Deprecate xpu_backend for ddp_backend * Typo * Only do a minor deprecation, no need for major Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> --------- Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
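A minimal usage sketch of the renamed argument (not part of the commit; `output_dir` and the backend value are placeholders):

```python
from transformers import TrainingArguments

# ddp_backend replaces the deprecated xpu_backend; the old name still works
# for now but emits a deprecation warning.
args = TrainingArguments(
    output_dir="out",
    ddp_backend="ccl",  # e.g. "nccl", "gloo", or "ccl" for Intel XPUs
)
```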
-
- 28 Apr, 2023 1 commit
-
-
Maxime Méloux authored
* Add Trainer support for ReduceLROnPlateau (fixes #16503) * Remove training argument and add default instance --------- Co-authored-by: mmeloux <maxime.meloux@loria.fr>
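A hedged sketch of selecting the new scheduler (illustrative values; ReduceLROnPlateau steps on an evaluation metric, so evaluation must be enabled):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    lr_scheduler_type="reduce_lr_on_plateau",  # scheduler type added by this PR
    evaluation_strategy="epoch",               # needed so there is a metric to step on
)
```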
-
- 26 Apr, 2023 1 commit
-
-
Zachary Mueller authored
* Bring back deepspeed integration * Branchname * Self-scheduled * newline * Use deepspeed env var * Remove comment * Del env var after partialstate
-
- 20 Apr, 2023 1 commit
-
-
Zachary Mueller authored
-
- 19 Apr, 2023 1 commit
-
-
Zachary Mueller authored
Fixup multigpu tests
-
- 18 Apr, 2023 2 commits
-
-
Zachary Mueller authored
* Add warning about accelerate * Version block Accelerate * Include parse * Apply suggestions from code review Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Check partial state * Update param --------- Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Zachary Mueller authored
-
- 17 Apr, 2023 1 commit
-
-
Zachary Mueller authored
* Use accelerate for device management * Add accelerate to setup Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
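An illustrative sketch of the Accelerate primitive the Trainer now leans on for device placement (this mirrors the internal behaviour; it is not code from the commit):

```python
from accelerate import PartialState

# The Trainer now resolves its device and process information through
# Accelerate instead of its own CUDA/TPU logic.
state = PartialState()
print(state.device)         # device this process will train on
print(state.process_index)  # rank of this process in a distributed run
```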
-
- 13 Apr, 2023 1 commit
-
-
Stas Bekman authored
* [trainer] update url * style
-
- 12 Apr, 2023 1 commit
-
-
Michael Benayoun authored
`torch.distributed` group initialization for `torch_neuron` disabled when `optimum-neuron` is installed (#22728) * Make the process group initialization not happen if optimum_neuron is installed * Add warning * Remove list and added warning
-
- 04 Apr, 2023 1 commit
-
-
Viktor Scherbakov authored
* implemented safetensors save/load * remove duplicated file * added tests * more tests * style fix * fix tf tests * change to list comprehension Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * review fixes + safe load for sharded checkpoint * style fix * remove rogue import * remove partial to avoid undefined exception * use naming alias instead of safetensors.torch * fix safe sharding in tests * grammar Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * update docs Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * update docs Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * minor corrections * style --------- Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
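A minimal sketch of the new checkpoint-format switch (values are placeholders):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    save_safetensors=True,  # write model.safetensors instead of pytorch_model.bin
)
```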
-
- 24 Mar, 2023 1 commit
-
-
Stas Bekman authored
-
- 20 Mar, 2023 2 commits
-
-
heya5 authored
Update training_args.py
-
Pasquale Minervini authored
Update training_args.py: a nightly install is no longer required for `torch.compile`.
-
- 14 Mar, 2023 1 commit
-
-
Stas Bekman authored
* [trainer] add --optim adamw_torch_fused * change optim default * deal with non-torch * revert default change; prep; add fp16/amp assert * typo * typo
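A short sketch of the new optimizer flag, assuming PyTorch 2.x with a CUDA GPU (per the commit, a check around fp16/AMP setups was also added):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="adamw_torch_fused",  # uses torch.optim.AdamW(fused=True); needs torch>=2.0 and a GPU
)
```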
-
- 13 Mar, 2023 1 commit
-
-
Sylvain Gugger authored
* Remove backend enforcment for torch.compile * Update error * Update src/transformers/training_args.py Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> * Apply suggestions from code review Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> * Style --------- Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com>
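A hedged sketch: after this change the backend string is passed through to `torch.compile` without being validated against a fixed list (the values below are just examples):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    torch_compile=True,
    torch_compile_backend="inductor",  # any backend TorchDynamo recognizes can be requested
)
```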
-
- 09 Mar, 2023 2 commits
-
-
aws-sangeetha authored
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-42-72.us-west-2.compute.internal>
-
Sylvain Gugger authored
* Add setters by type of args to TrainingArguments * Define more setters
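A sketch of the grouped setters this commit adds; the method and keyword names follow the docstrings of that release and may differ slightly in later versions:

```python
from transformers import TrainingArguments

args = TrainingArguments(output_dir="out")
# Each setter groups related hyperparameters and returns the modified args,
# so calls can be chained.
args = args.set_optimizer(name="adamw_torch", learning_rate=3e-5)
args = args.set_lr_scheduler(name="cosine", warmup_ratio=0.05)
args = args.set_logging(strategy="steps", steps=50)
```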
-
- 22 Feb, 2023 2 commits
-
-
Sylvain Gugger authored
* Respect documentation on passive log level * Fix test and set log level in examples * Add doc
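A minimal sketch of the documented behaviour: "passive" leaves the transformers logging verbosity untouched, while an explicit level makes the Trainer set it:

```python
from transformers import TrainingArguments

args = TrainingArguments(output_dir="out", log_level="info")          # Trainer sets verbosity to INFO
passive = TrainingArguments(output_dir="out2", log_level="passive")   # default: leave verbosity alone
```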
-
Aaron Gokaslan authored
-
- 20 Feb, 2023 1 commit
-
-
AlexWertheim authored
* Reinserted import statement accidentally removed during rebasing.
* Added auto_wrap functionality, restructured XLA FSDP logic to more closely match PyTorch FSDP logic.
* Fixed flag descriptions; changed several instances of fsdp_ to xla_fsdp_; pass in auto_wrap_policy and auto_wrapper_callable directly to avoid lambda saving.
* Moved XLA FSDP logic to be adjacent to Fairscale FSDP logic in trainer.
* Formatted changes in accordance with HF style requirements.
* Added back in warning which was accidentally removed.
* Merged XLA FSDP training arguments into `fsdp_config`
  - Added `xla` boolean flag to `fsdp_config` to specify XLA FSDP wrapping
  - Merged XLA FSDP wrapping logic into FSDP wrapping logic within trainer class
* Cleaned up errors, moved argument to fsdp_config
  - Set `xla` and `xla_fsdp_grad_ckpt` flags by default in fsdp_config
  - Added missing colons following conditionals
  - Moved `fsdp_transformer_layer_cls_to_wrap` to `fsdp_config`
  - Modified `fsdp_transformer_layer_cls_to_wrap` to be a list of strings, not just one string
  - Changed Fairscale FSDP logic to allow for a set of layer classes to wrap
  - Removed unnecessary checks for `xla_fsdp`
* Corrected small errors, improved layer class flag
  - Correctly set default values for `xla` and `xla_fsdp_grad_ckpt` arguments
  - Made `fsdp_transformer_layer_cls_to_wrap` a list of strings instead of a single string
  - Added processing to ensure that `fsdp_transformer_layer_cls_to_wrap` works as expected if passed as a single string
  - Updated PyTorch FSDP logic to accept a list of layers to wrap, as done with XLA FSDP
  - Replaced instances of `getattr()` with `.get()` for dictionary retrievals with default values, including when setting `fsdp_min_num_params`
  - Corrected `self.fsdp is not None` to `len(self.fsdp) > 0`
  - Removed extraneous `xla_fsdp` argument descriptions from outside `fsdp_config`
* Changed xla-fsdp-settings to be a dictionary
  - Modified xla-fsdp-settings to be entered directly as a dictionary instead of loaded through a JSON file
  - Made small style corrections
* Reverted unintentional local_rank TPU check
* Do not block XLA FSDP if local rank is -1
* Rebased and applied automatic formatting
  - Rebased
  - Applied automatic formatting changes via `make style`
* Applied automatic formatting with latest version of black
* Replaced expression with
* Reran black examples tests src utils ruff examples tests src utils --fix make autogenerate_code make[1]: Entering directory '/usr/local/google/home/awertheim/HF-FSDP-PR/transformers' make[1]: Leaving directory '/usr/local/google/home/awertheim/HF-FSDP-PR/transformers' after additional formatting changes
* Additional automatic formatting changes
* Remove unnecessary whitespace characters from src/transformers/training_args.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> --------- Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
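A hedged configuration sketch based on the flags named in the commit message (`xla`, `xla_fsdp_grad_ckpt`, `fsdp_transformer_layer_cls_to_wrap`); the layer class is a placeholder and key names may have been renamed in later releases:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    fsdp="full_shard auto_wrap",
    fsdp_config={
        "fsdp_transformer_layer_cls_to_wrap": ["BertLayer"],  # placeholder layer class
        "xla": True,                 # use torch_xla's FSDP wrapper instead of PyTorch FSDP
        "xla_fsdp_grad_ckpt": True,  # gradient checkpointing on the XLA-wrapped modules
    },
)
```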
-
- 07 Feb, 2023 1 commit
-
-
raghavanone authored
* Add limit_all_gathers option to fsdp_config and fix forward_prefetch bug * Fix black issue * Fix ruff failure * Incorporate PR feedbacks * Incorporate PR feedbacks * Incorporate PR feedbacks
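A hedged sketch of the option added here; `limit_all_gathers` comes from the commit message, while the forward-prefetch key name is assumed from the fsdp_config schema of that era:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    fsdp="full_shard auto_wrap",
    fsdp_config={
        "fsdp_min_num_params": 1_000_000,
        "limit_all_gathers": True,      # new option: rate-limits FSDP all-gathers to save memory
        "fsdp_forward_prefetch": True,  # assumed key name for the forward-prefetch toggle
    },
)
```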
-
- 06 Feb, 2023 1 commit
-
-
Sylvain Gugger authored
* Result of black 23.1 * Update target to Python 3.7 * Switch flake8 to ruff * Configure isort * Configure isort * Apply isort with line limit * Put the right black version * adapt black in check copies * Fix copies
-
- 31 Jan, 2023 1 commit
-
-
raghavanone authored
* Add support of backward_prefetch and forward_prefetch * Fix format issue * Fix isort issue * Fix doc style issue * Update src/transformers/trainer.py Co-authored-by:
Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by:
Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by:
Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by:
Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com> * Fix black issue * Fix doc-style issue * Make additional fsdp parameters into fsdp config * Fix black issue * Remove unused imports * Fix doc style issues * Incorporate PR feedbacks * Remove unused imports * Fix tests * Fix tests * Fix tests * Fix tests * Fix tests * Update src/transformers/training_args.py Co-authored-by:
Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com> * Fix tests * Incorporate PR feedbacks * Incorporate PR feedbacks * Fix black issues --------- Co-authored-by:
Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
-
- 24 Jan, 2023 1 commit
-
-
Frederico Tommasi Caroli authored
* Update TrainingArguments.label_names docs * Change wording * Change wording
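A small example of the argument whose docs are being clarified, using the question-answering case where the labels are not called `labels`:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    label_names=["start_positions", "end_positions"],  # input keys that hold the labels
)
```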
-
- 18 Jan, 2023 1 commit
-
-
jeffhataws authored
* Add XLA torchrun support * Clarify that currently DDP doesn't work with torch.distributed XLA backend yet * Enable DDP with torchrun and XLA (now available in PT-XLA 1.13) * Add check for AWS Neuron availability and AWS Neuron specific compiler flag * Change the new test's name to TestTrainerDistributedNeuronCore * Remove "assert" and replace raised exception * Remove compiler flag as it is optional. If needed, will be another PR. * Use TORCHELASTIC_RUN_ID to determine whether torchrun is used
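An illustrative sketch of the detection described in the last bullet; it simply checks the environment variable that torchrun/torchelastic sets for every worker:

```python
import os

# True when the process was launched by torchrun (torchelastic).
launched_with_torchrun = os.environ.get("TORCHELASTIC_RUN_ID") is not None
```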
-
- 29 Dec, 2022 1 commit
-
-
Alex Hedges authored
* Remove non-breaking space in comment; it was likely added unintentionally. * Remove remaining non-breaking spaces
-
- 14 Dec, 2022 1 commit
-
-
amyeroberts authored
* Replaces xxx_required with requires_backends * Fixup
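A hedged sketch of the helper the codebase switches to; it raises a clear ImportError when the named backend is not installed:

```python
from transformers.utils import requires_backends

def torch_only_helper(x):
    # Fails fast with an informative error if torch is missing,
    # replacing the old torch_required-style checks.
    requires_backends(torch_only_helper, ["torch"])
    return x
```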
-
- 08 Dec, 2022 2 commits
-
-
jeffhataws authored
-
Sylvain Gugger authored
* Migrate torchdynamo to torch.compile * Add docstring and generic option * Properly use the function... * Reorg args
-
- 30 Nov, 2022 2 commits
-
-
Sylvain Gugger authored
-
Sylvain Gugger authored
* Repurpose torchdynamo training args towards torch._dynamo * Add doc
-
- 28 Nov, 2022 2 commits
-
-
Henghui Zhu authored
-
Wang, Yi authored
With the PyTorch CPU-only version, without --no_cuda, using --bf16 will trigger an error like "Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0" (#20445)
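A minimal sketch of the combination described in the message, using the argument names of that release (`no_cuda` was later renamed):

```python
from transformers import TrainingArguments

# On a CPU-only PyTorch build, bf16 training is requested together with no_cuda
# so the GPU capability check is skipped.
args = TrainingArguments(output_dir="out", no_cuda=True, bf16=True)
```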
-
- 18 Nov, 2022 1 commit
-
-
atturaioe authored
* Add AnyPrecisionAdamW optimizer * Add optim_args argument to TrainingArgs * Add tests for AnyPrecisionOptimizer * Change AnyPrecisionAdam default params to float32 * Move default_anyprecision_kwargs in trainer test * Rename AnyPrecisionAdamW
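A hedged sketch of the two additions; the `optim_args` string format and the dtype names follow the PR's tests, and the AnyPrecision optimizer additionally requires the torchdistx package:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="adamw_anyprecision",  # new optimizer choice added by this PR
    # optim_args is a comma-separated key=value string forwarded to the optimizer
    optim_args="use_kahan_summation=True,momentum_dtype=bfloat16",
)
```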
-
- 15 Nov, 2022 1 commit
-
-
Muhammad Sakib Khan Inan authored
* Init Update * ClearML Callbacks integration * update corrections * args reporting updated * {'tensorboard': False, 'pytorch': False} * ClearML Tests added * add clearml * output_uri=True in Task.init * reformatted integrations.py * reformatted and fixed * IF-ELSE statement issue on "has_clearml" resolved * Add clearml in main callback docs * Add additional clearml documentation * Update src/transformers/integrations.py Co-authored-by:Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Accept suggestion Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Accept suggestion Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Small change in comments * Make style clearml * Accept suggestion Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Victor Sonck <victor.sonck@gmail.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
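A minimal sketch of enabling the new integration (a configured ClearML server and credentials are assumed):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    report_to=["clearml"],  # logs metrics and config to a ClearML Task
)
```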
-
- 14 Oct, 2022 1 commit
-
-
Wang, Yi authored
Signed-off-by:
Wang, Yi A <yi.a.wang@intel.com> Signed-off-by:
Wang, Yi A <yi.a.wang@intel.com>
-
- 29 Sep, 2022 1 commit
-
-
atturaioe authored
-
- 22 Sep, 2022 1 commit
-
-
Sylvain Gugger authored
* Fix TrainingArguments documentation * Fix TFTrainingArguments documentation
-
- 21 Sep, 2022 1 commit
-
-
Zhong Hui authored
-