- 13 May, 2024 (1 commit)
fxmarty authored
* update to ROCm 6.0.2 and test MI300
* add callers for mi300
* update dockerfile
* fix trainer tests
* remove apex
* style
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* update to torch 2.3
* add workflow dispatch target
* we may need branches: mi300-ci after all
* nit
* fix docker build
* nit
* add check runner
* remove docker-gpu
* fix issues
* fix
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

- 04 Mar, 2024 (1 commit)
Zach Mueller authored
Fully revert atomic checkpointing

- 13 Dec, 2023 (1 commit)
Zach Mueller authored
* Fix bug
* Write test
* Keep back old modification for grad accum steps
* Whitespace...
* Whitespace again
* Race condition
* Wait for everyone

- 05 Sep, 2023 (1 commit)
Abhilash Majumder authored
* patch with accelerate xpu
* patch with accelerate xpu
* formatting
* fix tests
* revert ruff unrelated fixes
* revert ruff unrelated fixes
* revert ruff unrelated fixes
* fix test
* review fixes
* review fixes
* black fixed
* review commits
* review commits
* style fix
* use pytorch_utils
* revert markuplm test

- 01 Sep, 2023 (1 commit)
Zach Mueller authored
* Revert frozen training arguments
* TODO

- 15 Aug, 2023 (1 commit)
Zach Mueller authored
* Make training args fully immutable
* Working tests, PyTorch
* In test_trainer
* during testing
* Use proper dataclass way
* Fix test
* Another one
* Fix tf
* Lingering slow
* Exception
* Clean

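Making a dataclass of arguments immutable after construction is commonly done with a `__setattr__` guard armed in `__post_init__`. A minimal sketch of that pattern, with hypothetical class and field names (this is not the actual `TrainingArguments` code):

```python
from dataclasses import dataclass


@dataclass
class ImmutableArgs:
    # Hypothetical stand-in for a frozen-after-init arguments class.
    learning_rate: float = 5e-5
    num_train_epochs: float = 3.0

    def __post_init__(self):
        # Bypass our own guard to arm the flag that locks the instance.
        object.__setattr__(self, "_frozen", True)

    def __setattr__(self, name, value):
        # Field assignments during __init__ happen before _frozen exists,
        # so they pass; anything after __post_init__ is rejected.
        if getattr(self, "_frozen", False):
            raise AttributeError(f"{name} cannot be modified after __init__")
        super().__setattr__(name, value)


args = ImmutableArgs()
try:
    args.learning_rate = 1e-4
except AttributeError as err:
    print(err)  # learning_rate cannot be modified after __init__
```
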
- 24 Jul, 2023 (1 commit)
Zach Mueller authored
* Dispatch batches
* Copy items

- 18 Jul, 2023 (1 commit)
statelesshz authored
* Add Ascend NPU accelerator support
* fix style warning

- 17 Apr, 2023 (1 commit)
Zachary Mueller authored
* Use accelerate for device management
* Add accelerate to setup
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

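Delegating device management to Accelerate means asking the library which device to use instead of hand-rolling CUDA/CPU logic. A minimal standalone sketch using the public `Accelerator` API (not the Trainer's actual code path):

```python
import torch
from accelerate import Accelerator

# Accelerator inspects the environment (CUDA, MPS, distributed launchers, ...)
# and decides where tensors and models should live.
accelerator = Accelerator()
device = accelerator.device

model = torch.nn.Linear(8, 2).to(device)
batch = torch.randn(4, 8, device=device)
print(model(batch).shape)  # torch.Size([4, 2])
```
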
- 12 Apr, 2023 (1 commit)
Stas Bekman authored

- 06 Feb, 2023 (1 commit)
Sylvain Gugger authored
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies

- 18 Jan, 2023 (1 commit)
jeffhataws authored
* Add XLA torchrun support
* Clarify that DDP doesn't work with the torch.distributed XLA backend yet
* Enable DDP with torchrun and XLA (now available in PT-XLA 1.13)
* Add check for AWS Neuron availability and an AWS Neuron specific compiler flag
* Change the new test's name to TestTrainerDistributedNeuronCore
* Remove "assert" and replace it with a raised exception
* Remove compiler flag as it is optional; if needed, it will be another PR
* Use TORCHELASTIC_RUN_ID to determine whether torchrun is used

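The last bullet describes a simple environment check: torchrun (TorchElastic) exports `TORCHELASTIC_RUN_ID` to every worker it spawns. A sketch of that detection (the helper name is made up):

```python
import os


def launched_with_torchrun() -> bool:
    # torchrun (TorchElastic) sets TORCHELASTIC_RUN_ID in each worker's
    # environment, so its presence signals that torchrun spawned us.
    return os.environ.get("TORCHELASTIC_RUN_ID") is not None
```
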
- 23 Feb, 2022 (1 commit)
Lysandre Debut authored
* Per-folder tests reorganization
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>

- 19 Aug, 2021 (1 commit)
Allan Lin authored
* Update torch.utils.data namespaces to the latest.
* Format
* Update Dataloader.
* Style

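For illustration, "updating the namespaces" means importing from the public `torch.utils.data` package instead of its private submodules; a hypothetical before/after, not the exact diff:

```python
# Old style: reaching into private submodules.
# from torch.utils.data.dataset import Dataset
# from torch.utils.data.dataloader import DataLoader

# Current style: everything is re-exported at the package level.
from torch.utils.data import DataLoader, Dataset, RandomSampler, SequentialSampler
```
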
- 15 Jun, 2021 (1 commit)
Stas Bekman authored
* ensure concurrent pytest workers use a unique port for torch.distributed.launch
* reword

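Two pytest-xdist workers launching `torch.distributed.launch` at the same time would otherwise race for the default rendezvous port. A sketch of deriving a per-worker port from pytest-xdist's `PYTEST_XDIST_WORKER` variable (the helper name is hypothetical):

```python
import os

BASE_PORT = 29500  # torch.distributed's default master port


def unique_master_port() -> int:
    # pytest-xdist names its workers "gw0", "gw1", ...; offsetting the base
    # port by the worker index keeps concurrent test processes from racing
    # for the same torch.distributed.launch rendezvous port.
    worker = os.environ.get("PYTEST_XDIST_WORKER", "gw0")
    digits = "".join(ch for ch in worker if ch.isdigit())
    return BASE_PORT + (int(digits) if digits else 0)
```
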
- 31 Mar, 2021 (1 commit)
Sylvain Gugger authored
* First third
* Styling and fix mistake
* Quality
* All the rest
* Treat %s and %d
* typo
* Missing )
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

- 23 Mar, 2021 (1 commit)
Sylvain Gugger authored

- 18 Mar, 2021 (1 commit)
Sylvain Gugger authored
* Fix distributed evaluation
* Use logger

- 07 Dec, 2020 (1 commit)
Sylvain Gugger authored
* Add copyright headers everywhere they were missing
* Style

- 10 Nov, 2020 (1 commit)
Stas Bekman authored
* s|multiple_gpu|multi_gpu|g; s|multigpu|multi_gpu|g
* doc

- 28 Oct, 2020 (1 commit)
Stas Bekman authored
* move the helper code into testing_utils
* port test_trainer_distributed to work with pytest
* improve docs
* simplify notes
* doc
* doc
* style
* doc
* further improvements
* torch might not be available
* real fix
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

- 14 Oct, 2020 (1 commit)
Sylvain Gugger authored
* Add eval_accumulation_step and clean distributed eval
* Add TPU test
* Add TPU stuff
* Fix arg name
* Fix Seq2SeqTrainer
* Fix total_size
* Update src/transformers/trainer_pt_utils.py
* Doc and add test to TPU
* Add unit test
* Adapt name
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

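Eval accumulation exists because keeping every batch's logits on the GPU until the end of evaluation can exhaust memory; offloading to the CPU every N steps caps GPU usage. A rough sketch of the idea, with illustrative names (not the Trainer's actual loop):

```python
import torch


def eval_loop(model, dataloader, eval_accumulation_steps=20):
    # Assumes an HF-style model whose output carries .logits and batches
    # that unpack as keyword arguments.
    gpu_chunks, cpu_preds = [], []
    model.eval()
    with torch.no_grad():
        for step, batch in enumerate(dataloader):
            gpu_chunks.append(model(**batch).logits)
            # Every N steps, offload accumulated logits to CPU memory so the
            # GPU never holds more than N batches of predictions at once.
            if (step + 1) % eval_accumulation_steps == 0:
                cpu_preds.append(torch.cat(gpu_chunks).cpu())
                gpu_chunks = []
    if gpu_chunks:
        cpu_preds.append(torch.cat(gpu_chunks).cpu())
    return torch.cat(cpu_preds)
```
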
- 20 Aug, 2020 (1 commit)
Sylvain Gugger authored
* Add tests to Trainer
* Test if removing long breaks everything
* Remove ugly hack
* Fix distributed test
* Use float for number of epochs

- 25 Jun, 2020 (1 commit)
Thomas Wolf authored
[Tokenization] Fix #5181 - make #5155 more explicit - move back the default logging level in tests to WARNING (#5252)
* fix #5181: padding to max sequence length while truncating to another length was wrong on slow tokenizers
* clean up and fix #5155
* fix XLM test
* Fix tests for Transfo-XL
* logging only above WARNING in tests
* switch slow tokenizers tests to @slow
* fix Marian truncation tokenization test
* style and quality
* make the test a lot faster by limiting the sequence length used in tests

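The first bullet concerns padding to one length while truncating to another. For reference, with the present-day tokenizer API the combination looks like this (a usage sketch, not the original failing case):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Truncate anything longer than 16 tokens and pad anything shorter,
# so every sequence comes out at exactly max_length.
enc = tokenizer(
    "a fairly short example sentence",
    padding="max_length",
    truncation=True,
    max_length=16,
)
print(len(enc["input_ids"]))  # 16
```
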
- 15 Jun, 2020 (1 commit)
Sylvain Gugger authored
* Make DataCollator a callable
* Update src/transformers/data/data_collator.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

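After this change, a data collator is any callable that maps a list of examples to a batch, with no base class required. A minimal sketch of something satisfying that contract (the class is hypothetical):

```python
from typing import Any, Dict, List

import torch


class StackingCollator:
    # Any object with __call__(features) -> batch works as a data collator.
    def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, torch.Tensor]:
        # Assumes every example holds equal-length sequences per key.
        return {
            key: torch.stack([torch.as_tensor(f[key]) for f in features])
            for key in features[0]
        }


collator = StackingCollator()
batch = collator([{"input_ids": [1, 2, 3]}, {"input_ids": [4, 5, 6]}])
print(batch["input_ids"].shape)  # torch.Size([2, 3])
```
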
- 20 May, 2020 (1 commit)
Julien Chaumond authored

- 19 May, 2020 (1 commit)
Julien Chaumond authored
* Distributed eval: SequentialDistributedSampler + gather all results
* For consistency, only write to disk from world_master
  Close https://github.com/huggingface/transformers/issues/4272
* Working distributed eval
* Hook into scripts
* Fix #3721 again
* TPU.mesh_reduce: stay in tensor space (thanks @jysohn23)
* Just a small comment
* whitespace
* torch.hub: pip install packaging
* Add test scenarii

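The scheme in the first bullet: each rank evaluates a contiguous shard of the dataset (the job of a sequential distributed sampler), then an all-gather reassembles the full, ordered predictions on every rank. A sketch assuming an initialized process group and same-sized shards:

```python
import torch
import torch.distributed as dist


def gather_eval_results(local_preds: torch.Tensor) -> torch.Tensor:
    # all_gather requires same-shaped tensors on every rank, which the
    # sequential sampler guarantees by padding the last shard.
    buffers = [torch.empty_like(local_preds) for _ in range(dist.get_world_size())]
    dist.all_gather(buffers, local_preds)
    # Shards are contiguous slices of the eval set, so concatenating in
    # rank order restores the original sample order.
    return torch.cat(buffers, dim=0)
```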