- 13 May, 2024 (1 commit)
fxmarty authored
* update to ROCm 6.0.2 and test MI300
* add callers for mi300
* update dockerfile
* fix trainer tests
* remove apex
* style
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* update to torch 2.3
* add workflow dispatch target
* we may need branches: mi300-ci after all
* nit
* fix docker build
* nit
* add check runner
* remove docker-gpu
* fix issues
* fix
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

- 04 Mar, 2024 (1 commit)
Zach Mueller authored
Fully revert atomic checkpointing

- 13 Dec, 2023 (1 commit)
Zach Mueller authored
* Fix bug
* Write test
* Keep back old modification for grad accum steps
* Whitespace...
* Whitespace again
* Race condition
* Wait for everyone

- 05 Sep, 2023 (1 commit)
Abhilash Majumder authored
* patch with accelerate xpu
* patch with accelerate xpu
* formatting
* fix tests
* revert ruff unrelated fixes
* revert ruff unrelated fixes
* revert ruff unrelated fixes
* fix test
* review fixes
* review fixes
* black fixed
* review commits
* review commits
* style fix
* use pytorch_utils
* revert markuplm test

- 01 Sep, 2023 (1 commit)
Zach Mueller authored
* Revert frozen training arguments
* TODO

- 15 Aug, 2023 (1 commit)
Zach Mueller authored
* Make training args fully immutable
* Working tests, PyTorch
* In test_trainer
* during testing
* Use proper dataclass way
* Fix test
* Another one
* Fix tf
* Lingering slow
* Exception
* Clean

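Making a dataclass of arguments immutable after construction is commonly done with a `__setattr__` guard armed in `__post_init__`. A minimal sketch of that pattern, with hypothetical class and field names (this is not the actual `TrainingArguments` code):

```python
from dataclasses import dataclass


@dataclass
class ImmutableArgs:
    # Hypothetical stand-in for a frozen-after-init arguments class.
    learning_rate: float = 5e-5
    num_train_epochs: float = 3.0

    def __post_init__(self):
        # Bypass our own guard to arm the flag that locks the instance.
        object.__setattr__(self, "_frozen", True)

    def __setattr__(self, name, value):
        # Field assignments during __init__ happen before _frozen exists,
        # so they pass; anything after __post_init__ is rejected.
        if getattr(self, "_frozen", False):
            raise AttributeError(f"{name} cannot be modified after __init__")
        super().__setattr__(name, value)


args = ImmutableArgs()
try:
    args.learning_rate = 1e-4
except AttributeError as err:
    print(err)  # learning_rate cannot be modified after __init__
```
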
- 24 Jul, 2023 (1 commit)
Zach Mueller authored
* Dispatch batches
* Copy items

- 18 Jul, 2023 (1 commit)
statelesshz authored
* Add Ascend NPU accelerator support
* fix style warning

- 17 Apr, 2023 (1 commit)
Zachary Mueller authored
* Use accelerate for device management
* Add accelerate to setup
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

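Delegating device management to Accelerate means asking the library which device to use instead of hand-rolling CUDA/CPU logic. A minimal standalone sketch using the public `Accelerator` API (not the Trainer's actual code path):

```python
import torch
from accelerate import Accelerator

# Accelerator inspects the environment (CUDA, MPS, distributed launchers, ...)
# and decides where tensors and models should live.
accelerator = Accelerator()
device = accelerator.device

model = torch.nn.Linear(8, 2).to(device)
batch = torch.randn(4, 8, device=device)
print(model(batch).shape)  # torch.Size([4, 2])
```
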
- 12 Apr, 2023 (1 commit)
Stas Bekman authored

- 06 Feb, 2023 (1 commit)
Sylvain Gugger authored
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies

- 18 Jan, 2023 (1 commit)
jeffhataws authored
* Add XLA torchrun support
* Clarify that DDP doesn't work with the torch.distributed XLA backend yet
* Enable DDP with torchrun and XLA (now available in PT-XLA 1.13)
* Add check for AWS Neuron availability and an AWS Neuron specific compiler flag
* Change the new test's name to TestTrainerDistributedNeuronCore
* Remove "assert" and replace it with a raised exception
* Remove compiler flag as it is optional; if needed, it will be another PR
* Use TORCHELASTIC_RUN_ID to determine whether torchrun is used

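The last bullet describes a simple environment check: torchrun (TorchElastic) exports `TORCHELASTIC_RUN_ID` to every worker it spawns. A sketch of that detection (the helper name is made up):

```python
import os


def launched_with_torchrun() -> bool:
    # torchrun (TorchElastic) sets TORCHELASTIC_RUN_ID in each worker's
    # environment, so its presence signals that torchrun spawned us.
    return os.environ.get("TORCHELASTIC_RUN_ID") is not None
```
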
- 23 Feb, 2022 (1 commit)
Lysandre Debut authored
* Per-folder tests reorganization
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>

- 19 Aug, 2021 (1 commit)
Allan Lin authored
* Update torch.utils.data namespaces to the latest.
* Format
* Update Dataloader.
* Style

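For illustration, "updating the namespaces" means importing from the public `torch.utils.data` package instead of its private submodules; a hypothetical before/after, not the exact diff:

```python
# Old style: reaching into private submodules.
# from torch.utils.data.dataset import Dataset
# from torch.utils.data.dataloader import DataLoader

# Current style: everything is re-exported at the package level.
from torch.utils.data import DataLoader, Dataset, RandomSampler, SequentialSampler
```
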
- 15 Jun, 2021 (1 commit)
Stas Bekman authored
* ensure concurrent pytest workers use a unique port for torch.distributed.launch
* reword

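Two pytest-xdist workers launching `torch.distributed.launch` at the same time would otherwise race for the default rendezvous port. A sketch of deriving a per-worker port from pytest-xdist's `PYTEST_XDIST_WORKER` variable (the helper name is hypothetical):

```python
import os

BASE_PORT = 29500  # torch.distributed's default master port


def unique_master_port() -> int:
    # pytest-xdist names its workers "gw0", "gw1", ...; offsetting the base
    # port by the worker index keeps concurrent test processes from racing
    # for the same torch.distributed.launch rendezvous port.
    worker = os.environ.get("PYTEST_XDIST_WORKER", "gw0")
    digits = "".join(ch for ch in worker if ch.isdigit())
    return BASE_PORT + (int(digits) if digits else 0)
```
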
- 31 Mar, 2021 (1 commit)
Sylvain Gugger authored
* First third
* Styling and fix mistake
* Quality
* All the rest
* Treat %s and %d
* typo
* Missing )
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

- 23 Mar, 2021 (1 commit)
Sylvain Gugger authored

- 18 Mar, 2021 (1 commit)
Sylvain Gugger authored
* Fix distributed evaluation
* Use logger

- 07 Dec, 2020 (1 commit)
Sylvain Gugger authored
* Add copyright headers everywhere they were missing
* Style

- 10 Nov, 2020 (1 commit)
Stas Bekman authored
* s|multiple_gpu|multi_gpu|g; s|multigpu|multi_gpu|g
* doc

- 28 Oct, 2020 (1 commit)
Stas Bekman authored
* move the helper code into testing_utils
* port test_trainer_distributed to work with pytest
* improve docs
* simplify notes
* doc
* doc
* style
* doc
* further improvements
* torch might not be available
* real fix
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

- 14 Oct, 2020 (1 commit)
Sylvain Gugger authored
* Add eval_accumulation_step and clean distributed eval
* Add TPU test
* Add TPU stuff
* Fix arg name
* Fix Seq2SeqTrainer
* Fix total_size
* Update src/transformers/trainer_pt_utils.py
* Doc and add test to TPU
* Add unit test
* Adapt name
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

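Eval accumulation exists because keeping every batch's logits on the GPU until the end of evaluation can exhaust memory; offloading to the CPU every N steps caps GPU usage. A rough sketch of the idea, with illustrative names (not the Trainer's actual loop):

```python
import torch


def eval_loop(model, dataloader, eval_accumulation_steps=20):
    # Assumes an HF-style model whose output carries .logits and batches
    # that unpack as keyword arguments.
    gpu_chunks, cpu_preds = [], []
    model.eval()
    with torch.no_grad():
        for step, batch in enumerate(dataloader):
            gpu_chunks.append(model(**batch).logits)
            # Every N steps, offload accumulated logits to CPU memory so the
            # GPU never holds more than N batches of predictions at once.
            if (step + 1) % eval_accumulation_steps == 0:
                cpu_preds.append(torch.cat(gpu_chunks).cpu())
                gpu_chunks = []
    if gpu_chunks:
        cpu_preds.append(torch.cat(gpu_chunks).cpu())
    return torch.cat(cpu_preds)
```
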
- 20 Aug, 2020 (1 commit)
Sylvain Gugger authored
* Add tests to Trainer
* Test if removing long breaks everything
* Remove ugly hack
* Fix distributed test
* Use float for number of epochs

- 25 Jun, 2020 (1 commit)
Thomas Wolf authored
[Tokenization] Fix #5181 - make #5155 more explicit - move back the default logging level in tests to WARNING (#5252)
* fix #5181: padding to max sequence length while truncating to another length was wrong on slow tokenizers
* clean up and fix #5155
* fix XLM test
* Fix tests for Transfo-XL
* logging only above WARNING in tests
* switch slow tokenizers tests to @slow
* fix Marian truncation tokenization test
* style and quality
* make the test a lot faster by limiting the sequence length used in tests

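The first bullet concerns padding to one length while truncating to another. For reference, with the present-day tokenizer API the combination looks like this (a usage sketch, not the original failing case):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Truncate anything longer than 16 tokens and pad anything shorter,
# so every sequence comes out at exactly max_length.
enc = tokenizer(
    "a fairly short example sentence",
    padding="max_length",
    truncation=True,
    max_length=16,
)
print(len(enc["input_ids"]))  # 16
```
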
- 15 Jun, 2020 (1 commit)
Sylvain Gugger authored
* Make DataCollator a callable
* Update src/transformers/data/data_collator.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

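After this change, a data collator is any callable that maps a list of examples to a batch, with no base class required. A minimal sketch of something satisfying that contract (the class is hypothetical):

```python
from typing import Any, Dict, List

import torch


class StackingCollator:
    # Any object with __call__(features) -> batch works as a data collator.
    def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, torch.Tensor]:
        # Assumes every example holds equal-length sequences per key.
        return {
            key: torch.stack([torch.as_tensor(f[key]) for f in features])
            for key in features[0]
        }


collator = StackingCollator()
batch = collator([{"input_ids": [1, 2, 3]}, {"input_ids": [4, 5, 6]}])
print(batch["input_ids"].shape)  # torch.Size([2, 3])
```
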
- 20 May, 2020 (1 commit)
Julien Chaumond authored

- 19 May, 2020 (1 commit)
Julien Chaumond authored
* Distributed eval: SequentialDistributedSampler + gather all results
* For consistency, only write to disk from world_master
  Close https://github.com/huggingface/transformers/issues/4272
* Working distributed eval
* Hook into scripts
* Fix #3721 again
* TPU.mesh_reduce: stay in tensor space (thanks @jysohn23)
* Just a small comment
* whitespace
* torch.hub: pip install packaging
* Add test scenarii

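The scheme in the first bullet: each rank evaluates a contiguous shard of the dataset (the job of a sequential distributed sampler), then an all-gather reassembles the full, ordered predictions on every rank. A sketch assuming an initialized process group and same-sized shards:

```python
import torch
import torch.distributed as dist


def gather_eval_results(local_preds: torch.Tensor) -> torch.Tensor:
    # all_gather requires same-shaped tensors on every rank, which the
    # sequential sampler guarantees by padding the last shard.
    buffers = [torch.empty_like(local_preds) for _ in range(dist.get_world_size())]
    dist.all_gather(buffers, local_preds)
    # Shards are contiguous slices of the eval set, so concatenating in
    # rank order restores the original sample order.
    return torch.cat(buffers, dim=0)
```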