- 01 Jun, 2021 1 commit
-
-
Stas Bekman authored
* deepspeed docs * cleanup * cleanup
-
- 04 May, 2021 1 commit
-
-
Stas Bekman authored
* document resume randomness * fix link * reword * fix * reword * style
-
- 30 Apr, 2021 1 commit
-
-
Stas Bekman authored
* prep for deepspeed==0.3.16 * new version * too soon * support and test fp32 mode * troubleshooting doc start * workaround no longer needed * add fp32 doc * style * cleanup, add tf32 note * clarify * release was made
-
- 26 Apr, 2021 1 commit
-
-
Stas Bekman authored
* adding Z-inf * revamp config process * up version requirement * wip * massive rewrite * cleanup * cleanup * Apply suggestions from code review Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * consistent json commas * act on suggestions * leave this feature for 0.3.16 * style Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 21 Apr, 2021 1 commit
-
-
Sylvain Gugger authored
* Base move * Examples reorganization * Update references * Put back test data * Move conftest * More fixes * Move test data to test fixtures * Update path * Apply suggestions from code review Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * Address review comments and clean Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
-
- 13 Apr, 2021 1 commit
-
-
Sylvain Gugger authored
* Indent code block * Indent code blocks version 2 * Quality
-
- 09 Apr, 2021 1 commit
-
-
Stas Bekman authored
* typo * style
-
- 08 Apr, 2021 3 commits
-
-
Stas Bekman authored
* make fairscale and deepspeed setup extras * fix default * Apply suggestions from code review Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * no reason not to ask for the good version * update the CIs Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Stas Bekman authored
* relocate core integration tests * add sys.path context manager * cleanup * try * try2 * fix path * doc * style * add dep * add 2 more deps
-
Stas Bekman authored
* synced gpus * fix * fix * need to use t5-small for quality tests * notes * complete merge * fix a disappearing std stream problem * start zero3 tests * wip * tune params * sorting out the pre-trained model loading * reworking generate loop wip * wip * style * fix tests * split the tests * refactor tests * wip * parameterized * fix * workout the resume from non-ds checkpoint pass + test * cleanup * remove no longer needed code * split getter/setter functions * complete the docs * suggestions * gpus and their compute capabilities link * Apply suggestions from code review Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * style * remove invalid paramgd * automatically configure zero3 params that rely on hidden size * make _get_resized_embeddings zero3-aware * add test exercising resize_token_embeddings() * add docstring Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
-
- 16 Mar, 2021 1 commit
-
-
Cheng Li authored
* pass hf optimizer and scheduler to deepspeed if not specified in ds config * pass hf optimizer and scheduler to deepspeed if not specified in ds config * update * make init_deepspeed support config dict * fix docstring formatting * clean up trainer's comments * add new tests * fix type * composit argparse doesn't work * style * add a new test, rename others * document new functionality * complete tests, add docs * style * correct level * Apply suggestions from code review Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * add new methods to the doc * must tell DS we are using a non-native optimizer * add protection against cpu_offload + HF optimizer combo * fix the cli overrides * sync docs + tests * restore AdamW * better docs * need new version * no longer needed * remove outdate information * refactor duplicated code Co-authored-by:
Stas Bekman <stas@stason.org> Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 15 Mar, 2021 1 commit
-
-
Th茅o Matussi猫re authored
* split seq2seq script, update docs * needless diff * fix readme * remove test diff * s/summarization/translation Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * cr * fix arguments & better mbart/t5 refs * copyright Co-authored-by:
Suraj Patil <surajp815@gmail.com> * reword readme Co-authored-by:
Suraj Patil <surajp815@gmail.com> * s/summarization/translation * short script names * fix tests * fix isort, include mbart doc * delete old script, update tests * automate source prefix * automate source prefix for translation * s/translation/trans Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> * fix script name (short version) * typos Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> * exact parameter Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> * remove superfluous source_prefix calls in docs * rename scripts & warn for source prefix * black * flake8 Co-authored-by:
theo <theo@matussie.re> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Suraj Patil <surajp815@gmail.com> Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com>
-
- 12 Mar, 2021 2 commits
-
-
Stas Bekman authored
-
Sylvain Gugger authored
* Add auto_wrap option in fairscale integration * Style
-
- 10 Mar, 2021 1 commit
-
-
Sylvain Gugger authored
-
- 05 Mar, 2021 1 commit
-
-
lewtun authored
-
- 25 Feb, 2021 1 commit
-
-
Sylvain Gugger authored
* Ass support for ZeRO-2/3 and ZeRO-offload in fairscale * Quality * Rework from review comments * Add doc * Apply suggestions from code review Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> * Address review comments Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com>
-
- 22 Feb, 2021 1 commit
-
-
Stas Bekman authored
* implement gradient_accumulation_steps support in DeepSpeed integration * typo * cleanup * cleanup
-
- 17 Feb, 2021 1 commit
-
-
Stas Bekman authored
-
- 11 Feb, 2021 1 commit
-
-
Stas Bekman authored
* init devices/setup explicitly * docs + test * simplify * cleanup * cleanup * cleanup * correct the required dist setup * derive local_rank from env LOCAL_RANK
-
- 10 Feb, 2021 1 commit
-
-
Stas Bekman authored
* how to specify a specific gpu * new paper * expand on buffer sizes * style * where to find config examples * specific example * small updates
-
- 14 Jan, 2021 1 commit
-
-
Stas Bekman authored
* [doc] install + 1-gpu deployment * Apply suggestions from code review Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * improvements Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 13 Jan, 2021 1 commit
-
-
Stas Bekman authored
* deepspeed integration * style * add test * ds wants to do its own backward * fp16 assert * Update src/transformers/training_args.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * style * for clarity extract what args are being passed to deepspeed * introduce the concept of self.wrapped_model * s/self.wrapped_model/self.model_wrapped/ * complete transition to self.wrapped_model / self.model * fix * doc * give ds its own init * add custom overrides, handle bs correctly * fix test * clean up model_init logic, fix small bug * complete fix * collapse --deepspeed_config into --deepspeed * style * start adding doc notes * style * implement hf2ds optimizer and scheduler configuration remapping * oops * call get_num_training_steps absolutely when needed * workaround broken auto-formatter * deepspeed_config arg is no longer needed - fixed in deepspeed master * use hf's fp16 args in config * clean * start on the docs * rebase cleanup * finish up --fp16 * clarify the supported stages * big refactor thanks to discovering deepspeed.init_distributed * cleanup * revert fp16 part * add checkpoint-support * more init ds into integrations * extend docs * cleanup * unfix docs * clean up old code * imports * move docs * fix logic * make it clear which file it's referring to * document nodes/gpus * style * wrong format * style * deepspeed handles gradient clipping * easier to read * major doc rewrite * Apply suggestions from code review Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * docs * switch to AdamW optimizer * style * Apply suggestions from code review Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * clarify doc Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
-
- 22 Dec, 2020 1 commit
-
-
Sylvain Gugger authored
* Add label smoothing in Trainer * Add options for scheduler and Adafactor in Trainer * Put Seq2SeqTrainer in the main lib * Apply suggestions from code review Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com> * Address review comments and adapt scripts * Documentation * Move test not using script to tests folder Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com>
-
- 07 Dec, 2020 1 commit
-
-
Sylvain Gugger authored
* Add copyright everywhere missing * Style
-
- 12 Nov, 2020 1 commit
-
-
Chengxi Guo authored
* fix doc bug Signed-off-by:
mymusise <mymusise1@gmail.com> * fix example bug Signed-off-by:
mymusise <mymusise1@gmail.com>
-
- 26 Oct, 2020 1 commit
-
-
Sylvain Gugger authored
* Important files * Styling them all * Revert "Styling them all" This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e. * Syling them for realsies * Fix syntax error * Fix benchmark_utils * More fixes * Fix modeling auto and script * Remove new line * Fixes * More fixes * Fix more files * Style * Add FSMT * More fixes * More fixes * More fixes * More fixes * Fixes * More fixes * More fixes * Last fixes * Make sphinx happy
-
- 13 Oct, 2020 1 commit
-
-
Tiger authored
-
- 07 Oct, 2020 1 commit
-
-
Sylvain Gugger authored
* Initial callback proposal * Finish various callbacks * Post-rebase conflicts * Fix tests * Don't use something that's not set * Documentation * Remove unwanted print. * Document all models can work * Add tests + small fixes * Update docs/source/internal/trainer_utils.rst Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * Address review comments * Fix TF tests * Real fix this time * This one should work * Fix typo * Really fix typo Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
-
- 23 Sep, 2020 1 commit
-
-
Sylvain Gugger authored
* Clean up model documentation * Formatting * Preparation work * Long lines * Main work on rst files * Cleanup all config files * Syntax fix * Clean all tokenizers * Work on first models * Models beginning * FaluBERT * All PyTorch models * All models * Long lines again * Fixes * More fixes * Update docs/source/model_doc/bert.rst Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * Update docs/source/model_doc/electra.rst Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * Last fixes Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
-
- 11 Sep, 2020 1 commit
-
-
Sylvain Gugger authored
-
- 31 Jul, 2020 1 commit
-
-
Sylvain Gugger authored
* Harmonize both Trainers API * Fix test * main_prcess -> process_zero
-
- 30 Jun, 2020 1 commit
-
-
Sylvain Gugger authored
* Documentation for the Trainer API * Address review comments * Address comments
-