- 08 Apr, 2021 2 commits
Stas Bekman authored
* relocate core integration tests
* add sys.path context manager
* cleanup
* try
* try2
* fix path
* doc
* style
* add dep
* add 2 more deps
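The "sys.path context manager" mentioned in this commit can be sketched roughly as below. This is a minimal illustration, assuming a plain helper; `extend_sys_path` and the module name in the usage comment are illustrative, not the names used in the repository.

```python
import sys
from contextlib import contextmanager

@contextmanager
def extend_sys_path(path):
    """Temporarily prepend `path` to sys.path so a test can import modules
    that live outside the installed package, then restore sys.path on exit."""
    sys.path.insert(0, path)
    try:
        yield
    finally:
        sys.path.remove(path)

# usage (module name is hypothetical):
# with extend_sys_path("examples/seq2seq"):
#     import finetune_trainer
```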
Stas Bekman authored
* synced gpus
* fix
* fix
* need to use t5-small for quality tests
* notes
* complete merge
* fix a disappearing std stream problem
* start zero3 tests
* wip
* tune params
* sorting out the pre-trained model loading
* reworking generate loop wip
* wip
* style
* fix tests
* split the tests
* refactor tests
* wip
* parameterized
* fix
* work out the resume from non-ds checkpoint pass + test
* cleanup
* remove no longer needed code
* split getter/setter functions
* complete the docs
* suggestions
* gpus and their compute capabilities link
* Apply suggestions from code review
  Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* style
* remove invalid paramgd
* automatically configure zero3 params that rely on hidden size
* make _get_resized_embeddings zero3-aware
* add test exercising resize_token_embeddings()
* add docstring

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
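"Automatically configure zero3 params that rely on hidden size" can be pictured as something like the sketch below. The function name is made up and the exact multipliers are assumptions; the keys are standard DeepSpeed ZeRO-3 options, not copied from the integration's actual code.

```python
def autofill_zero3_from_hidden_size(ds_config, hidden_size):
    """Fill ZeRO-3 bucket/threshold values that scale with the model's
    hidden size, leaving any user-provided values untouched."""
    zero = ds_config.setdefault("zero_optimization", {})
    zero.setdefault("reduce_bucket_size", hidden_size * hidden_size)
    zero.setdefault("stage3_prefetch_bucket_size", int(0.9 * hidden_size * hidden_size))
    zero.setdefault("stage3_param_persistence_threshold", 10 * hidden_size)
    return ds_config
```

Deriving these values from the model config spares the user from re-tuning the ZeRO-3 buckets for every architecture, which is the convenience this commit describes.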
- 12 Mar, 2021 1 commit
Stas Bekman authored
- 22 Feb, 2021 1 commit
Stas Bekman authored
* implement gradient_accumulation_steps support in DeepSpeed integration
* typo
* cleanup
* cleanup
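A minimal sketch of what "gradient_accumulation_steps support" amounts to, assuming a plain dict config and an args object with the usual Trainer fields; the helper name is hypothetical, but the config keys are standard DeepSpeed options.

```python
def sync_batch_settings(ds_config, args):
    """Mirror the Trainer's batching arguments into the DeepSpeed config so
    both sides agree on gradient accumulation and micro-batch size."""
    ds_config["gradient_accumulation_steps"] = args.gradient_accumulation_steps
    ds_config["train_micro_batch_size_per_gpu"] = args.per_device_train_batch_size
    return ds_config
```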
- 10 Feb, 2021 1 commit
Stas Bekman authored
* free up memory at the end of train
* rework tests
* consistent formatting
* correction
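A hedged illustration of what "free up memory at the end of train" could look like; the helper name is made up, and the attribute names are borrowed from the Trainer's `self.model_wrapped` / `self.optimizer` convention mentioned elsewhere in this log, not from this commit's diff.

```python
import gc
import torch

def release_train_memory(trainer):
    """Drop references to training-only objects and return cached CUDA
    blocks to the allocator once training is done."""
    trainer.model_wrapped = trainer.model  # the engine wrapper is no longer needed
    trainer.optimizer = None
    trainer.lr_scheduler = None
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```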
- 08 Feb, 2021 1 commit
Stas Bekman authored
- 13 Jan, 2021 1 commit
Stas Bekman authored
* deepspeed integration
* style
* add test
* ds wants to do its own backward
* fp16 assert
* Update src/transformers/training_args.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
* for clarity extract what args are being passed to deepspeed
* introduce the concept of self.wrapped_model
* s/self.wrapped_model/self.model_wrapped/
* complete transition to self.wrapped_model / self.model
* fix
* doc
* give ds its own init
* add custom overrides, handle bs correctly
* fix test
* clean up model_init logic, fix small bug
* complete fix
* collapse --deepspeed_config into --deepspeed
* style
* start adding doc notes
* style
* implement hf2ds optimizer and scheduler configuration remapping
* oops
* call get_num_training_steps absolutely when needed
* workaround broken auto-formatter
* deepspeed_config arg is no longer needed - fixed in deepspeed master
* use hf's fp16 args in config
* clean
* start on the docs
* rebase cleanup
* finish up --fp16
* clarify the supported stages
* big refactor thanks to discovering deepspeed.init_distributed
* cleanup
* revert fp16 part
* add checkpoint-support
* more init ds into integrations
* extend docs
* cleanup
* unfix docs
* clean up old code
* imports
* move docs
* fix logic
* make it clear which file it's referring to
* document nodes/gpus
* style
* wrong format
* style
* deepspeed handles gradient clipping
* easier to read
* major doc rewrite
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* docs
* switch to AdamW optimizer
* style
* Apply suggestions from code review
  Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* clarify doc

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
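To make the end state of this integration concrete, here is a hedged usage sketch: a minimal DeepSpeed config passed through the single --deepspeed flag that the message above says --deepspeed_config was collapsed into. The config keys are standard DeepSpeed options; the script name and values are placeholders, not taken from the repository.

```python
import json

# standard DeepSpeed config keys; values are placeholders
ds_config = {
    "fp16": {"enabled": True},                 # pairs with the Trainer's --fp16 flag
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "AdamW", "params": {"lr": 3e-5}},
    "gradient_clipping": 1.0,                  # DeepSpeed handles gradient clipping
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

# illustrative launch (the script name is a placeholder):
#   deepspeed my_trainer_script.py --deepspeed ds_config.json --fp16
```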