    [trainer] deepspeed integration (#9211) · 2df34f4a
    Stas Bekman authored
    
    
    * deepspeed integration
    
    * style
    
    * add test
    
    * ds wants to do its own backward
    
    * fp16 assert
    
    * Update src/transformers/training_args.py
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * style
    
    * for clarity extract what args are being passed to deepspeed
    
    * introduce the concept of self.wrapped_model
    
    * s/self.wrapped_model/self.model_wrapped/
    
    * complete transition to self.wrapped_model / self.model
    
    * fix
    
    * doc
    
    * give ds its own init
    
    * add custom overrides, handle bs correctly
    
    * fix test
    
    * clean up model_init logic, fix small bug
    
    * complete fix
    
    * collapse --deepspeed_config into --deepspeed
    
    * style
    
    * start adding doc notes
    
    * style
    
    * implement hf2ds optimizer and scheduler configuration remapping
    
    * oops
    
    * call get_num_training_steps absolutely when needed
    
    * workaround broken auto-formatter
    
    * deepspeed_config arg is no longer needed - fixed in deepspeed master
    
    * use hf's fp16 args in config
    
    * clean
    
    * start on the docs
    
    * rebase cleanup
    
    * finish up --fp16
    
    * clarify the supported stages
    
    * big refactor thanks to discovering deepspeed.init_distributed
    
    * cleanup
    
    * revert fp16 part
    
    * add checkpoint-support
    
    * more init ds into integrations
    
    * extend docs
    
    * cleanup
    
    * unfix docs
    
    * clean up old code
    
    * imports
    
    * move docs
    
    * fix logic
    
    * make it clear which file it's referring to
    
    * document nodes/gpus
    
    * style
    
    * wrong format
    
    * style
    
    * deepspeed handles gradient clipping
    
    * easier to read
    
    * major doc rewrite
    
    * Apply suggestions from code review
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * docs
    
    * switch to AdamW optimizer
    
    * style
    
    * Apply suggestions from code review
    Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
    
    * clarify doc
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
trainer.rst 20.8 KB