    [DeepSpeed] ZeRO Stage 3 (#10753) · c6d66484
    Stas Bekman authored
    
    
    * synced gpus
    
    * fix
    
    * fix
    
    * need to use t5-small for quality tests
    
    * notes
    
    * complete merge
    
    * fix a disappearing std stream problem
    
    * start zero3 tests
    
    * wip
    
    * tune params
    
    * sorting out the pre-trained model loading
    
    * reworking generate loop wip
    
    * wip
    
    * style
    
    * fix tests
    
    * split the tests
    
    * refactor tests
    
    * wip
    
    * parameterized
    
    * fix
    
    * work out the resume from non-ds checkpoint pass + test
    
    * cleanup
    
    * remove no longer needed code
    
    * split getter/setter functions
    
    * complete the docs
    
    * suggestions
    
    * gpus and their compute capabilities link
    
    * Apply suggestions from code review
    Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
    
    * style
    
    * remove invalid paramgd
    
    * automatically configure zero3 params that rely on hidden size
    
    * make _get_resized_embeddings zero3-aware
    
    * add test exercising resize_token_embeddings()
    
    * add docstring
    Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
trainer.rst 61.5 KB