    Add Mistral GPT-2 Stability Tweaks (#13573) · 3a8de58c
    Sidd Karamcheti authored
    
    
    * Add layer-wise scaling
    
    * Add reorder & upcasting argument
    
    * Add OpenAI GPT-2 weight initialization scheme
    
    * start `layer_idx` count at zero for consistency
    
    * disentangle attn and reordered and upscaled attn function
    
    * rename `scale_attn_by_layer` to `scale_attn_by_layer_id`
    
    * make autocast from amp compatible with pytorch<1.6
    
    * fix docstring
    
    * style fixes
    
    * Add fixes from PR feedback, style tweaks
    
    * Fix doc whitespace
    
    * Reformat
    
    * First pass scale_attn_by_layer_idx and reorder_and_upcast_attn tests
    
    * Rename scale_attn_by_layer_idx, add tip
    
    * Remove extra newline
    
    * add test for weight initialization
    
    * update code format
    
    * add assert check weights are fp32
    
    * remove assert
    
    * Fix incorrect merge
    
    * Fix shape mismatch in baddbmm
    
    * Add generation test for Mistral flags
    Co-authored-by: leandro <leandro.vonwerra@spoud.io>
    Co-authored-by: Keshav Santhanam <keshav2@stanford.edu>
    Co-authored-by: J38 <jebolton@stanford.edu>
gpt2.rst 6.91 KB