    GPT Neo (#10848) · 86026437
    Suraj Patil authored
    
    
    * lets begin
    
    * boom boom
    
    * fix out proj in attn
    
    * fix attention
    
    * fix local attention
    
    * add tokenizer
    
    * fix imports
    
    * autotokenizer
    
    * fix checkpoint name
    
    * cleanup
    
    * more clean-up
    
    * more cleanup
    
    * output attentions
    
    * fix attn mask creation
    
    * fix imports
    
    * config doc
    
    * add tests
    
    * add slow tests
    
    * quality
    
    * add conversion script
    
    * copyright
    
    * typo
    
    * another one bites the dust
    
    * fix attention tests
    
    * doc
    
    * add embed init in convert function
    
    * fix copies
    
    * remove tokenizer
    
    * enable caching
    
    * address review comments
    
    * improve config and create attn layer list internally
    
    * more consistent naming
    
    * init hf config from mesh-tf config json file
    
    * remove neo tokenizer from doc
    
    * handle attention_mask in local attn layer
    
    * attn_layers => attention_layers
    
    * add tokenizer_class in config
    
    * fix docstring
    
    * raise if len of attention_layers is not same as num_layers
    
    * remove tokenizer_class from config
    
    * more consistent naming
    
    * fix doc
    
    * fix checkpoint names
    
    * fp16 compat
    
    * Apply suggestions from code review
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
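    Several commits above concern the config's attention-layer list: building it internally ("improve config and create attn layer list internally"), renaming `attn_layers` to `attention_layers`, and raising when its length differs from `num_layers`. A minimal sketch of that expand-and-validate step, with hypothetical helper names that only mirror the pattern described in the commits, not the exact GPTNeoConfig code:

    ```python
    def expand_attention_types(attention_types):
        # attention_types is a list of [pattern, repeat] pairs, e.g.
        # [[["global", "local"], 12]] -> the two-layer pattern tiled 12 times.
        layers = []
        for pattern, repeat in attention_types:
            layers.extend(pattern * repeat)
        return layers

    def check_attention_layers(attention_layers, num_layers):
        # Mirrors the "raise if len of attention_layers is not same as
        # num_layers" commit: fail loudly on a mismatched config.
        if len(attention_layers) != num_layers:
            raise ValueError(
                f"attention_layers has {len(attention_layers)} entries "
                f"but num_layers is {num_layers}"
            )

    layers = expand_attention_types([[["global", "local"], 12]])
    check_attention_layers(layers, 24)  # 24 alternating global/local layers
    ```
    
    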
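    The local-attention commits ("fix local attention", "handle attention_mask in local attn layer") restrict each position to a causal window of recent tokens. A self-contained illustration of such a banded causal mask, as a sketch of the idea rather than the model's actual implementation:

    ```python
    def local_causal_mask(seq_len, window_size):
        # mask[i][j] is True when position i may attend to position j:
        # j must not be in the future, and at most window_size - 1 steps back.
        return [
            [0 <= i - j < window_size for j in range(seq_len)]
            for i in range(seq_len)
        ]

    mask = local_causal_mask(5, 3)
    # Row 4 attends only to positions 2, 3, 4 (itself plus two back).
    ```
    
    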