    BigBird (#10183) · 6dfd0272
    Vasudev Gupta authored
    
    
    * init bigbird
    
    * model.__init__ working, conversion script ready, config updated
    
    * add conversion script
    
    * BigBirdEmbeddings working :)
    
    * slightly update conversion script
    
    * BigBirdAttention working :) ; some bug in layer.output.dense
    
    * add debugger-notebook
    
    * forward() working for BigBirdModel :) ; replaced gelu with gelu_fast
    
    * tf code adapted to torch till rand_attn in bigbird_block_sparse_attention ; till now everything working :)
    
    * BigBirdModel working in block-sparse attention mode :)
    
    * add BigBirdForPreTraining
    
    * small fix
    
    * add tokenizer for BigBirdModel
    
    * fix config & hence modeling
    
    * fix base prefix
    
    * init testing
    
    * init tokenizer test
    
    * pos_embed must be absolute, attn_type=original_full when add_cross_attn=True, nsp loss is optional in BigBirdForPreTraining, add assert statements
    
    * remove position_embedding_type arg
    
    * complete normal tests
    
    * add comments to block sparse attention
    
    * add attn_probs for sliding & global tokens
    
    * create fn for block sparse attn mask creation
    
    * add special tests
    
    * restore pos embed arg
    
    * minor fix
    
    * attn probs update
    
    * make big bird fully gpu friendly
    
    * fix tests
    
    * remove pruning
    
    * correct tokenizer & minor fixes
    
    * update conversion script, remove norm_type
    
    * add tokenizer-inference test
    
    * remove extra comments
    
    * add docs
    
    * save intermediate
    
    * finish trivia_qa conversion
    
    * small update to forward
    
    * correct qa and layer
    
    * better error message
    
    * BigBird QA ready
    
    * fix rebased
    
    * add trivia-qa debugger notebook
    
    * qa setup
    
    * fixed till embeddings
    
    * some issue in q/k/v_layer
    
    * fix bug in conversion-script
    
    * fixed till self-attn
    
    * qa fixed except layer norm
    
    * add qa end2end test
    
    * fix gradient checkpointing; other qa test
    
    * speed-up big bird a bit
    
    * hub_id=google
    
    * clean up
    
    * make quality
    
    * speed up einsum with bmm
    
    * finish perf improvements for big bird
    
    * remove wav2vec2 tok
    
    * fix tokenizer
    
    * include docs
    
    * correct docs
    
    * add helper to auto pad block size
    
    * make style
    
    * remove fast tokenizer for now
    
    * fix some
    
    * add pad test
    
    * finish
    
    * fix some bugs
    
    * fix another bug
    
    * fix buffer tokens
    
    * fix comment and merge from master
    
    * add comments
    
    * make style
    
    * commit some suggestions
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * Fix typos
    
    * fix some more suggestions
    
    * add another patch
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * fix copies
    
    * another patch
    Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
    
    * update
    
    * update nit suggestions
    
    * make style
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    Co-authored-by: Lysandre Debut <lysandre@huggingface.co>