    Add caching mechanism to BERT, RoBERTa (#9183) · 88ef8893
    Suraj Patil authored
    * add past_key_values
    
    * add use_cache option
    
    * make mask before cutting ids
    
    * adjust position_ids according to past_key_values
    
    * flatten past_key_values
    
    * fix positional embeds
    
    * fix _reorder_cache
    
    * set use_cache to false when not decoder, fix attention mask init
    
    * add test for caching
    
    * add past_key_values for Roberta
    
    * fix position embeds
    
    * add caching test for roberta
    
    * add doc
    
    * make style
    
    * doc, fix attention mask, test
    
    * small fixes
    
    * address Patrick's comments
    
    * input_ids shouldn't start with pad token
    
    * use_cache only when decoder
    
    * make consistent with bert
    
    * make copies consistent
    
    * add use_cache to encoder
    
    * add past_key_values to tapas attention
    
    * apply suggestions from code review
    
    * make copies consistent
    
    * add attn mask in tests
    
    * remove copied from longformer
    
    * apply suggestions from code review
    
    * fix bart test
    
    * nit
    
    * simplify model outputs
    
    * fix doc
    
    * fix output ordering
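    
    The commits above add a `past_key_values` cache and adjust `position_ids` by the cached length on each decoding step. A minimal sketch of that bookkeeping (a hypothetical `step` helper with a flat list standing in for the per-layer key/value tensors, not the actual transformers implementation):
    
    ```python
    # Illustrative pseudo-cache: each decoding step reuses previously
    # computed states and offsets position_ids by the cached length,
    # mirroring the caching mechanism described in the commits above.
    
    def step(input_ids, past_key_values=None):
        """Run one decoding step over `input_ids` (a list of token ids),
        returning (position_ids, updated past_key_values)."""
        past_length = len(past_key_values) if past_key_values else 0
        # position_ids are adjusted according to past_key_values:
        position_ids = [past_length + i for i in range(len(input_ids))]
        # extend the cache with the new states (here just the token ids)
        new_past = (past_key_values or []) + list(input_ids)
        return position_ids, new_past
    
    # full prompt pass, then two single-token steps with use_cache=True behaviour
    pos0, cache = step([10, 11, 12])   # positions [0, 1, 2]
    pos1, cache = step([13], cache)    # position [3]
    pos2, cache = step([14], cache)    # position [4]
    ```
    
    With caching, each subsequent step only feeds the newest token while the cache supplies the earlier context, which is what makes autoregressive generation with BERT/RoBERTa-style decoders efficient.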