    Fix cross-attention head mask for Torch encoder-decoder models (#10605) · e3ff165a
    Daniel Stancl authored
    * Fix cross-attention head mask for Torch BART models
    
    * Fix head masking for the cross-attention modules of the following
    models: BART, Blenderbot, Blenderbot_small, M2M_100, Marian, MBart,
    Pegasus
    
    * Enable test_headmasking for M2M_100 model
    
    * Fix cross_head_mask for FSMT, LED and T5
    
    * This commit fixes `head_mask` for cross-attention modules
    in the following models: FSMT, LED, T5
    
    * It also contains some smaller doc changes to make it perfectly clear
    that the shape of `cross_head_mask` is the same as that of
    `decoder_head_mask`
    
    * Update template
    
    * Fix template for BartForCausalLM
    
    * Fix cross_head_mask for Speech2Text models
    
    * Fix cross_head_mask in templates
    
    * Fix args order in BartForCausalLM template
    
    * Fix doc in BART templates
    
    * Make the naming more explicit
    
    * `cross_head_mask` -> `cross_attn_head_mask`
    
    * `cross_layer_head_mask` -> `cross_attn_layer_head_mask`
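
    * A minimal sketch of the resulting call with the renamed argument
    (illustrative only: the checkpoint name and mask values below are
    placeholders, not part of this commit). `cross_attn_head_mask` shares
    the shape of `decoder_head_mask`, i.e. (decoder_layers,
    decoder_attention_heads), with 1.0 keeping a head and 0.0 masking it:

        import torch
        from transformers import BartModel, BartTokenizer

        model = BartModel.from_pretrained("facebook/bart-base")
        tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
        inputs = tokenizer("Hello world", return_tensors="pt")

        # Shape (decoder_layers, decoder_attention_heads), same as decoder_head_mask
        mask = torch.ones(model.config.decoder_layers,
                          model.config.decoder_attention_heads)
        mask[0, 0] = 0.0  # mask head 0 of the first decoder layer's cross-attention

        outputs = model(
            **inputs,
            decoder_input_ids=inputs["input_ids"],
            cross_attn_head_mask=mask,
        )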
    
    * Fix doc
    
    * Run `make style` and `make quality`
    
    * Fix speech2text docstring