    Fixing key padding mask during transformer generation · 68dd3e17
    Spencer Poff authored
    Summary:
    https://github.com/pytorch/fairseq/pull/1097 added key padding mask history to TransformerDecoderLayer. However, in the edge case where only the current or only the previous key_padding_mask exists, the resulting key_padding_mask has the wrong size.
    
    This diff adds empty columns in that case so that key_padding_mask always has a usable size.
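
A minimal sketch of the padding idea described above, not the actual diff. It uses NumPy arrays in place of torch tensors so that it runs without PyTorch. The function name `append_prev_key_padding_mask`, its parameters, the convention that `True` marks a padded key, and the side on which the empty columns are added are all assumptions made for illustration.

```python
import numpy as np


def append_prev_key_padding_mask(key_padding_mask, prev_key_padding_mask,
                                 batch_size, src_len):
    """Combine cached and current masks into shape (batch_size, src_len).

    Hypothetical helper: True marks a padded key (assumed convention).
    src_len is the total number of keys, cached plus current.
    """
    if prev_key_padding_mask is not None and key_padding_mask is not None:
        # Both masks exist: concatenate along the key dimension.
        return np.concatenate([prev_key_padding_mask, key_padding_mask], axis=1)
    if prev_key_padding_mask is not None:
        # Only the cached mask exists: add empty (unmasked) columns
        # for the current keys.
        missing = src_len - prev_key_padding_mask.shape[1]
        filler = np.zeros((batch_size, missing), dtype=bool)
        return np.concatenate([prev_key_padding_mask, filler], axis=1)
    if key_padding_mask is not None:
        # Only the current mask exists: add empty (unmasked) columns
        # for the cached keys.
        missing = src_len - key_padding_mask.shape[1]
        filler = np.zeros((batch_size, missing), dtype=bool)
        return np.concatenate([filler, key_padding_mask], axis=1)
    # Neither mask exists: nothing to combine.
    return None


# Usage: three cached keys with a mask, one new key without a mask.
prev = np.array([[True, False, False]])
combined = append_prev_key_padding_mask(None, prev, batch_size=1, src_len=4)
print(combined.shape)
```

In every branch that returns a mask, the result has one column per key, which is what the attention computation needs when it applies the mask to the attention weights.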
    
    Reviewed By: myleott
    
    Differential Revision: D18224313
    
    fbshipit-source-id: c9fb7266baf0a2d79a66704e00a5ea8bd2987ff6
test_multihead_attention.py 1.86 KB