    BERT decoder: Fix causal mask dtype. · ee5de0ba
    Oleksiy Syvokon authored
    PyTorch < 1.3 requires both operands of a multiplication to have the
    same dtype. This was violated when using the default attention mask
    (i.e., attention_mask=None in the arguments) with BERT in decoder mode.
    
    In particular, this was breaking Model2Model and caused the tutorial
    from the quickstart to fail.
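
    The mismatch can be illustrated with a minimal sketch. It is not the
    code from this commit: the helper name build_decoder_mask is
    hypothetical, and casting the causal mask to the attention mask's dtype
    is an assumption about one way to make the operand types agree.

```python
import torch


def build_decoder_mask(batch_size, seq_length, attention_mask=None):
    """Combine a causal mask with a padding mask (hypothetical helper)."""
    if attention_mask is None:
        # Default mask: attend to every position (float32 tensor of ones).
        attention_mask = torch.ones(batch_size, seq_length)

    seq_ids = torch.arange(seq_length)
    # Entry [b, i, j] is True when position i may attend to position j (j <= i).
    causal_mask = seq_ids[None, None, :].repeat(batch_size, seq_length, 1) <= seq_ids[None, :, None]

    # Assumed fix: cast the comparison result so both operands of the
    # multiplication below share one dtype, which older PyTorch requires.
    causal_mask = causal_mask.to(attention_mask.dtype)

    # Broadcast to (batch_size, 1, seq_length, seq_length).
    return causal_mask[:, None, :, :] * attention_mask[:, None, None, :]


mask = build_decoder_mask(batch_size=2, seq_length=3)
```

    Without the cast, the comparison result keeps its own dtype and is
    multiplied directly with the floating-point default mask, which is the
    kind of mixed-type product the message above describes.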
test_modeling_bert.py 20.1 KB