    [ft_attention] Fix for seqlen=8136 (#488) · c3f2a632
    dan_the_3rd authored
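    When seqlen=8136, `smem_sz = 48840`, and launching the kernel fails with an `invalid argument` CUDA error.
    
    `48840 < 48 * 1024` (= 49152), so the request looks like it fits, yet the launch is still rejected. One possible explanation: the default 48 KB per-block limit covers static and dynamic shared memory combined, so any statically declared shared memory in the kernel counts against it as well.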
    Tested on A100
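
    A minimal sketch of how one might check this, not the code from this commit. `demo_kernel` and its 1 KiB static buffer are hypothetical stand-ins for the attention kernel; only `smem_sz = 48840` comes from the message above. It queries the kernel's static shared memory with `cudaFuncGetAttributes` and, if static plus dynamic exceeds 48 KB, opts in to a larger dynamic allocation with `cudaFuncSetAttribute` before launching.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: 1 KiB of static shared memory plus a dynamic buffer.
__global__ void demo_kernel(float* out) {
    __shared__ float static_buf[256];
    extern __shared__ char dynamic_buf[];
    static_buf[threadIdx.x] = static_cast<float>(threadIdx.x);
    dynamic_buf[threadIdx.x] = 0;
    __syncthreads();
    if (threadIdx.x == 0) {
        out[0] = static_buf[0] + static_cast<float>(dynamic_buf[0]);
    }
}

int main() {
    const size_t smem_sz = 48840;               // dynamic request from the commit message
    const size_t default_limit = 48 * 1024;     // 49152 bytes

    // Ask the runtime how much static shared memory the kernel already uses.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, demo_kernel);
    if (err != cudaSuccess) {
        std::printf("cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }
    const size_t total = attr.sharedSizeBytes + smem_sz;
    std::printf("static=%zu dynamic=%zu total=%zu limit=%zu\n",
                attr.sharedSizeBytes, smem_sz, total, default_limit);

    // Opt in when the combined footprint exceeds the default limit,
    // even though the dynamic part alone is below it.
    if (total > default_limit) {
        err = cudaFuncSetAttribute(demo_kernel,
                                   cudaFuncAttributeMaxDynamicSharedMemorySize,
                                   static_cast<int>(smem_sz));
        if (err != cudaSuccess) {
            std::printf("cudaFuncSetAttribute: %s\n", cudaGetErrorString(err));
            return 1;
        }
    }

    float* out = nullptr;
    cudaMalloc(&out, sizeof(float));
    demo_kernel<<<1, 256, smem_sz>>>(out);
    err = cudaGetLastError();                   // launch-time errors surface here
    std::printf("launch: %s\n", cudaGetErrorString(err));
    cudaDeviceSynchronize();
    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}
```

    Comparing the combined total against the limit, rather than `smem_sz` alone, is the design choice being illustrated. Whether the opt-in succeeds depends on the device's maximum opt-in shared memory per block.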
decoder_masked_multihead_attention.cu 6.67 KB