Support flash attention 2 with causal masking when KV's seq length is longer than Q's seq length. (#436)
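The title does not say how the causal mask is aligned when the KV sequence is longer than the Q sequence. A common convention, and the one the upstream flash-attn README documents, is to align the mask to the bottom-right corner of the attention matrix: the queries are treated as the last `seqlen_q` positions of the key sequence, as in KV-cache decoding. The sketch below assumes that convention. It is a naive pure-Python reference, not this commit's kernel, and the helper names `causal_mask` and `masked_attention` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_mask(seqlen_q: int, seqlen_k: int) -> List[List[bool]]:
    """Bottom-right-aligned causal mask (hypothetical helper).

    mask[i][j] is True when query i may attend to key j. Assumes query i sits
    at absolute position i + (seqlen_k - seqlen_q) in the key sequence.
    """
    if seqlen_k < seqlen_q:
        # This sketch only covers the case named in the commit title.
        raise ValueError("sketch assumes seqlen_k >= seqlen_q")
    offset = seqlen_k - seqlen_q
    return [[j <= i + offset for j in range(seqlen_k)] for i in range(seqlen_q)]


def masked_attention(q: Matrix, k: Matrix, v: Matrix, causal: bool = True) -> Matrix:
    """Naive softmax(q k^T / sqrt(d)) v; q is [Lq][d], k and v are [Lk][d]."""
    lq, lk, d = len(q), len(k), len(q[0])
    mask = causal_mask(lq, lk) if causal else [[True] * lk for _ in range(lq)]
    out = []
    for i in range(lq):
        # Masked positions get -inf so they receive zero softmax weight.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) if mask[i][j] else -math.inf
            for j in range(lk)
        ]
        m = max(scores)  # finite: key 0 is always visible when seqlen_k >= seqlen_q
        weights = [math.exp(s - m) for s in scores]  # math.exp(-inf) == 0.0
        total = sum(weights)
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) / total
                    for c in range(len(v[0]))])
    return out
```

For example, `causal_mask(2, 4)` gives `[[True, True, True, False], [True, True, True, True]]`: the last query sees every key, and each earlier query sees one key fewer. With top-left alignment, the first query would instead see only key 0.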