Nicolas Patry authored
* Saving some VRAM.
  - 8B on 4xL4 with attention=flashdecoding: before, 4.28GB left; after, 4.32GB left, so 400MB saved.
  - The effect is not as visible with attention=flashinfer and n_shard=1. I suspect it's linked to the torch allocator.
* Adding an assertion.
b57f3703
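The before/after figures read like free-device-memory measurements. A minimal sketch of how such a "GB left" number can be obtained with `torch.cuda.mem_get_info` — an assumption about the measurement method, not the commit's actual instrumentation:

```python
def format_free_vram(free_bytes: int) -> str:
    """Format free device memory in GB, matching figures like '4.28GB left'."""
    return f"{free_bytes / 1024**3:.2f}GB left"

def report_free_vram(device: int = 0) -> str:
    # Imported lazily so format_free_vram stays usable without CUDA.
    # torch.cuda.mem_get_info returns (free_bytes, total_bytes) for the device.
    import torch
    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    return format_free_vram(free_bytes)

print(format_free_vram(int(4.28 * 1024**3)))  # → 4.28GB left
```

Note that "free" here is what the driver reports; memory cached by the torch allocator but not in active use still counts as allocated, which may explain why savings show up differently across attention backends.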