[Qwen3-Next] switch to triton and cache conv states to accelerate MTP from 300 tok/s to 341 tok/s (#10335)
Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>