Default to 'align' mamba cache mode for Mamba-based models when speculative decoding is enabled (#40454)
Signed-off-by: Roi Koren <roik@nvidia.com>