roikoren755 authored
Default to 'align' mamba cache mode for Mamba-based models when speculative decoding is enabled (#40454)

Signed-off-by: Roi Koren <roik@nvidia.com>
f819265a
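A minimal sketch of the defaulting rule this commit describes. It assumes that an unset mode is represented as `None` and that an explicit user setting still takes precedence over the new default. `HypotheticalConfig`, `resolve_mamba_cache_mode`, and the `"none"` fallback are illustrative names only. They are not vLLM's actual classes, functions, or option values.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for the relevant config fields; not vLLM's real config classes.
@dataclass
class HypotheticalConfig:
    is_mamba_based: bool
    speculative_decoding_enabled: bool
    mamba_cache_mode: Optional[str] = None  # None means "not set by the user" (assumption)


def resolve_mamba_cache_mode(cfg: HypotheticalConfig, fallback: str = "none") -> str:
    """Return the effective mamba cache mode for a hypothetical config."""
    # An explicit user choice is kept as-is (assumed behavior).
    if cfg.mamba_cache_mode is not None:
        return cfg.mamba_cache_mode
    # The rule from the commit title: Mamba-based model + speculative decoding -> 'align'.
    if cfg.is_mamba_based and cfg.speculative_decoding_enabled:
        return "align"
    # Otherwise use whatever the pre-existing default is; "none" is a placeholder.
    return fallback
```

For example, `resolve_mamba_cache_mode(HypotheticalConfig(True, True))` yields `"align"`, while the same model without speculative decoding keeps the placeholder fallback.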