Pavani Majety authored
[ModelOpt] Introduce VLLM_MAX_TOKENS_PER_EXPERT_FP4_MOE env var to control blockscale tensor allocation (#18160)

Signed-off-by: Pavani Majety <pmajety@nvidia.com>
f2036734