feat(sampler): add a reduced top-k + top-p sampling fast path to cut full-vocabulary softmax overhead
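Add the VLLM_V1_USE_REDUCED_TOPK_TOPP_SAMPLER switch and document the scenarios where it applies. The V1 GPU input batch now precomputes max_top_k/has_any_no_top_k; when the conditions are met, the native sampler takes the reduced fast path, and it automatically falls back to the existing path on error.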
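
For reviewers, here is a minimal pure-Python sketch of the idea, not the code in this change. It assumes that a top_k of None, <= 0, or >= vocab size means "no top-k", and that the fast path is only eligible when every request in the batch sets a top_k. The helper names (batch_top_k_stats, full_path, reduced_path, sample_probs) and the use_reduced argument, which stands in for the switch, are hypothetical. The point it illustrates: once top-k has masked everything else out, a softmax over only the surviving candidates gives the same distribution as a softmax over the full vocabulary.

```python
import heapq
import math


def batch_top_k_stats(top_ks, vocab_size):
    """Return (max_top_k, has_any_no_top_k) for one batch.

    Assumption: None, <= 0, or >= vocab_size means "top-k disabled".
    """
    max_top_k, has_any_no_top_k = 0, False
    for k in top_ks:
        if k is None or k <= 0 or k >= vocab_size:
            has_any_no_top_k = True
        else:
            max_top_k = max(max_top_k, k)
    return max_top_k, has_any_no_top_k


def _softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def _top_p_filter(ranked, top_p):
    """ranked: (token_id, prob) pairs sorted by prob, descending."""
    kept, cumulative = [], 0.0
    for token_id, prob in ranked:
        kept.append((token_id, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(prob for _, prob in kept)
    return {token_id: prob / total for token_id, prob in kept}


def full_path(logits, top_k, top_p):
    """Reference path: mask outside top-k, softmax over the whole vocabulary."""
    order = sorted(range(len(logits)), key=lambda i: (-logits[i], i))
    kept_ids = set(order[:top_k])
    masked = [x if i in kept_ids else -math.inf for i, x in enumerate(logits)]
    probs = _softmax(masked)
    return _top_p_filter([(i, probs[i]) for i in order[:top_k]], top_p)


def reduced_path(logits, top_k, top_p, max_top_k):
    """Fast path: keep max_top_k candidates, softmax over the row's top_k only."""
    candidates = heapq.nsmallest(
        max_top_k, range(len(logits)), key=lambda i: (-logits[i], i)
    )
    keep = candidates[:top_k]
    probs = _softmax([logits[i] for i in keep])
    return _top_p_filter(list(zip(keep, probs)), top_p)


def sample_probs(logits_batch, top_ks, top_ps, use_reduced=True):
    """Return (per-row filtered distributions, name of the path taken)."""
    vocab_size = len(logits_batch[0])
    max_top_k, has_any_no_top_k = batch_top_k_stats(top_ks, vocab_size)
    if use_reduced and not has_any_no_top_k and max_top_k > 0:
        try:
            rows = [
                reduced_path(row, k, p, max_top_k)
                for row, k, p in zip(logits_batch, top_ks, top_ps)
            ]
            return rows, "reduced"
        except Exception:
            pass  # any failure falls through to the full path below
    rows = []
    for row, k, p in zip(logits_batch, top_ks, top_ps):
        k_eff = k if (k is not None and 0 < k < vocab_size) else vocab_size
        rows.append(full_path(row, k_eff, p))
    return rows, "full"
```

In this sketch, a batch where every request sets top_k (for example top_ks=[3, 2]) takes the reduced path, while a batch containing a request without top_k (for example top_ks=[3, None]) stays on the full path.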