vllm/spec_decode/draft_model_runner.py · f256ebe4df6757d76f1f1642d7e110268a2f8190 · OpenDAS / vllm_cscc · GitLab

Remove hard-dependencies of Speculative decode to CUDA workers (#10587) · 0a71900b
Chendi.Xue authored Nov 26, 2024
```
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
```

draft_model_runner.py 13.6 KB