@@ -22,6 +22,8 @@ If you frequently encounter preemptions from the vLLM engine, consider the follo
...
@@ -22,6 +22,8 @@ If you frequently encounter preemptions from the vLLM engine, consider the follo
You can also monitor the number of preemption requests through Prometheus metrics exposed by the vLLM. Additionally, you can log the cumulative number of preemption requests by setting disable_log_stats=False.
You can also monitor the number of preemption requests through Prometheus metrics exposed by the vLLM. Additionally, you can log the cumulative number of preemption requests by setting disable_log_stats=False.
.. _chunked-prefill:
Chunked Prefill
Chunked Prefill
---------------
---------------
vLLM supports an experimental feature chunked prefill. Chunked prefill allows to chunk large prefills into smaller chunks and batch them together with decode requests.
vLLM supports an experimental feature chunked prefill. Chunked prefill allows to chunk large prefills into smaller chunks and batch them together with decode requests.