Commit 7a4a5de7 (unverified) authored by Chauncey, committed by GitHub

[Misc] Update outdated note: LMCache now supports chunked prefill (#16697)


Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
parent c16fb5da
......
@@ -37,11 +37,11 @@ def build_llm_with_lmcache():
         '{"kv_connector":"LMCacheConnector", "kv_role":"kv_both"}')
     # Set GPU memory utilization to 0.8 for an A40 GPU with 40GB
     # memory. Reduce the value if your GPU has less memory.
-    # Note that LMCache is not compatible with chunked prefill for now.
+    # Note: LMCache supports chunked prefill (see vLLM#14505, LMCache#392).
     llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2",
               kv_transfer_config=ktc,
               max_model_len=8000,
-              enable_chunked_prefill=False,
+              enable_chunked_prefill=True,
               gpu_memory_utilization=0.8)
     try:
......
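The change flips `enable_chunked_prefill` to `True`. Below is a minimal, hypothetical sketch of the idea behind chunked prefill: a long prompt's prefill is split into pieces no larger than a per-step token budget, so it can be processed over several scheduler steps instead of all at once. This is not vLLM's scheduler. The helper `split_prefill` and the 2048-token budget are assumptions for illustration only.

```python
from typing import List


def split_prefill(num_prompt_tokens: int, token_budget: int) -> List[range]:
    """Split a prompt's prefill into chunks of at most `token_budget` tokens.

    Hypothetical illustration of chunked prefill, not vLLM's implementation.
    Each returned range covers the token positions processed in one step.
    """
    if num_prompt_tokens < 0:
        raise ValueError("num_prompt_tokens must be non-negative")
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    return [
        range(start, min(start + token_budget, num_prompt_tokens))
        for start in range(0, num_prompt_tokens, token_budget)
    ]


if __name__ == "__main__":
    # A 5000-token prompt (within max_model_len=8000) with an assumed
    # 2048-token per-step budget is prefilled over three steps.
    for step, chunk in enumerate(split_prefill(5000, 2048)):
        print(f"step {step}: tokens {chunk.start}..{chunk.stop - 1}")
```

Running it prints `step 0: tokens 0..2047`, `step 1: tokens 2048..4095` and `step 2: tokens 4096..4999`. Every prompt token is covered exactly once. Only the number of steps changes.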