# [DYNAMO] serve an LLM model using KVBM with dynamo
python -m dynamo.vllm \

> To tune the size of the CPU or disk cache, set `DYN_KVBM_CPU_CACHE_GB` and `DYN_KVBM_DISK_CACHE_GB` accordingly. Both scripts above set only `DYN_KVBM_CPU_CACHE_GB=20`.

> [!NOTE]
> `DYN_KVBM_CPU_CACHE_GB` must be set; `DYN_KVBM_DISK_CACHE_GB` is optional.
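
As a rough illustration of the cache settings described here, the sketch below builds an environment mapping in which `DYN_KVBM_CPU_CACHE_GB` is always present and `DYN_KVBM_DISK_CACHE_GB` is added only on request. The helper name `kvbm_env`, its parameters, and the positive-size check are assumptions made for this example; they are not part of Dynamo.

```python
import os


def kvbm_env(cpu_cache_gb, disk_cache_gb=None, base_env=None):
    """Return a copy of an environment mapping with the KVBM cache sizes set.

    Hypothetical helper: only the two variable names come from the README.
    """
    # The README says the CPU cache size must be set; rejecting
    # non-positive values is an extra assumption of this sketch.
    if cpu_cache_gb is None or cpu_cache_gb <= 0:
        raise ValueError("DYN_KVBM_CPU_CACHE_GB must be set to a positive size")

    env = dict(os.environ if base_env is None else base_env)
    env["DYN_KVBM_CPU_CACHE_GB"] = str(cpu_cache_gb)

    # The disk cache is optional, so it is exported only when requested.
    if disk_cache_gb is not None:
        env["DYN_KVBM_DISK_CACHE_GB"] = str(disk_cache_gb)
    return env


# Mirrors the scripts' setting of a 20 GB CPU cache and no disk cache.
print(kvbm_env(20, base_env={}))
```

The returned mapping could be passed as `env=` to `subprocess.run` when launching the serve command.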
"Test compared responses before cache reset (with warmup) vs after cache reset (no warmup)."
)
...
@@ -893,215 +581,5 @@ class TestDeterminism:
pytest.skip("No tests were completed - insufficient data")
assert (
    total_failed == 0
), f"Model is not deterministic across cache reset: {total_failed} comparisons failed"
assert (
    success_rate >= success_rate_threshold
), f"Model is not deterministic across cache reset: {total_failed} comparisons failed, success rate {success_rate:.1%} lower than expected {success_rate_threshold*100}%"
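
The difference between the two assertion variants can be seen in a small standalone sketch. It assumes that `success_rate` is the fraction of passed comparisons out of all completed comparisons; the diff does not show how the test computes it. The function name `check_determinism` is hypothetical, and a `ValueError` stands in for the `pytest.skip` call so the example runs without pytest.

```python
def check_determinism(total_passed, total_failed, success_rate_threshold=1.0):
    """Assert that the share of matching comparisons meets a threshold.

    Hypothetical helper; the success-rate formula is an assumption.
    """
    total = total_passed + total_failed
    if total == 0:
        # The test itself calls pytest.skip here; this sketch raises instead.
        raise ValueError("No tests were completed - insufficient data")

    success_rate = total_passed / total
    assert success_rate >= success_rate_threshold, (
        f"Model is not deterministic across cache reset: "
        f"{total_failed} comparisons failed, success rate {success_rate:.1%} "
        f"lower than expected {success_rate_threshold:.1%}"
    )
    return success_rate


# With a threshold of 1.0 this behaves like the strict `total_failed == 0` check;
# a lower threshold tolerates a bounded number of mismatches.
print(check_determinism(total_passed=9, total_failed=1, success_rate_threshold=0.8))
```

Formatting both rates with `:.1%` keeps the two numbers in the message in the same style.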