**The `--connector lmcache` flag is required** to enable LMCache in vLLM. Optionally set `ENABLE_LMCACHE=1` to use Dynamo's default LMCache configuration values, or set individual `LMCACHE_*` environment variables for custom configuration.
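As a minimal sketch of the custom-configuration path, the snippet below exports a few individual `LMCACHE_*` variables before launching a worker. The variable names are taken from the LMCache configuration reference, and the values simply mirror the LMCache defaults quoted in this document (chunk size 256, local CPU offloading on, 5 GB CPU cache). The launch command in the final comment is an assumption about how the worker is started, and `<your-model>` is a placeholder; adjust both to your deployment.

```bash
# Illustrative values only: these mirror the LMCache defaults quoted in this doc.
export LMCACHE_CHUNK_SIZE=256            # tokens per KV cache chunk
export LMCACHE_LOCAL_CPU=True            # offload KV cache to CPU RAM
export LMCACHE_MAX_LOCAL_CPU_SIZE=5.0    # CPU cache budget, in GB

# Print the LMCache-related variables the worker process will inherit.
printenv | grep '^LMCACHE_' | sort

# Then start the vLLM worker with the required connector flag
# (assumed command shape; <your-model> is a placeholder):
# python -m dynamo.vllm --model <your-model> --connector lmcache
```

Exporting the variables in the same shell that launches the worker ensures the worker inherits them. Any variable left unset falls back to the LMCache default.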
### Customization
### Environment Variables
LMCache configuration can be customized via environment variables:
# LMCache will use its own defaults (chunk_size=256, local_cpu=True, max_local_cpu_size=5GB)
```
LMCache configuration can be customized via environment variables listed [here](https://docs.lmcache.ai/api_reference/configurations.html).
For advanced configurations, LMCache supports multiple [storage backends](https://docs.lmcache.ai/index.html):
- **CPU RAM**: Fast local memory offloading
...
...
@@ -87,10 +59,6 @@ In aggregated mode, the system uses:
Disaggregated serving separates prefill and decode operations into dedicated workers. This provides better resource utilization and scalability for production deployments.
### Configuration
The same `ENABLE_LMCACHE=1` environment variable enables LMCache, but the system automatically configures different connector setups for prefill and decode workers.
### Deployment
Use the provided disaggregated launch script (the script requires at least 2 GPUs):
...
...
@@ -127,7 +95,7 @@ The system automatically configures KV transfer based on the deployment mode and