**The `--connector lmcache` flag is required** to enable LMCache in vLLM. Optionally set `ENABLE_LMCACHE=1` to use Dynamo's default LMCache configuration values, or set individual `LMCACHE_*` environment variables for custom configuration.
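
As a rough sketch of how the flag and the optional environment variable fit together, the snippet below assembles a worker command line and its environment. Only `--connector lmcache` and `ENABLE_LMCACHE=1` come from the text above. The `python -m dynamo.vllm` entry point, the `--model` argument, the helper name `build_launch`, and the placeholder model name are assumptions for illustration.

```python
import os
import shlex


def build_launch(model, use_dynamo_defaults=True, base_env=None):
    """Return (argv, env) for a vLLM worker with LMCache enabled.

    Hypothetical helper: the entry point and --model argument are assumed.
    """
    # Assumed entry point; adjust to match your deployment.
    argv = ["python", "-m", "dynamo.vllm", "--model", model]
    # Required: LMCache is only enabled when this flag is passed.
    argv += ["--connector", "lmcache"]

    # Start from the caller's environment unless one is supplied.
    env = dict(os.environ if base_env is None else base_env)
    if use_dynamo_defaults:
        # Optional: ask Dynamo to apply its default LMCache values.
        env["ENABLE_LMCACHE"] = "1"
    return argv, env


# Placeholder model name, not a recommendation.
argv, env = build_launch("example-org/example-model", base_env={})
print(shlex.join(argv))
```

The resulting pair could be handed to `subprocess.run(argv, env=env)`. Keeping the flag in `argv` and the defaults switch in `env` mirrors the split described above: the flag is mandatory, the variable is optional.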
### Customization
### Environment Variables
LMCache configuration can be customized via the environment variables listed in the [LMCache configuration reference](https://docs.lmcache.ai/api_reference/configurations.html).
LMCache configuration can be customized via environment variables:
# LMCache will use its own defaults (chunk_size=256, local_cpu=True, max_local_cpu_size=5GB)
```
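
To illustrate overriding individual settings, the sketch below turns option names into `LMCACHE_*` variables. It assumes each variable is the upper-cased option name with an `LMCACHE_` prefix; check the LMCache configuration reference before relying on any specific name. The helper `lmcache_env` is hypothetical, and the values simply repeat the defaults mentioned above (with 5 GB written as `5.0`).

```python
def lmcache_env(**options):
    """Map option names to LMCACHE_* environment variable strings.

    Assumes the naming convention LMCACHE_<OPTION_NAME_IN_UPPER_CASE>.
    """
    env = {}
    for name, value in options.items():
        # Environment variables are always strings.
        env["LMCACHE_" + name.upper()] = str(value)
    return env


# Values mirror the defaults cited above; change them to customize.
overrides = lmcache_env(chunk_size=256, local_cpu=True, max_local_cpu_size=5.0)
for key, value in overrides.items():
    print(f"export {key}={value}")
```

Merging `overrides` into the worker's environment before launch would apply the custom values; any option left out keeps whatever default LMCache chooses.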
For advanced configurations, LMCache supports multiple [storage backends](https://docs.lmcache.ai/index.html):
- **CPU RAM**: Fast local memory offloading
...
@@ -87,10 +59,6 @@ In aggregated mode, the system uses:
Disaggregated serving separates prefill and decode operations into dedicated workers. This provides better resource utilization and scalability for production deployments.
### Configuration
The same `ENABLE_LMCACHE=1` environment variable enables LMCache, but the system automatically configures different connector setups for prefill and decode workers.
### Deployment
Use the provided disaggregated launch script (the script requires at least 2 GPUs):
...
@@ -127,7 +95,7 @@ The system automatically configures KV transfer based on the deployment mode and