# TeaCache Configuration Guide
TeaCache speeds up diffusion model inference by reusing cached transformer outputs when consecutive denoising timesteps are sufficiently similar. This typically provides a **1.5x-2.0x speedup**, with the quality impact controlled by a single threshold.
## Quick Start
Enable TeaCache by setting `cache_backend` to `"tea_cache"`:
```python
from vllm_omni import Omni
from vllm_omni.inputs.data import OmniDiffusionSamplingParams
# Simple configuration - model_type is automatically extracted from pipeline.__class__.__name__
omni = Omni(
    model="Qwen/Qwen-Image",
    cache_backend="tea_cache",
    cache_config={
        "rel_l1_thresh": 0.2,  # Optional, defaults to 0.2
    },
)
outputs = omni.generate(
    "A cat sitting on a windowsill",
    OmniDiffusionSamplingParams(
        num_inference_steps=50,
    ),
)
```
### Using Environment Variable
You can also enable TeaCache via an environment variable:
```bash
export DIFFUSION_CACHE_BACKEND=tea_cache
```
Then initialize without explicitly setting `cache_backend`:
```python
from vllm_omni import Omni
omni = Omni(
    model="Qwen/Qwen-Image",
    cache_config={"rel_l1_thresh": 0.2},  # Optional
)
```
## Online Serving (OpenAI-Compatible)
Enable TeaCache for online serving by passing `--cache-backend tea_cache` when starting the server:
```bash
vllm serve Qwen/Qwen-Image --omni --port 8091 \
    --cache-backend tea_cache \
    --cache-config '{"rel_l1_thresh": 0.2}'
```
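Once the server is running, clients talk to it like any OpenAI-compatible endpoint. The request below is a minimal sketch that assumes an OpenAI-style `/v1/images/generations` route on port 8091; the exact path and payload fields may differ between vllm-omni versions, so verify against your server's API docs.
```python
# Hedged sketch: assumes an OpenAI-style image-generation route; check your
# vllm-omni version for the actual endpoint and payload schema.
import requests

resp = requests.post(
    "http://localhost:8091/v1/images/generations",
    json={
        "model": "Qwen/Qwen-Image",
        "prompt": "A cat sitting on a windowsill",
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json())  # typically contains base64-encoded or URL image data
```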
## Configuration Parameters
### `rel_l1_thresh` (float, default: `0.2`)
Controls the trade-off between speed and quality: lower values recompute the transformer more often (better quality), while higher values reuse the cache more aggressively (more speed). The sketch after the list below illustrates how the threshold gates recomputation.
**Recommended values:**
- `0.2` - **~1.5x speedup** with minimal quality loss (recommended)
- `0.4` - **~1.8x speedup** with slight quality loss
- `0.6` - **~2.0x speedup** with noticeable quality loss
- `0.8` - **~2.25x speedup** with significant quality loss
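To make the threshold concrete, here is a minimal, hypothetical sketch of a TeaCache-style decision rule (loosely following the TeaCache idea; all names are illustrative and this is **not** vllm-omni's actual implementation). The accumulated relative L1 change between consecutive transformer inputs is compared against `rel_l1_thresh`: while the accumulated change stays below the threshold, the block is skipped and its cached residual is reused.
```python
# Illustrative sketch only -- not vllm-omni internals.
import torch


class TeaCacheSketch:
    def __init__(self, rel_l1_thresh: float = 0.2):
        self.rel_l1_thresh = rel_l1_thresh
        self.accumulated = 0.0       # relative L1 change since the last full compute
        self.prev_inp = None         # transformer input from the previous timestep
        self.cached_residual = None  # (output - input) saved at the last full compute

    def step(self, inp: torch.Tensor, hidden: torch.Tensor, block) -> torch.Tensor:
        recompute = self.prev_inp is None or self.cached_residual is None
        if not recompute:
            # Relative L1 distance between this step's input and the previous one.
            rel_l1 = ((inp - self.prev_inp).abs().mean() / self.prev_inp.abs().mean()).item()
            self.accumulated += rel_l1
            recompute = self.accumulated >= self.rel_l1_thresh
        if recompute:
            out = block(hidden)                  # run the transformer block for real
            self.cached_residual = out - hidden  # remember its residual
            self.accumulated = 0.0               # reset the change accumulator
        else:
            out = hidden + self.cached_residual  # skip the block, reuse the residual
        self.prev_inp = inp
        return out


# Toy usage: a linear layer stands in for a transformer block; in a real
# pipeline `inp` would be the timestep-modulated input, not the hidden state.
cache = TeaCacheSketch(rel_l1_thresh=0.2)
block = torch.nn.Linear(16, 16)
x = torch.randn(4, 16)
for t in range(10):
    x = cache.step(x.clone(), x, block)
```
A larger threshold lets more consecutive steps reuse the cache before a recompute is forced, which is why speed rises and fidelity falls as `rel_l1_thresh` grows.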
## Performance Tuning
Start with the default `rel_l1_thresh=0.2` and adjust based on your needs (the sketch after this list shows a quick way to compare settings):
- **Maximum quality**: Use `0.1-0.2`
- **Balanced**: Use `0.2-0.4` (recommended)
- **Maximum speed**: Use `0.6-0.8` (may reduce quality)
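If you are unsure which setting fits your workload, a small sweep over thresholds on a representative prompt makes the trade-off visible. This sketch reuses the documented Python API from Quick Start; saving and comparing the outputs is left to you.
```python
from vllm_omni import Omni
from vllm_omni.inputs.data import OmniDiffusionSamplingParams

prompt = "A cat sitting on a windowsill"
for thresh in (0.1, 0.2, 0.4, 0.6):
    # Fresh engine per threshold (the examples above configure caching
    # at construction time).
    omni = Omni(
        model="Qwen/Qwen-Image",
        cache_backend="tea_cache",
        cache_config={"rel_l1_thresh": thresh},
    )
    outputs = omni.generate(
        prompt,
        OmniDiffusionSamplingParams(num_inference_steps=50),
    )
    # Save or inspect `outputs` per threshold, then pick the fastest
    # setting that still meets your quality bar.
```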
## Troubleshooting
### Quality Degradation
If you notice quality issues, lower the threshold:
```python
cache_config={"rel_l1_thresh": 0.1} # More conservative caching
```
## Supported Models
### ImageGen
| Architecture | Models | Example HF Models |
|--------------|--------|-------------------|
| `QwenImagePipeline` | Qwen-Image | `Qwen/Qwen-Image` |
| `QwenImageEditPipeline` | Qwen-Image-Edit | `Qwen/Qwen-Image-Edit` |
| `QwenImageEditPlusPipeline` | Qwen-Image-Edit-2509 | `Qwen/Qwen-Image-Edit-2509` |
| `QwenImageLayeredPipeline` | Qwen-Image-Layered | `Qwen/Qwen-Image-Layered` |
| `BagelForConditionalGeneration` | BAGEL (DiT-only) | `ByteDance-Seed/BAGEL-7B-MoT` |
### VideoGen
No VideoGen models are supported by TeaCache yet.
### Coming Soon
| Architecture | Models | Example HF Models |
|--------------|--------|-------------------|
| `FluxPipeline` | Flux | - |
| `CogVideoXPipeline` | CogVideoX | - |