# LMCache Dynamo MMLU Testing Suite

## Overview
Test the correctness of Dynamo integration with LMCache by comparing MMLU benchmark results with and without LMCache enabled.

## Testing Principle
Compare MMLU test results under two configurations:
- **Baseline Test**: Dynamo without LMCache
- **LMCache Test**: Dynamo with LMCache enabled

If both configurations produce matching inference results, the LMCache integration is behaving correctly.

## Quick Start

### Prerequisites
1. Ensure Dynamo and its dependencies are installed, and that NATS and etcd are running
2. Download the MMLU dataset to the `data` directory
3. Ensure HuggingFace models are accessible

### Download MMLU Dataset

```bash
cd ./tests/lmcache

# Auto-download and organize data
python3 download_mmlu.py
```

### Run Single Model Test
To test another model, change the model name passed to the commands below.
```bash
cd ./tests/lmcache

# 1. Baseline test (without LMCache)
./deploy-baseline-dynamo.sh Qwen/Qwen3-0.6B
# Wait for model to load, then run test in another terminal:
python3 mmlu-baseline-dynamo.py --model Qwen/Qwen3-0.6B --number-of-subjects 15
# Stop services with Ctrl+C in the deploy script terminal

# 2. LMCache test (with LMCache enabled)
./deploy-lmcache_enabled-dynamo.sh Qwen/Qwen3-0.6B
# Wait for model to load, then run test in another terminal:
python3 mmlu-lmcache_enabled-dynamo.py --model Qwen/Qwen3-0.6B --number-of-subjects 15
# Stop services with Ctrl+C in the deploy script terminal

# 3. Compare results
python3 summarize_scores_dynamo.py
```

## File Description

### Deployment Scripts
- **`deploy-baseline-dynamo.sh`**: Deploy Dynamo without LMCache (baseline)
- **`deploy-lmcache_enabled-dynamo.sh`**: Deploy Dynamo with LMCache enabled (test)

### Test Scripts
- **`mmlu-baseline-dynamo.py`**: Run MMLU test on baseline Dynamo
- **`mmlu-lmcache_enabled-dynamo.py`**: Run MMLU test on Dynamo with LMCache
- **`summarize_scores_dynamo.py`**: Compare and analyze test results

## Architecture Differences

### Baseline Architecture (deploy-baseline-dynamo.sh)
```
HTTP Request → Dynamo Ingress(8000) → Dynamo Worker → Direct Inference
```

### LMCache Architecture (deploy-lmcache_enabled-dynamo.sh)
```
HTTP Request → Dynamo Ingress(8000) → Dynamo Worker → LMCache-enabled Inference
Environment: LMCACHE_CHUNK_SIZE=256
             LMCACHE_LOCAL_CPU=True
             LMCACHE_MAX_LOCAL_CPU_SIZE=1.0
```
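
If you prefer to launch the deployment from Python rather than by hand, the three settings above can be layered onto the current environment. This is only a sketch: `LMCACHE_SETTINGS` and `lmcache_env` are hypothetical names, and the deploy script may already export these variables itself.

```python
import os

# Values copied from the architecture diagram; the deploy script remains
# the source of truth for what is actually set.
LMCACHE_SETTINGS = {
    "LMCACHE_CHUNK_SIZE": "256",
    "LMCACHE_LOCAL_CPU": "True",
    "LMCACHE_MAX_LOCAL_CPU_SIZE": "1.0",
}


def lmcache_env(base=None):
    """Return a copy of the environment with the LMCache settings applied."""
    env = dict(os.environ if base is None else base)
    env.update(LMCACHE_SETTINGS)
    return env
```

The result could then be passed as `env=lmcache_env()` to `subprocess.run` when starting `deploy-lmcache_enabled-dynamo.sh`.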

## API Format

Test scripts use Dynamo's Chat Completions API:

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "question content"}],
    "temperature": 0,
    "max_tokens": 3,
    "stream": false,
    "seed": 42
  }'
```
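
The same request can be issued from Python using only the standard library. The sketch below is not taken from the test scripts: `build_payload` and `ask` are hypothetical helpers, and the response parsing assumes an OpenAI-style body with the answer at `choices[0].message.content`.

```python
import json
import urllib.request


def build_payload(model: str, question: str) -> dict:
    """Build a request body with the same fields as the curl example."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": 0,
        "max_tokens": 3,
        "stream": False,
        "seed": 42,
    }


def ask(question: str, model: str = "Qwen/Qwen3-0.6B",
        url: str = "http://localhost:8000/v1/chat/completions") -> str:
    """POST one question and return the text of the first choice."""
    body = json.dumps(build_payload(model, question)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        reply = json.load(response)
    # Assumption: OpenAI-style response shape.
    return reply["choices"][0]["message"]["content"]
```

Keeping the payload in its own function makes it easy to confirm that both test runs send identical parameters.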


## Result Interpretation

After testing completes, the following files will be generated:
- `dynamo-baseline-{model_name}.jsonl`: Baseline test results
- `dynamo-lmcache-{model_name}.jsonl`: LMCache test results

If the accuracies in the two result files differ by less than 1%, LMCache is considered to be functioning correctly.
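
The accuracy check can be sketched in a few lines of Python. This is an illustration, not the logic of `summarize_scores_dynamo.py`: it assumes each JSONL record carries a boolean `correct` field, which is a hypothetical schema, so adapt the field name to whatever the result files actually contain.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def accuracy(records) -> float:
    """Fraction of records whose (hypothetical) `correct` field is truthy."""
    records = list(records)
    if not records:
        raise ValueError("no records to score")
    return sum(1 for r in records if r["correct"]) / len(records)


def results_agree(baseline, lmcache, tolerance=0.01) -> bool:
    """True when the two accuracies differ by less than `tolerance`."""
    return abs(accuracy(baseline) - accuracy(lmcache)) < tolerance
```

For example, `results_agree(load_jsonl(baseline_path), load_jsonl(lmcache_path))` would apply the 1% threshold to two result files, where both paths are placeholders for the generated file names.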

## Notes

1. **Determinism guarantee**: All tests use the same seed (42) and zero temperature to ensure reproducible results
2. **Prerequisites**: Ensure NATS and etcd are running.
3. **Sequential execution**: Stop the first deployment before starting the second to avoid port conflicts
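
To confirm that the first deployment has released its port before starting the second, a quick TCP probe is enough. The helper below is a hypothetical sketch using only the standard library; port 8000 is the ingress port shown in the architecture diagrams.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((host, port)) == 0
```

If `port_in_use(8000)` still returns `True` after stopping the baseline deployment, wait a moment and check again before launching the LMCache deployment.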