# Baselines

Environment: [hiyouga/verl:ngc-th2.6.0-cu126-vllm0.8.3-flashinfer0.2.2-cxx11abi0](https://hub.docker.com/layers/hiyouga/verl/ngc-th2.6.0-cu126-vllm0.8.3-flashinfer0.2.2-cxx11abi0/images/sha256-335ed6cd1fe73090e458409cfa4394d6abf4cd0503ca44dbafdc28ff72e5ed20)

EasyR1 version: [v0.3.0](https://github.com/hiyouga/EasyR1/tree/v0.3.0)

Contributions of new data points are welcome!

## Algorithm Baselines

### [Qwen2.5-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on [Math12k](https://huggingface.co/datasets/hiyouga/math12k)

| Size | Algorithm   | Bits | LR   | KL   | Test Score |
| ---- | ----------- | ---- | ---- | ---- | ---------- |
| 7B   | GRPO        | AMP  | 1e-6 | 1e-2 | 0.73->0.79 |

### [Qwen2.5-VL-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on [Geometry3k](https://huggingface.co/datasets/hiyouga/geometry3k)

| Size | Algorithm   | Bits | LR   | KL   | Test Score |
| ---- | ----------- | ---- | ---- | ---- | ---------- |
| 7B   | GRPO        | AMP  | 1e-6 | 1e-2 | 0.39->0.52 |
| 7B   | GRPO        | BF16 | 1e-6 | 1e-2 | 0.39->0.52 |
| 7B   | GRPO        | AMP  | 1e-6 | 1e-3 | 0.39->0.52 |
| 7B   | RLOO        | AMP  | 1e-6 | 1e-2 | 0.39->0.53 |
| 3B   | GRPO        | AMP  | 1e-6 | 1e-2 | 0.27->0.44 |
| 32B  | GRPO        | BF16 | 1e-6 | 1e-2 | 0.46->0.61 |

> [!NOTE]
> The hyper-parameters not listed above use the default values; the sketch below shows how one row maps to config overrides.
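
As a minimal illustration (not the exact EasyR1 interface), the sketch below collects the 7B GRPO row into a set of dotted config overrides. The key paths `algorithm.adv_estimator`, `algorithm.kl_coef`, and `worker.actor.optim.lr` are assumptions; the authoritative names live in `examples/config.yaml` of the EasyR1 repo.

```python
# Hypothetical translation of the 7B GRPO row (LR 1e-6, KL 1e-2) into
# dotted config overrides. The key paths below are assumptions; check
# examples/config.yaml in the EasyR1 repo for the real names.
overrides = {
    "algorithm.adv_estimator": "grpo",  # "Algorithm" column (assumed key)
    "algorithm.kl_coef": 1e-2,          # "KL" column (assumed key)
    "worker.actor.optim.lr": 1e-6,      # "LR" column (assumed key)
}

# Overrides of this form are typically appended as key=value pairs to the
# training command used by the example scripts.
print(" ".join(f"{key}={value}" for key, value in overrides.items()))
```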

## Performance Baselines

### [Qwen2.5-VL-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on [Geometry3k](https://huggingface.co/datasets/hiyouga/geometry3k)

| Size | GPU Type      | Bits | Batch Size | vLLM Util | vLLM TP | Peak Mem | Peak VRAM | Throughput | Sec per step | Actor MFU |
| ---- | ------------- | ---- | ---------- | --------- | ------- | -------- | --------- | ---------- | ------------ | --------- |
| 3B   | 8 * H100 80GB | AMP  | 4 / 16     | 0.6       | 2       | 120GB    | 35GB      | 1200       | 180s         | 6.3%      |
| 7B   | 8 * H100 80GB | AMP  | 4 / 16     | 0.6       | 2       | 140GB    | 60GB      | 1200       | 180s         | 13.6%     |
| 7B   | 8 * H100 80GB | AMP  | 10 / 20    | 0.6       | 2       | 150GB    | 75GB      | 1400       | 170s         | 19.2%     |
| 7B   | 8 * L20 48GB  | AMP  | 4 / 16     | 0.6       | 2       | 150GB    | 44GB      | 410        | 580s         | 26.5%     |
| 7B   | 8 * H100 80GB | BF16 | 4 / 16     | 0.6       | 2       | 150GB    | 50GB      | 1280       | 190s         | 13.9%     |
| 32B  | 8 * H100 80GB | BF16 | 1 / 8      | 0.6       | 8       | 240GB    | 68GB      | 360        | 860s         | 11.2%     |

- Batch Size: `micro_batch_size_per_device_for_update` / `micro_batch_size_per_device_for_experience`
- vLLM Util: `rollout.gpu_memory_utilization`
- vLLM TP: `rollout.tensor_parallel_size`
- Peak Mem: Peak CPU memory usage
- Peak VRAM: Peak GPU memory usage
- Throughput: Number of tokens processed per second per GPU, averaged over one training step (see the sketch below)
- Sec per step: Average wall-clock time of one training step in seconds
- Actor MFU: Model FLOPs utilization of the actor
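
To make the relationship between the Throughput and Sec per step columns concrete, here is a small back-of-the-envelope sketch using the 7B / 8 * H100 / AMP / 4 / 16 row; it assumes the reported throughput is an average over the full step across all eight GPUs.

```python
# Rough check of the metric definitions above, using the 7B, 8 x H100, AMP,
# batch size 4 / 16 row of the table (1200 tok/s/GPU, 180 s/step).
num_gpus = 8
throughput_per_gpu = 1200  # tokens per second per GPU ("Throughput" column)
sec_per_step = 180         # seconds per training step ("Sec per step" column)

# Total number of tokens processed in one training step across all GPUs,
# assuming the per-GPU throughput is an average over the whole step.
tokens_per_step = throughput_per_gpu * num_gpus * sec_per_step
print(f"~{tokens_per_step:,} tokens per step")  # ~1,728,000 tokens
```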

> [!NOTE]
> The hyper-parameters not listed above use the default values.