Initial commit

b84161d1 · jerrrrry · b84161d1
Commit b84161d1 authored May 20, 2025 by jerrrrry
Hide whitespace changes
Inline Side-by-side

Showing with 16 additions and 0 deletions

verl/utils/dataset/README.md verl/utils/dataset/README.md +16 -0

No files found.
--- a/verl/utils/dataset/README.md
+++ b/verl/utils/dataset/README.md
+# Dataset Format
+## RLHF dataset
+We combine all the data sources into a single parquet files. We directly organize the prompt into the chat format so that multi-turn chats can be easily incorporated. In the prompt, we may add instruction following texts to guide the model output the answers in a particular format so that we can extract the answers.
+
+Math problems
+```json
+{
+    "data_source": "openai/gsm8k",
+    "prompt": [{"role": "user", "content": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? Let's think step by step and output the final answer after \"####\""}],
+    "ability": "math",
+    "reward_model": {
+        "style": "rule",
+        "ground_truth": ["72"]
+    },
+}
+```