Add RLHF document (#14482)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Commit cfd0ae82, authored by Harry Mellor on Mar 08, 2025; committed by GitHub on Mar 08, 2025
Showing with 14 additions and 1 deletion

docs/source/generate_examples.py +2 -1
docs/source/index.md +1 -0
docs/source/training/rlhf.md +11 -0

--- a/docs/source/generate_examples.py
+++ b/docs/source/generate_examples.py
@@ -14,13 +14,14 @@ EXAMPLE_DOC_DIR = ROOT_DIR / "docs/source/getting_started/examples"
 def fix_case(text: str) -> str:
    subs = {
        "api": "API",
-        "Cli": "CLI",
+        "cli": "CLI",
        "cpu": "CPU",
        "llm": "LLM",
        "tpu": "TPU",
        "aqlm": "AQLM",
        "gguf": "GGUF",
        "lora": "LoRA",
+        "rlhf": "RLHF",
        "vllm": "vLLM",
        "openai": "OpenAI",
        "multilora": "MultiLoRA",

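The hunk above shows only the substitution table, not the code that applies it. As a rough sketch, assuming each key is applied as a case-insensitive, whole-word regex replacement (the loop body and the shortened table below are assumptions, not part of this diff), the new `rlhf` entry would affect a generated example title like this:

```python
import re


def fix_case(text: str) -> str:
    # Shortened copy of the table from the diff, for illustration only.
    subs = {
        "api": "API",
        "cli": "CLI",
        "lora": "LoRA",
        "rlhf": "RLHF",
        "vllm": "vLLM",
    }
    # Assumption: keys are matched as whole words, ignoring case.
    for pattern, repl in subs.items():
        text = re.sub(rf"\b{pattern}\b", repl, text, flags=re.IGNORECASE)
    return text


print(fix_case("Rlhf Colocate"))
print(fix_case("Cli Demo"))
```

Under that assumption, `"Cli"` and `"cli"` would match the same words, so the lowercase key mainly brings the entry in line with the other keys in the table.
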
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -105,6 +105,7 @@ features/compatibility_matrix
 :maxdepth: 1
 training/trl.md
+training/rlhf.md
 :::

--- a/docs/source/training/rlhf.md
+++ b/docs/source/training/rlhf.md
+# Reinforcement Learning from Human Feedback
+
+Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviours.
+
+vLLM can be used to generate the completions for RLHF. The best way to do this is with libraries like [TRL](https://github.com/huggingface/trl), [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) and [verl](https://github.com/volcengine/verl).
+
+See the following basic examples to get started if you don't want to use an existing library:
+
+- [Training and inference processes are located on separate GPUs (inspired by OpenRLHF)](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf.html)
+- [Training and inference processes are colocated on the same GPUs using Ray](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_colocate.html)
+- [Utilities for performing RLHF with vLLM](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_utils.html)
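
For orientation, the "generate the completions" step described in the new page could look roughly like the sketch below. It is not taken from the linked examples. It uses vLLM's offline `LLM` and `SamplingParams` interface, and the model name, sampling settings, and the helper names `sample_completions` and `to_rollouts` are all assumptions made for illustration. Weight synchronisation between trainer and inference engine, which the linked examples cover, is left out.

```python
from typing import Dict, List


def sample_completions(prompts: List[str], n: int = 2) -> List[List[str]]:
    """Sample n completions per prompt with vLLM (hypothetical helper)."""
    # Imported lazily so to_rollouts() can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")  # placeholder model, an assumption
    params = SamplingParams(n=n, temperature=1.0, max_tokens=64)
    outputs = llm.generate(prompts, params)
    # Each RequestOutput carries n CompletionOutput objects with a .text field.
    return [[c.text for c in out.outputs] for out in outputs]


def to_rollouts(prompts: List[str],
                completions: List[List[str]]) -> List[Dict[str, str]]:
    """Flatten into one record per (prompt, completion) pair (hypothetical helper)."""
    if len(prompts) != len(completions):
        raise ValueError("expected one list of completions per prompt")
    return [
        {"prompt": p, "completion": c}
        for p, cs in zip(prompts, completions)
        for c in cs
    ]
```

The flattened records would then be scored by a reward model or preference data and passed to the training step, which is the part that libraries such as TRL, OpenRLHF and verl handle.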