Commit cfd0ae82, authored by Harry Mellor, committed by GitHub

Add RLHF document (#14482)


Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
parent 7caff01a
@@ -14,13 +14,14 @@ EXAMPLE_DOC_DIR = ROOT_DIR / "docs/source/getting_started/examples"
 def fix_case(text: str) -> str:
     subs = {
         "api": "API",
         "cli": "CLI",
         "cpu": "CPU",
         "llm": "LLM",
         "tpu": "TPU",
         "aqlm": "AQLM",
         "gguf": "GGUF",
         "lora": "LoRA",
+        "rlhf": "RLHF",
         "vllm": "vLLM",
         "openai": "OpenAI",
         "multilora": "MultiLoRA",
......
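The hunk above shows only the substitution table of `fix_case`; the code that applies the table is outside the visible context. As a minimal sketch, assuming each key is matched case-insensitively as a whole word (an assumption, not confirmed by this diff), the new `"rlhf"` entry would affect generated titles as follows. The name `fix_case_sketch` is hypothetical and the table is a subset of the one shown.

```python
import re


def fix_case_sketch(text: str) -> str:
    """Hypothetical stand-in for fix_case; the real function body is not shown in the diff."""
    # Subset of the table from the diff, including the newly added "rlhf" entry.
    subs = {
        "api": "API",
        "cli": "CLI",
        "llm": "LLM",
        "lora": "LoRA",
        "rlhf": "RLHF",
        "vllm": "vLLM",
        "openai": "OpenAI",
    }
    for pattern, repl in subs.items():
        # Assumption: whole-word, case-insensitive replacement.
        # The \b anchors keep "llm" from matching inside "vllm".
        text = re.sub(rf"\b{pattern}\b", repl, text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    for title in ("Rlhf", "Rlhf Colocate", "Rlhf Utils"):
        print(f"{title!r} -> {fix_case_sketch(title)!r}")
```

Under this assumption, titles derived from the example files `rlhf`, `rlhf_colocate` and `rlhf_utils` would come out with `RLHF` in upper case instead of `Rlhf`.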
@@ -105,6 +105,7 @@ features/compatibility_matrix
 :maxdepth: 1
 training/trl.md
+training/rlhf.md
 :::
......
# Reinforcement Learning from Human Feedback

Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviours.

vLLM can be used to generate the completions for RLHF. The best way to do this is with libraries like [TRL](https://github.com/huggingface/trl), [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) and [verl](https://github.com/volcengine/verl).

See the following basic examples to get started if you don't want to use an existing library:

- [Training and inference processes are located on separate GPUs (inspired by OpenRLHF)](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf.html)
- [Training and inference processes are colocated on the same GPUs using Ray](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_colocate.html)
- [Utilities for performing RLHF with vLLM](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_utils.html)