Add more libraries to rlhf.md (#26374)

Signed-off-by: Michael Goin <mgoin64@gmail.com>

Commit 1b86bd8e, authored Oct 07, 2025 by Michael Goin, committed by GitHub Oct 07, 2025

Showing with 13 additions and 2 deletions

docs/training/rlhf.md +13 -2

--- a/docs/training/rlhf.md
+++ b/docs/training/rlhf.md
 # Reinforcement Learning from Human Feedback
-Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviors.
+Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviors. vLLM can be used to generate the completions for RLHF.
-vLLM can be used to generate the completions for RLHF. Some ways to do this include using libraries like [TRL](https://github.com/huggingface/trl), [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [verl](https://github.com/volcengine/verl) and [unsloth](https://github.com/unslothai/unsloth).
+The following open-source RL libraries use vLLM for fast rollouts (sorted alphabetically and non-exhaustive):
+- [Cosmos-RL](https://github.com/nvidia-cosmos/cosmos-rl)
+- [NeMo-RL](https://github.com/NVIDIA-NeMo/RL)
+- [Open Instruct](https://github.com/allenai/open-instruct)
+- [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF)
+- [PipelineRL](https://github.com/ServiceNow/PipelineRL)
+- [Prime-RL](https://github.com/PrimeIntellect-ai/prime-rl)
+- [SkyRL](https://github.com/NovaSky-AI/SkyRL)
+- [TRL](https://github.com/huggingface/trl)
+- [Unsloth](https://github.com/unslothai/unsloth)
+- [verl](https://github.com/volcengine/verl)
 See the following basic examples to get started if you don't want to use an existing library: