Commit 1b86bd8e (unverified), authored by Michael Goin, committed by GitHub

Add more libraries to rlhf.md (#26374)


Signed-off-by: Michael Goin <mgoin64@gmail.com>
parent 59012df9
# Reinforcement Learning from Human Feedback

Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviors. vLLM can be used to generate the completions for RLHF.

The following open-source RL libraries use vLLM for fast rollouts (sorted alphabetically and non-exhaustive):

- [Cosmos-RL](https://github.com/nvidia-cosmos/cosmos-rl)
- [NeMo-RL](https://github.com/NVIDIA-NeMo/RL)
- [Open Instruct](https://github.com/allenai/open-instruct)
- [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF)
- [PipelineRL](https://github.com/ServiceNow/PipelineRL)
- [Prime-RL](https://github.com/PrimeIntellect-ai/prime-rl)
- [SkyRL](https://github.com/NovaSky-AI/SkyRL)
- [TRL](https://github.com/huggingface/trl)
- [Unsloth](https://github.com/unslothai/unsloth)
- [verl](https://github.com/volcengine/verl)

See the following basic examples to get started if you don't want to use an existing library:
......
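
For orientation, the rollout step that vLLM handles in an RLHF loop can be sketched roughly as below. This is a minimal illustration, not one of the project's own examples: the `Rollout` record and the `to_rollouts` and `generate_rollouts` helpers are hypothetical names, and the model name, `n=4`, and `max_tokens=64` are placeholder assumptions. Only `LLM`, `SamplingParams`, and `LLM.generate` come from vLLM's offline inference API.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """Hypothetical record: one prompt with its sampled completions."""
    prompt: str
    completions: list


def to_rollouts(request_outputs):
    """Flatten vLLM-style request outputs into plain Rollout records.

    Each item is expected to expose `.prompt` and `.outputs`, where every
    entry of `.outputs` has a `.text` attribute (as vLLM's RequestOutput does).
    """
    return [
        Rollout(prompt=ro.prompt, completions=[c.text for c in ro.outputs])
        for ro in request_outputs
    ]


def generate_rollouts(prompts, model="facebook/opt-125m", n=4):
    """Sample `n` completions per prompt with vLLM (model name is a placeholder)."""
    # Imported lazily so to_rollouts can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model)
    # temperature=1.0 keeps sampling stochastic so the n completions differ.
    params = SamplingParams(n=n, temperature=1.0, max_tokens=64)
    return to_rollouts(llm.generate(prompts, params))


if __name__ == "__main__":
    for rollout in generate_rollouts(["The capital of France is"]):
        print(rollout.prompt, rollout.completions)
```

In a training loop, the completions would then be scored by a reward model or preference signal and passed to the trainer. That part is left to the libraries listed above.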