"vllm/collect_env.py" did not exist on "dc4e3df5c23282b2ebaead95f179c25c9d7ec4d8"
# Reinforcement Learning from Human Feedback

Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviors.

vLLM can be used to generate the completions for RLHF. The recommended way to do this is through a library such as [TRL](https://github.com/huggingface/trl), [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), or [verl](https://github.com/volcengine/verl).

See the following basic examples to get started if you don't want to use an existing library:

- [Training and inference processes are located on separate GPUs (inspired by OpenRLHF)](../examples/offline_inference/rlhf.md)
- [Training and inference processes are colocated on the same GPUs using Ray](../examples/offline_inference/rlhf_colocate.md)
- [Utilities for performing RLHF with vLLM](../examples/offline_inference/rlhf_utils.md)
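
As a rough illustration of the generation step only, the sketch below samples several completions per prompt with vLLM's offline `LLM.generate` API and flattens them into records that a trainer could score. The `Rollout` class and the `to_rollouts` and `generate_rollouts` helpers are hypothetical names introduced here, and the model name and sampling values are placeholders. It does not cover reward computation or syncing updated weights back to vLLM, which the linked examples address.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled completion for one prompt (hypothetical container)."""

    prompt: str
    completion: str
    token_ids: list[int]


def to_rollouts(request_outputs) -> list[Rollout]:
    """Flatten vLLM RequestOutput objects into one record per completion."""
    rollouts = []
    for request_output in request_outputs:
        # Each RequestOutput holds `n` CompletionOutput objects in `.outputs`.
        for completion in request_output.outputs:
            rollouts.append(
                Rollout(
                    prompt=request_output.prompt,
                    completion=completion.text,
                    token_ids=list(completion.token_ids),
                )
            )
    return rollouts


def generate_rollouts(prompts, model="facebook/opt-125m", samples_per_prompt=2):
    """Sample completions with vLLM; model and sampling values are placeholders."""
    # Imported lazily so `to_rollouts` can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model)
    sampling_params = SamplingParams(
        n=samples_per_prompt,  # completions sampled per prompt
        temperature=1.0,
        top_p=0.95,
        max_tokens=64,
    )
    return to_rollouts(llm.generate(prompts, sampling_params))
```

Calling `generate_rollouts(["The capital of France is"])` requires vLLM and a supported accelerator. Keeping `to_rollouts` separate means the trainer-side bookkeeping can be exercised without loading a model.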