Commit be0b399d authored by Harry Mellor, committed by GitHub

Add training doc signposting to TRL (#14439)


Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
parent b8b0ccbd
@@ -100,6 +100,14 @@ features/compatibility_matrix
% Details about running vLLM
:::{toctree}
:caption: Training
:maxdepth: 1
training/trl.md
:::
:::{toctree}
:caption: Inference and Serving
:maxdepth: 1
......
# Transformers Reinforcement Learning
Transformers Reinforcement Learning (TRL) is a full stack library that provides a set of tools to train transformer language models with methods like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more. The library is integrated with 🤗 transformers.
Online methods such as GRPO or Online DPO require the model to generate completions. vLLM can be used to generate these completions!
See the guide [vLLM for fast generation in online methods](https://huggingface.co/docs/trl/main/en/speeding_up_training#vllm-for-fast-generation-in-online-methods) in the TRL documentation for more information.
:::{seealso}
For more information on the `use_vllm` flag you can provide to the configs of these online methods, see:
- [`trl.GRPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/grpo_trainer#trl.GRPOConfig.use_vllm)
- [`trl.OnlineDPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/online_dpo_trainer#trl.OnlineDPOConfig.use_vllm)
:::
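As an illustration that is not part of the committed file, the `use_vllm` flag mentioned above could be passed to one of these configs roughly as follows. This is a minimal sketch: the helper `online_trainer_kwargs` and the `"grpo-output"` directory are hypothetical, and only `use_vllm` comes from the linked TRL documentation. Building the config object itself needs a working TRL install, so that step is left as a comment.

```python
from typing import Any, Dict


def online_trainer_kwargs(output_dir: str, use_vllm: bool = True) -> Dict[str, Any]:
    """Collect keyword arguments for an online-method config such as GRPOConfig.

    Hypothetical helper for illustration only; it is not part of TRL or vLLM.
    """
    # `use_vllm` is the flag documented for trl.GRPOConfig and trl.OnlineDPOConfig.
    # `output_dir` is a placeholder path chosen for this sketch.
    return {"output_dir": output_dir, "use_vllm": use_vllm}


kwargs = online_trainer_kwargs("grpo-output")

# With TRL installed, the kwargs would be handed to the config like this:
# from trl import GRPOConfig
# config = GRPOConfig(**kwargs)
```

Keeping the arguments in a plain dictionary lets the same settings be reused for `OnlineDPOConfig`, which documents the same flag.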