Add training doc signposting to TRL (#14439)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Commit be0b399d, authored Mar 08, 2025 by Harry Mellor; committed by GitHub Mar 08, 2025

Showing with 21 additions and 0 deletions

docs/source/index.md +8 -0

docs/source/training/trl.md +13 -0

--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -100,6 +100,14 @@ features/compatibility_matrix
 % Details about running vLLM
+:::{toctree}
+:caption: Training
+:maxdepth: 1
+training/trl.md
+:::
 :::{toctree}
 :caption: Inference and Serving
 :maxdepth: 1

--- a/docs/source/training/trl.md
+++ b/docs/source/training/trl.md
+# Transformers Reinforcement Learning
+Transformers Reinforcement Learning (TRL) is a full-stack library that provides a set of tools to train transformer language models with methods like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more. The library is integrated with 🤗 transformers.
+Online methods such as GRPO or Online DPO require the model to generate completions. vLLM can be used to generate these completions!
+See the guide [vLLM for fast generation in online methods](https://huggingface.co/docs/trl/main/en/speeding_up_training#vllm-for-fast-generation-in-online-methods) in the TRL documentation for more information.
+:::{seealso}
+For more information on the `use_vllm` flag you can provide to the configs of these online methods, see:
+- [`trl.GRPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/grpo_trainer#trl.GRPOConfig.use_vllm)
+- [`trl.OnlineDPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/online_dpo_trainer#trl.OnlineDPOConfig.use_vllm)
+:::
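
To illustrate the `use_vllm` flag that the new page points to, here is a minimal sketch of a GRPO run that asks TRL to generate completions with vLLM. It is not part of this commit. The model name, dataset name, output directory, and toy reward function are placeholders chosen for illustration, and depending on the TRL version extra setup (for example a dedicated GPU or a separate generation server) may be needed, so treat the linked TRL guide as the authority.

```python
# Sketch only: assumes `trl` (with vLLM support) and `datasets` are installed
# and that suitable GPU hardware is available. All names below are placeholders.


def reward_len(completions, **kwargs):
    """Toy reward: score completions higher the closer they are to 20 characters."""
    return [-abs(20 - len(completion)) for completion in completions]


def train_with_vllm():
    # Imported lazily so the reward function above stays usable without TRL.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # Placeholder prompt dataset; any dataset with a "prompt" column should work.
    dataset = load_dataset("trl-lib/tldr", split="train")

    # use_vllm=True asks TRL to produce the online completions with vLLM.
    training_args = GRPOConfig(output_dir="grpo-vllm-sketch", use_vllm=True)

    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model name
        reward_funcs=reward_len,
        args=training_args,
        train_dataset=dataset,
    )
    trainer.train()
```

Calling `train_with_vllm()` would start the run; the reward function can be checked on its own without any GPU.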