Add documentation to Triton server tutorial (#983)

Commit 6f2dd6c3 (unverified), authored Sep 20, 2023 by Tanmay Verma; committed by GitHub Sep 20, 2023

Showing 2 changed files with 7 additions and 0 deletions

docs/source/index.rst +1 -0

docs/source/serving/deploying_with_triton.rst +6 -0

--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -64,6 +64,7 @@ Documentation
   serving/distributed_serving
   serving/run_on_sky
+   serving/deploying_with_triton
 .. toctree::
   :maxdepth: 1

--- a/docs/source/serving/deploying_with_triton.rst
+++ b/docs/source/serving/deploying_with_triton.rst
+.. _deploying_with_triton:
+
+Deploying with NVIDIA Triton
+============================
+
+The `Triton Inference Server <https://github.com/triton-inference-server>`_ hosts a tutorial demonstrating how to quickly deploy a simple `facebook/opt-125m <https://huggingface.co/facebook/opt-125m>`_ model using vLLM. Please see `Deploying a vLLM model in Triton <https://github.com/triton-inference-server/tutorials/blob/main/Quick_Deploy/vLLM/README.md#deploying-a-vllm-model-in-triton>`_ for more details.