Commit 6f2dd6c3 authored by Tanmay Verma, committed by GitHub

Add documentation to Triton server tutorial (#983)

parent bc064457
@@ -64,6 +64,7 @@ Documentation
   serving/distributed_serving
   serving/run_on_sky
   serving/deploying_with_triton

.. toctree::
   :maxdepth: 1
......
.. _deploying_with_triton:

Deploying with NVIDIA Triton
============================

The `Triton Inference Server <https://github.com/triton-inference-server>`_ project hosts a tutorial that demonstrates how to quickly deploy the small `facebook/opt-125m <https://huggingface.co/facebook/opt-125m>`_ model using vLLM. Please see `Deploying a vLLM model in Triton <https://github.com/triton-inference-server/tutorials/blob/main/Quick_Deploy/vLLM/README.md#deploying-a-vllm-model-in-triton>`_ for more details.
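Once the tutorial's server is up, a client could query the model over HTTP. The sketch below is a minimal, hedged illustration and not part of the tutorial. It assumes Triton's default HTTP port (8000), a model registered under the hypothetical name ``vllm_model``, and Triton's generate endpoint (``/v2/models/<name>/generate``) accepting a ``text_input`` field plus a ``parameters`` object and returning ``text_output``. These field names are assumptions to check against the linked README for your Triton release.

```python
import json
import urllib.request

# Assumptions (verify against the linked tutorial): Triton serves HTTP on
# localhost:8000 and the model uses the hypothetical name "vllm_model".
TRITON_URL = "http://localhost:8000"
MODEL_NAME = "vllm_model"


def build_generate_request(prompt, base_url=TRITON_URL, model=MODEL_NAME,
                           temperature=0.0):
    """Build a POST request for Triton's generate endpoint.

    The "text_input" field and the "parameters" object are assumed input
    names for the vLLM backend; they are not verified for any release.
    """
    url = f"{base_url}/v2/models/{model}/generate"
    body = {
        "text_input": prompt,
        "parameters": {"stream": False, "temperature": temperature},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, timeout=30.0):
    """Send the prompt and return the generated text (needs a running server)."""
    req = build_generate_request(prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # The output field name "text_output" is likewise an assumption.
    return payload["text_output"]
```

With a server running, ``generate("What is Triton Inference Server?")`` would return the completion text. Keeping request construction separate from the network call makes the payload easy to inspect without a live server.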