Commit 05a38612 authored by Massimiliano Pronesti, committed by GitHub

docs: add instruction for langchain (#1162)

parent d27f4bae
@@ -66,6 +66,7 @@ Documentation
   serving/run_on_sky
   serving/deploying_with_triton
   serving/deploying_with_docker
   serving/serving_with_langchain

.. toctree::
   :maxdepth: 1
...
.. _run_on_langchain:

Serving with LangChain
======================

vLLM is also available via `LangChain <https://github.com/langchain-ai/langchain>`_.

To install LangChain, run

.. code-block:: console

    $ pip install langchain -q
To run inference on a single GPU or on multiple GPUs, use the ``VLLM`` class from ``langchain``.

.. code-block:: python

    from langchain.llms import VLLM

    llm = VLLM(model="mosaicml/mpt-7b",
               trust_remote_code=True,  # mandatory for Hugging Face models
               max_new_tokens=128,
               top_k=10,
               top_p=0.95,
               temperature=0.8,
               # tensor_parallel_size=... # for distributed inference
    )

    print(llm("What is the capital of France?"))
Please refer to this `Tutorial <https://github.com/langchain-ai/langchain/blob/master/docs/extras/integrations/llms/vllm.ipynb>`_ for more details.
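
The ``VLLM`` object above is a standard LangChain LLM, so it can be composed with the rest of the LangChain API. The following is a minimal sketch (not part of the original page, and assuming ``langchain`` and ``vllm`` are installed and a GPU with enough memory for the model is available) of wrapping the vLLM-backed model in a prompt template and chain; the template text and question are illustrative.

.. code-block:: python

    from langchain.chains import LLMChain
    from langchain.llms import VLLM
    from langchain.prompts import PromptTemplate

    # Illustrative template; any {question}-style template works here.
    template = "Question: {question}\n\nAnswer:"
    prompt = PromptTemplate(template=template, input_variables=["question"])

    # Loads the model weights; requires a GPU and downloads the model on first use.
    llm = VLLM(model="mosaicml/mpt-7b",
               trust_remote_code=True,  # mandatory for Hugging Face models
               max_new_tokens=128)

    # LLMChain fills the template and forwards the prompt to the vLLM backend.
    chain = LLMChain(llm=llm, prompt=prompt)
    print(chain.run("What is the capital of France?"))

The same ``llm`` instance can be reused across multiple chains, so the model is loaded only once.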