Add chat doc in quick start (#21213)

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Commit 83f7bbb3, authored Aug 03, 2025 by TankNee, committed by GitHub Aug 03, 2025

Showing with 37 additions and 0 deletions

docs/getting_started/quickstart.md +37 -0

--- a/docs/getting_started/quickstart.md
+++ b/docs/getting_started/quickstart.md
@@ -98,6 +98,43 @@ for output in outputs:
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
 ```

+!!! note
+    The `llm.generate` method does not automatically apply the model's chat template to the input prompt. If you are using an instruct or chat model, apply the corresponding chat template manually to get the expected behavior. Alternatively, use the `llm.chat` method and pass a list of messages in the same format as those passed to OpenAI's `client.chat.completions.create`:
+
+    ??? code
+    
+        ```python
+        # Using tokenizer to apply chat template
+        from transformers import AutoTokenizer
+    
+        tokenizer = AutoTokenizer.from_pretrained("/path/to/chat_model")
+        messages_list = [
+            [{"role": "user", "content": prompt}]
+            for prompt in prompts
+        ]
+        texts = tokenizer.apply_chat_template(
+            messages_list,
+            tokenize=False,
+            add_generation_prompt=True,
+        )
+        
+        # Generate outputs
+        outputs = llm.generate(texts, sampling_params)
+        
+        # Print the outputs.
+        for output in outputs:
+            prompt = output.prompt
+            generated_text = output.outputs[0].text
+            print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+    
+        # Using chat interface.
+        outputs = llm.chat(messages_list, sampling_params)
+        for idx, output in enumerate(outputs):
+            prompt = prompts[idx]
+            generated_text = output.outputs[0].text
+            print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+        ```
+
 [](){ #quickstart-online }

 ## OpenAI-Compatible Server
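
Review aside on the note added in this diff: the sketch below illustrates, in plain Python, what "applying a chat template" does to a list of messages before generation. It is not vLLM or Transformers code. The function `apply_toy_chat_template`, the ChatML-style markers, and the two example prompts are assumptions made for illustration. Real models ship their own template with the tokenizer, and the exact markers differ per model.

```python
# Illustrative only: a hand-written, ChatML-style template.
# The markers below are an assumption; a real model's template may differ.


def apply_toy_chat_template(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into one prompt string."""
    parts = []
    for message in messages:
        # Wrap every turn in role markers so the model can tell turns apart.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model replies instead of
        # continuing the user's text.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Placeholder prompts, standing in for the `prompts` list used in the diff.
prompts = ["Hello, my name is", "The capital of France is"]

# Same shape as `messages_list` in the diff: one conversation per prompt.
messages_list = [[{"role": "user", "content": p}] for p in prompts]

texts = [apply_toy_chat_template(messages) for messages in messages_list]
print(texts[0])
```

Passing a raw prompt to `llm.generate` skips this wrapping step entirely, which is why the note recommends either templating the prompt first or calling `llm.chat` with the message list.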