Unverified commit 455c9ccc authored by Lianmin Zheng, committed by GitHub

Update readme (#434)

parent 39191c85
@@ -326,15 +326,17 @@ response = client.chat.completions.create(
 print(response)
 ```
-In above example, the server uses the chat template specified in the model tokenizer.
-You can override the chat template if needed when launching the server:
+By default, the server uses the chat template specified in the model tokenizer from Hugging Face. It should just work for most official models such as Llama-2/Llama-3.
+If needed, you can also override the chat template when launching the server:
 ```
 python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --chat-template llama-2
 ```
 If the chat template you are looking for is missing, you are welcome to contribute it.
-Meanwhile, you can also temporary register your chat template as follows:
+Meanwhile, you can also temporarily register your chat template as follows:
 ```json
 {
...
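The readme hunk above is truncated right after the opening brace of the JSON chat template. For context, sglang's temporarily registered templates are JSON files in a FastChat-style conversation format; the sketch below writes one out. All field values here (the template name, ChatML-style role tokens, separator, and stop string) are illustrative assumptions, not the README's literal content.

```python
# Sketch of writing a custom chat template file for sglang.
# All field values below are illustrative assumptions (ChatML-style tokens);
# adjust them to match your model's actual prompt format.
import json

template = {
    "name": "my_model",                    # hypothetical template name
    "system": "<|im_start|>system",        # system-role prefix
    "user": "<|im_start|>user",            # user-role prefix
    "assistant": "<|im_start|>assistant",  # assistant-role prefix
    "sep_style": "CHATML",                 # separator style
    "sep": "<|im_end|>",                   # turn separator
    "stop_str": ["<|im_end|>"],            # generation stop string(s)
}

with open("my_model_template.json", "w") as f:
    json.dump(template, f, indent=2)
```

The server can then be pointed at the file instead of a built-in template name, e.g. `python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000 --chat-template ./my_model_template.json` (the file path is an assumption following the sketch above).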
@@ -30,7 +30,8 @@ if __name__ == "__main__":
     response = requests.post(
         url + "/generate",
         json={
-            "input_ids": [[1,2,3], [1,2,3]],
+            "text": f"{a}, ",
+            #"input_ids": [[2] * 256] * 196,
             "sampling_params": {
                 "temperature": 0,
                 "max_new_tokens": max_new_tokens,
...
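The test change above switches the `/generate` request body from token ids (`input_ids`) to plain text (`text`). A standalone version of such a request, against a locally running server, might look like the sketch below; the server URL, prompt, and sampling values are illustrative assumptions.

```python
# Minimal sketch of a text-based /generate request to a running sglang server.
# URL, prompt, and sampling values are illustrative assumptions.
import requests

url = "http://localhost:30000"

response = requests.post(
    url + "/generate",
    json={
        "text": "The capital of France is",
        "sampling_params": {
            "temperature": 0,      # greedy decoding
            "max_new_tokens": 16,  # cap on generated tokens
        },
    },
)
print(response.json())
```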