"examples/vscode:/vscode.git/clone" did not exist on "3b1978a3790115490accadb26ce418989b368764"
Unverified Commit 728e175f authored by Chayenne, committed by GitHub

Add examples to token-in-token-out for LLM (#4010)

parent 9e1014cf
@@ -52,7 +52,7 @@ Please consult the documentation below to learn more about the parameters you ma
* `chat_template`: The chat template to use. Deviating from the default might lead to unexpected responses. For multi-modal chat templates, refer to [here](https://docs.sglang.ai/backend/openai_api_vision.html#Chat-Template).
* `is_embedding`: Set to true to perform [embedding](https://docs.sglang.ai/backend/openai_api_embeddings.html) / [encode](https://docs.sglang.ai/backend/native_api.html#Encode-(embedding-model)) and [reward](https://docs.sglang.ai/backend/native_api.html#Classify-(reward-model)) tasks.
* `revision`: Adjust if a specific version of the model should be used.
- * `skip_tokenizer_init`: Set to true to provide the tokens to the engine and get the output tokens directly, typically used in RLHF. Please see this [example for reference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/input_ids.py).
+ * `skip_tokenizer_init`: Set to true to provide the tokens to the engine and get the output tokens directly, typically used in RLHF. Please see this [example for reference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/token_in_token_out_llm.py), and the configuration sketch after this diff.
* `json_model_override_args`: Override model config with the provided JSON.
* `delete_ckpt_after_loading`: Delete the model checkpoint after loading the model.
...
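For illustration, here is a minimal sketch of how a few of the options above can be passed to the offline engine, which accepts these server arguments as keyword arguments. The model path, revision, and override value are hypothetical placeholders, not values from this commit:

```python
import sglang as sgl

# Hypothetical values for illustration; substitute your own model and revision.
llm = sgl.Engine(
    model_path="meta-llama/Llama-3.2-1B-Instruct",
    revision="main",  # `revision`: pin a specific model version
    json_model_override_args='{"max_position_embeddings": 8192}',  # JSON string overriding the model config
)

# List-in, list-out: one output dict per prompt.
outputs = llm.generate(["The capital of France is"], {"max_new_tokens": 8})
print(outputs[0]["text"])
llm.shutdown()
```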
""" """
This example demonstrates how to provide tokenized ids as input instead of text prompt This example demonstrates how to provide tokenized ids to LLM as input instead of text prompt, i.e. a token-in-token-out workflow.
""" """
import sglang as sgl import sglang as sgl
@@ -24,14 +24,13 @@ def main():
    token_ids_list = [tokenizer.encode(prompt) for prompt in prompts]

    # Create an LLM.
-    # You can also specify `skip_tokenizer_init=True`, but it requires explicit detokenization at the end
-    llm = sgl.Engine(model_path=MODEL_PATH)
+    llm = sgl.Engine(model_path=MODEL_PATH, skip_tokenizer_init=True)

    outputs = llm.generate(input_ids=token_ids_list, sampling_params=sampling_params)

    # Print the outputs.
    for prompt, output in zip(prompts, outputs):
        print("===============================")
-        print(f"Prompt: {prompt}\nGenerated Text: {output['text']}")
+        print(f"Prompt: {prompt}\nGenerated token ids: {output['token_ids']}")

# The __main__ condition is necessary here because we use "spawn" to create subprocesses
...
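Because `skip_tokenizer_init=True` makes the engine return token ids rather than text, detokenization is left to the caller. A hedged sketch of that final step, reusing `MODEL_PATH`, `prompts`, and `outputs` from the example above and assuming the Hugging Face `AutoTokenizer`:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)  # the same tokenizer that encoded the prompts

for prompt, output in zip(prompts, outputs):
    # Decode the generated ids back to text; the engine skipped this due to skip_tokenizer_init=True.
    text = tokenizer.decode(output["token_ids"], skip_special_tokens=True)
    print(f"Prompt: {prompt}\nDetokenized text: {text}")
```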