"integration-tests/vscode:/vscode.git/clone" did not exist on "72ee382dedb9d2988b82ddca1e6d8933088ec707"
Unverified commit cd0be748, authored by Byron Hsu, committed by GitHub

[doc] improve engine doc and add to readme (#1670)

parent 56503d9b
@@ -241,6 +241,40 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --nccl-init sgl-dev-0:50000 --nnodes 2 --node-rank 1
```
### Engine Without HTTP Server
We also provide an inference engine **without an HTTP server**. For example:
```python
import sglang as sgl

def main():
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    sampling_params = {"temperature": 0.8, "top_p": 0.95}
    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

    outputs = llm.generate(prompts, sampling_params)
    for prompt, output in zip(prompts, outputs):
        print("===============================")
        print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

if __name__ == "__main__":
    main()
```
This can be used for:
1. **Offline Batch Inference**
2. **Building Custom Servers** (a sketch follows below)

You can view the full example [here](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine).
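For the second use case, here is a minimal sketch of a custom server built on top of `sgl.Engine`, assuming FastAPI and uvicorn are installed; the `/generate` route and the request schema are illustrative choices, not part of SGLang itself.

```python
# A minimal sketch of the "Building Custom Servers" use case. FastAPI, the
# /generate route, and the request schema are assumptions for illustration;
# only sgl.Engine and its generate() call come from the example above.
import sglang as sgl
from fastapi import FastAPI
from pydantic import BaseModel

class GenerateRequest(BaseModel):
    prompt: str
    temperature: float = 0.8
    top_p: float = 0.95

app = FastAPI()

# Load the model once at startup and reuse it across requests.
llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

@app.post("/generate")
def generate(req: GenerateRequest) -> dict:
    sampling_params = {"temperature": req.temperature, "top_p": req.top_p}
    outputs = llm.generate([req.prompt], sampling_params)
    # generate() returns one output per prompt; we sent a single prompt.
    return {"text": outputs[0]["text"]}
```

You could run this with, e.g., `uvicorn my_server:app` (module name hypothetical) and POST a JSON body like `{"prompt": "Hello, my name is"}` to `/generate`.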
### Supported Models

**Generative Models**

...
@@ -93,14 +93,39 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --nccl-init sgl-dev-0:50000 --nnodes 2 --node-rank 1
```
-### SRT Engine: Direct Inference Without HTTP
+### Engine Without HTTP Server
-SGLang provides a direct inference engine **without an HTTP server**. This can be used for:
+We also provide an inference engine **without an HTTP server**. For example:
```python
import sglang as sgl

def main():
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    sampling_params = {"temperature": 0.8, "top_p": 0.95}
    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

    outputs = llm.generate(prompts, sampling_params)
    for prompt, output in zip(prompts, outputs):
        print("===============================")
        print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

if __name__ == "__main__":
    main()
```
This can be used for:
1. **Offline Batch Inference** (a variation is sketched below)
2. **Building Custom Servers**
-We provide usage examples [here](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine)
+You can view the full example [here](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine).
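As a small variation on the offline batch-inference use case, the sketch below persists results to a JSONL file instead of printing them. It uses only the `sgl.Engine` API shown in this diff; the output file name is arbitrary.

```python
# Offline batch inference that writes results to disk. Everything here follows
# the sgl.Engine usage shown above; the output path is an arbitrary choice.
import json

import sglang as sgl

prompts = [
    "Hello, my name is",
    "The president of the United States is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

outputs = llm.generate(prompts, sampling_params)
with open("generations.jsonl", "w") as f:
    for prompt, output in zip(prompts, outputs):
        # Each output exposes the generated text under "text", as in the example above.
        record = {"prompt": prompt, "generated_text": output["text"]}
        f.write(json.dumps(record) + "\n")
```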
### Supported Models

...