"integration-tests/vscode:/vscode.git/clone" did not exist on "72ee382dedb9d2988b82ddca1e6d8933088ec707"
Unverified commit cd0be748, authored by Byron Hsu, committed by GitHub

[doc] improve engine doc and add to readme (#1670)

parent 56503d9b
@@ -241,6 +241,40 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --nccl-init sgl-dev-0:50000 --nnodes 2 --node-rank 1
```
### Engine Without HTTP Server
We also provide an inference engine **without an HTTP server**. For example:
```python
import sglang as sgl

def main():
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    sampling_params = {"temperature": 0.8, "top_p": 0.95}
    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

    outputs = llm.generate(prompts, sampling_params)
    for prompt, output in zip(prompts, outputs):
        print("===============================")
        print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

if __name__ == "__main__":
    main()
```
This can be used for:
1. **Offline Batch Inference**
2. **Building Custom Servers** (a sketch follows below)

You can view the full example [here](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine).
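For the second use case, here is a minimal sketch of a custom server built on top of `sgl.Engine`, assuming FastAPI and uvicorn are installed; the `/generate` route and the request schema are illustrative choices, not part of SGLang itself.

```python
# A minimal sketch of the "Building Custom Servers" use case. FastAPI, the
# /generate route, and the request schema are assumptions for illustration;
# only sgl.Engine and its generate() call come from the example above.
import sglang as sgl
from fastapi import FastAPI
from pydantic import BaseModel

class GenerateRequest(BaseModel):
    prompt: str
    temperature: float = 0.8
    top_p: float = 0.95

app = FastAPI()

# Load the model once at startup and reuse it across requests.
llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

@app.post("/generate")
def generate(req: GenerateRequest) -> dict:
    sampling_params = {"temperature": req.temperature, "top_p": req.top_p}
    outputs = llm.generate([req.prompt], sampling_params)
    # generate() returns one output per prompt; we sent a single prompt.
    return {"text": outputs[0]["text"]}
```

You could run this with, e.g., `uvicorn my_server:app` (module name hypothetical) and POST a JSON body like `{"prompt": "Hello, my name is"}` to `/generate`.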
### Supported Models

**Generative Models**

...
@@ -93,14 +93,39 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --nccl-init sgl-dev-0:50000 --nnodes 2 --node-rank 1
```
-### SRT Engine: Direct Inference Without HTTP
+### Engine Without HTTP Server
-SGLang provides a direct inference engine **without an HTTP server**. This can be used for:
+We also provide an inference engine **without an HTTP server**. For example:
```python
import sglang as sgl

def main():
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    sampling_params = {"temperature": 0.8, "top_p": 0.95}
    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

    outputs = llm.generate(prompts, sampling_params)
    for prompt, output in zip(prompts, outputs):
        print("===============================")
        print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

if __name__ == "__main__":
    main()
```
This can be used for:
1. **Offline Batch Inference** (a variation is sketched below)
2. **Building Custom Servers**
-We provide usage examples [here](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine)
+You can view the full example [here](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine).
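As a small variation on the offline batch-inference use case, the sketch below persists results to a JSONL file instead of printing them. It uses only the `sgl.Engine` API shown in this diff; the output file name is arbitrary.

```python
# Offline batch inference that writes results to disk. Everything here follows
# the sgl.Engine usage shown above; the output path is an arbitrary choice.
import json

import sglang as sgl

prompts = [
    "Hello, my name is",
    "The president of the United States is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

outputs = llm.generate(prompts, sampling_params)
with open("generations.jsonl", "w") as f:
    for prompt, output in zip(prompts, outputs):
        # Each output exposes the generated text under "text", as in the example above.
        record = {"prompt": prompt, "generated_text": output["text"]}
        f.write(json.dumps(record) + "\n")
```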
### Supported Models

...