sglang · Commit 977f785d (unverified)
Authored Jan 08, 2025 by mlmz; committed by GitHub on Jan 08, 2025

Docs: Rewrite docs for LLama 405B and ModelSpace (#2773)

Co-authored-by: Chayenne <zhaochen20@outlook.com>

parent 8a690612
Showing 4 changed files with 46 additions and 43 deletions
docs/backend/server_arguments.md  (+0, -43)
docs/index.rst  (+2, -0)
docs/references/llama_405B.md  (+16, -0)
docs/references/modelscope.md  (+28, -0)
docs/backend/server_arguments.md

@@ -32,46 +32,3 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --nccl-init sgl-dev-0:50000 --nnodes 2 --node-rank 1
```
## Use Models From ModelScope
<details>
<summary>More</summary>
To use a model from [ModelScope](https://www.modelscope.cn), set the environment variable `SGLANG_USE_MODELSCOPE`.
```
export SGLANG_USE_MODELSCOPE=true
```
Launch a [Qwen2-7B-Instruct](https://www.modelscope.cn/models/qwen/qwen2-7b-instruct) server:
```
SGLANG_USE_MODELSCOPE=true python -m sglang.launch_server --model-path qwen/Qwen2-7B-Instruct --port 30000
```
Or start it with Docker:
```bash
docker run --gpus all \
    -p 30000:30000 \
    -v ~/.cache/modelscope:/root/.cache/modelscope \
    --env "SGLANG_USE_MODELSCOPE=true" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --host 0.0.0.0 --port 30000
```
</details>
## Example: Run Llama 3.1 405B
<details>
<summary>More</summary>
```bash
# Run 405B (fp8) on a single node
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 --tp 8

# Run 405B (fp16) on two nodes
## on the first node, replace `172.16.4.52:20000` with your own first node's IP address and port
python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-405B-Instruct --tp 16 --nccl-init-addr 172.16.4.52:20000 --nnodes 2 --node-rank 0

## on the second node, replace `172.16.4.52:20000` with your own first node's IP address and port
python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-405B-Instruct --tp 16 --nccl-init-addr 172.16.4.52:20000 --nnodes 2 --node-rank 1
```
</details>
docs/index.rst

@@ -60,3 +60,5 @@ The core features include:
   references/troubleshooting.md
   references/faq.md
   references/learn_more.md
   references/llama_405B.md
   references/modelscope.md
docs/references/llama_405B.md (new file, mode 100644)
# Example: Run Llama 3.1 405B
```bash
# Run 405B (fp8) on a single node
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 --tp 8
```
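Once the server is up, you can smoke-test it before sending real traffic. A minimal sketch, assuming the default port 30000 and sglang's native `/generate` endpoint (neither is shown in this commit):

```bash
# Hypothetical smoke test: assumes the server listens on the default port 30000
# and exposes sglang's native /generate endpoint taking text + sampling_params.
curl http://localhost:30000/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "The capital of France is", "sampling_params": {"max_new_tokens": 16, "temperature": 0}}'
```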
```bash
# Run 405B (fp16) on two nodes
## on the first node, replace `172.16.4.52:20000` with your own first node's IP address and port
python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-405B-Instruct --tp 16 --nccl-init-addr 172.16.4.52:20000 --nnodes 2 --node-rank 0

## on the second node, replace `172.16.4.52:20000` with your own first node's IP address and port
python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-405B-Instruct --tp 16 --nccl-init-addr 172.16.4.52:20000 --nnodes 2 --node-rank 1
```
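Both launch commands must run concurrently; serving starts once the NCCL group spanning the two nodes is initialized. A hedged usage sketch, assuming the rank-0 node also binds the default HTTP port 30000 and exposes sglang's OpenAI-compatible route (not part of this commit):

```bash
# Hypothetical query against the rank-0 node; assumes port 30000 and the
# OpenAI-compatible /v1/chat/completions endpoint.
curl http://172.16.4.52:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Meta-Llama-3.1-405B-Instruct",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```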
docs/references/modelscope.md (new file, mode 100644)
# Use Models From ModelScope
To use a model from [ModelScope](https://www.modelscope.cn), set the environment variable `SGLANG_USE_MODELSCOPE`.
```bash
export SGLANG_USE_MODELSCOPE=true
```
We take [Qwen2-7B-Instruct](https://www.modelscope.cn/models/qwen/qwen2-7b-instruct) as an example. Launch the server:
```bash
python -m sglang.launch_server --model-path qwen/Qwen2-7B-Instruct --port 30000
```
Or start it with Docker:
```bash
docker run --gpus all \
    -p 30000:30000 \
    -v ~/.cache/modelscope:/root/.cache/modelscope \
    --env "SGLANG_USE_MODELSCOPE=true" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --host 0.0.0.0 --port 30000
```
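Either way, you can confirm which weights the server actually loaded. A minimal sketch, assuming sglang exposes a `/get_model_info` route on the serving port (an assumption, not shown in this commit):

```bash
# Hypothetical check: assumes a /get_model_info endpoint on the serving port.
# It should report the ModelScope model path that was loaded.
curl http://localhost:30000/get_model_info
```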
Note that ModelScope uses a different cache directory than Hugging Face. You may need to set it manually to avoid running out of disk space.
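For example, to relocate the cache to a larger disk: the sketch below assumes ModelScope reads its cache location from the `MODELSCOPE_CACHE` environment variable (check the behavior of your modelscope version) and adjusts the Docker mount to match.

```bash
# Assumption: modelscope resolves its cache directory from MODELSCOPE_CACHE.
export MODELSCOPE_CACHE=/data/modelscope
# With Docker, mount the same location in place of ~/.cache/modelscope:
#   -v /data/modelscope:/root/.cache/modelscope
```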