Commit 05b4c398 (unverified), authored Jan 18, 2024 by Lianmin Zheng, committed via GitHub on Jan 18, 2024.

Document sampling parameters (#45)

Parent: dafafe5b
Showing 2 changed files with 94 additions and 4 deletions:

- README.md (+8, -4)
- docs/sampling_params.md (+86, -0)
README.md

````diff
@@ -228,15 +228,19 @@ python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port
 Send a request
 ```
-curl http://localhost:30000/v1/completions \
+curl http://localhost:30000/generate \
   -H "Content-Type: application/json" \
   -d '{
-    "prompt": "Say this is a test",
-    "max_tokens": 16,
-    "temperature": 0
+    "text": "Once upon a time,",
+    "parameters": {
+      "max_new_tokens": 16,
+      "temperature": 0
+    }
   }'
 ```
+
+Learn more about the argument format [here](docs/sampling_params.md).
 
 ### Additional Arguments
 - Add `--tp 2` to enable tensor parallelism.
````
docs/sampling_params.md (new file, mode 100644)
## Sampling Parameters of SGLang Runtime

This doc describes the sampling parameters of the SGLang Runtime.

The `/generate` endpoint accepts the following arguments in JSON format.
```python
class GenerateReqInput:
    text: Union[List[str], str]
    image_data: Optional[Union[List[str], str]] = None
    sampling_params: Union[List[Dict], Dict] = None
    rid: Optional[Union[List[str], str]] = None
    return_normalized_logprob: Optional[Union[List[bool], bool]] = None
    normalized_logprob_start_len: Optional[Union[List[int], int]] = None
    stream: bool = False
```
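Several of these fields accept either a single value or a list, which lets one request carry a batch of prompts. As a minimal sketch (the helper function, prompt strings, and parameter values here are made up for illustration, not part of SGLang), building such a batched payload might look like:

```python
# Sketch: the Union[List[str], str] fields let one /generate request
# carry a batch of prompts. This only assembles the JSON body locally;
# sending it would require a running server (e.g. at localhost:30000).
def build_generate_payload(texts, sampling_params, stream=False):
    """Assemble a hypothetical request body for the /generate endpoint."""
    return {
        "text": texts,                       # str or List[str]
        "sampling_params": sampling_params,  # dict or List[dict]
        "stream": stream,
    }

payload = build_generate_payload(
    ["The capital of France is", "The capital of Japan is"],
    {"temperature": 0, "max_new_tokens": 8},
)
print(payload["text"][0])  # each prompt keeps its position in the batch
```

When `text` is a list, results come back per prompt in the same order, so scalar and batched calls share one endpoint.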
The `sampling_params` field follows this format:
```python
class SamplingParams:
    def __init__(
        self,
        max_new_tokens: int = 16,
        stop: Optional[Union[str, List[str]]] = None,
        temperature: float = 1.0,
        top_p: float = 1.0,
        top_k: int = -1,
        frequency_penalty: float = 0.0,
        presence_penalty: float = 0.0,
        ignore_eos: bool = False,
        skip_special_tokens: bool = True,
        dtype: Optional[str] = None,
        regex: Optional[str] = None,
    ) -> None:
```
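To make the defaults concrete: `top_k = -1` disables the top-k cutoff and `top_p = 1.0` keeps the full distribution. The following is a self-contained sketch of how these two filters interact on a toy probability vector; it illustrates the semantics only and is not SGLang's actual sampling implementation.

```python
# Sketch of top-k / top-p (nucleus) filtering semantics.
# top_k = -1 means "no top-k cutoff"; top_p = 1.0 keeps everything.
def filter_probs(probs, top_k=-1, top_p=1.0):
    """Return the (index, prob) pairs that survive top-k then top-p filtering."""
    ranked = sorted(enumerate(probs), key=lambda x: -x[1])
    if top_k > 0:
        ranked = ranked[:top_k]  # hard cap on candidate count
    kept, cum = [], 0.0
    for idx, p in ranked:
        kept.append((idx, p))
        cum += p
        if cum >= top_p:  # stop once the nucleus mass is covered
            break
    return kept

probs = [0.5, 0.25, 0.125, 0.125]
print(filter_probs(probs))               # defaults keep the full distribution
print(filter_probs(probs, top_p=0.7))    # nucleus keeps the top two tokens
print(filter_probs(probs, top_k=2))      # hard cap at two candidates
```

A real sampler would renormalize the surviving probabilities and draw from them; the filters only decide which tokens remain eligible.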
## Examples
### Normal
```
python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000
```
```python
import requests

response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "The capital of France is",
        "sampling_params": {
            "temperature": 0,
            "max_new_tokens": 32,
        },
    },
)
print(response.json())
```
### Streaming
```python
import requests, json

response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "The capital of France is",
        "sampling_params": {
            "temperature": 0,
            "max_new_tokens": 256,
        },
        "stream": True,
    },
    stream=True,
)

prev = 0
for chunk in response.iter_lines(decode_unicode=False, delimiter=b"\0"):
    if chunk:
        data = json.loads(chunk.decode())
        output = data["text"].strip()
        print(output[prev:], end="", flush=True)
        prev = len(output)
print("")
```
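The streaming loop relies on two details: the server terminates each JSON chunk with a NUL byte (hence `delimiter=b"\0"`), and each chunk carries the full text so far, so printing `output[prev:]` emits only the newly generated suffix. A self-contained sketch of that delta-printing logic on a simulated stream (the chunk contents are made up; a real stream comes from `response.iter_lines`):

```python
import json

# Simulate a NUL-delimited stream where each chunk holds the full text so far.
raw = (
    json.dumps({"text": "Paris"}).encode() + b"\0"
    + json.dumps({"text": "Paris is the"}).encode() + b"\0"
    + json.dumps({"text": "Paris is the capital."}).encode() + b"\0"
)

pieces = []
prev = 0
for chunk in raw.split(b"\0"):
    if chunk:  # skip the empty trailing split
        output = json.loads(chunk.decode())["text"].strip()
        pieces.append(output[prev:])  # only the newly generated suffix
        prev = len(output)

print("".join(pieces))  # prints "Paris is the capital." exactly once
```

Because each suffix is printed at most once, the reassembled output never duplicates text even though every chunk repeats the whole prefix.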