Commit c9ff9e6f authored by William Song, committed by GitHub

[Docs] add the parallel sampling usage in LLMEngine and AsyncLLM (#24222)

parent eaffe448
@@ -81,7 +81,13 @@ class SamplingParams(
"""
n: int = 1
-"""Number of output sequences to return for the given prompt."""
+"""Number of outputs to return for the given prompt request.
+NOTE:
+`AsyncLLM` streams outputs by default. When `n > 1`, all `n` outputs
+are generated and streamed cumulatively per request. To see all `n`
+outputs upon completion, use `output_kind=RequestOutputKind.FINAL_ONLY`
+in `SamplingParams`."""
best_of: Optional[int] = None
"""Number of output sequences that are generated from the prompt. From
these `best_of` sequences, the top `n` sequences are returned. `best_of`
......
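
The difference between cumulative streaming and final-only delivery described in the new docstring can be illustrated with a toy model. The sketch below does not call vLLM: `OutputKind`, `Update`, `fake_generate`, and `collect` are hypothetical stand-ins written for illustration, and the assumption that every streamed update carries all `n` samples is a simplification. In vLLM itself, the corresponding switch is the `output_kind=RequestOutputKind.FINAL_ONLY` argument to `SamplingParams` named in the docstring.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import AsyncIterator, List


class OutputKind(Enum):
    # Hypothetical stand-in for vLLM's RequestOutputKind (not the real enum).
    CUMULATIVE = 0
    FINAL_ONLY = 1


@dataclass
class Update:
    texts: List[str]  # text produced so far for each of the n samples
    finished: bool    # True only on the last update of the request


async def fake_generate(n: int, steps: int,
                        kind: OutputKind) -> AsyncIterator[Update]:
    """Toy engine: every step appends one fake token to each of n samples."""
    texts = [""] * n
    for step in range(steps):
        texts = [t + f"s{i}t{step} " for i, t in enumerate(texts)]
        finished = step == steps - 1
        # CUMULATIVE yields a growing snapshot at every step;
        # FINAL_ONLY stays silent until the request is complete.
        if kind is OutputKind.CUMULATIVE or finished:
            yield Update(list(texts), finished)
        await asyncio.sleep(0)  # hand control back to the event loop


async def collect(n: int, steps: int, kind: OutputKind) -> List[Update]:
    """Drain the async generator into a list of updates."""
    return [u async for u in fake_generate(n, steps, kind)]


streamed = asyncio.run(collect(2, 3, OutputKind.CUMULATIVE))
final_only = asyncio.run(collect(2, 3, OutputKind.FINAL_ONLY))

print(len(streamed), "updates when streaming cumulatively")
print(len(final_only), "update in final-only mode")
```

In this model a caller that wants all `n` completions at once either keeps only the last streamed update or requests final-only delivery; both end with the same texts.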