OpenDAS / vllm_cscc

Unverified commit 3de2b1ea, authored by Cyrus Leung on Jan 10, 2025 and committed by GitHub on Jan 10, 2025 (parent b844b99a).

[Doc] Show default pooling method in a table (#11904)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

2 changed files with 45 additions and 22 deletions (+45 -22):

- docs/source/models/generative_models.md (+4 -4)
- docs/source/models/pooling_models.md (+41 -18)

docs/source/models/generative_models.md

@@ -8,14 +8,14 @@ In vLLM, generative models implement the {class}`~vllm.model_executor.models.Vll
 Based on the final hidden states of the input, these models output log probabilities of the tokens to generate,
 which are then passed through {class}`~vllm.model_executor.layers.Sampler` to obtain the final text.
 
+For generative models, the only supported `--task` option is `"generate"`.
+Usually, this is automatically inferred so you don't have to specify it.
+
 ## Offline Inference
 
 The {class}`~vllm.LLM` class provides various methods for offline inference.
 See [Engine Arguments](#engine-args) for a list of options when initializing the model.
 
-For generative models, the only supported {code}`task` option is {code}`"generate"`.
-Usually, this is automatically inferred so you don't have to specify it.
-
 ### `LLM.generate`
 
 The {class}`~vllm.LLM.generate` method is available to all generative models in vLLM.
@@ -33,7 +33,7 @@ for output in outputs:
 ```
 
 You can optionally control the language generation by passing {class}`~vllm.SamplingParams`.
-For example, you can use greedy sampling by setting {code}`temperature=0`:
+For example, you can use greedy sampling by setting `temperature=0`:
 
 ```python
 llm = LLM(model="facebook/opt-125m")
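
The updated generative_models.md text says that `temperature=0` gives greedy sampling. The sketch below shows what that means. It is a toy model only and not vLLM's `Sampler`. The helpers `softmax` and `pick_next_token` are hypothetical names chosen for illustration. It treats a temperature of 0 as "always pick the highest-logit token". Any positive temperature rescales the logits and then samples from them.

```python
import math
import random


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_next_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Hypothetical toy sampler (not vLLM's Sampler).

    temperature == 0 is treated as greedy decoding (argmax). Any positive
    temperature rescales the logits and samples from the resulting distribution.
    """
    if temperature == 0:
        # Greedy: always return the index of the highest logit, no randomness.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax([x / temperature for x in logits])
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [1.2, 3.5, 0.3, 2.9]
rng = random.Random(0)
print([pick_next_token(logits, 0, rng) for _ in range(5)])  # [1, 1, 1, 1, 1]
```

With a temperature of 0, repeated calls always return the same token. That determinism is why greedy decoding is a common choice for reproducible outputs.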

docs/source/models/pooling_models.md

@@ -14,30 +14,53 @@ As shown in the [Compatibility Matrix](#compatibility-matrix), most vLLM feature
 pooling models as they only work on the generation or decode stage, so performance may not improve as much.
 ```
 
-## Offline Inference
+For pooling models, we support the following `--task` options.
+The selected option sets the default pooler used to extract the final hidden states:
 
-The {class}`~vllm.LLM` class provides various methods for offline inference.
-See [Engine Arguments](#engine-args) for a list of options when initializing the model.
-
-For pooling models, we support the following {code}`task` options:
+```{list-table}
+:widths: 50 25 25 25
+:header-rows: 1
 
-- Embedding ({code}`"embed"` / {code}`"embedding"`)
-- Classification ({code}`"classify"`)
-- Sentence Pair Scoring ({code}`"score"`)
-- Reward Modeling ({code}`"reward"`)
-
-The selected task determines the default {class}`~vllm.model_executor.layers.Pooler` that is used:
-
-- Embedding: Extract only the hidden states corresponding to the last token, and apply normalization.
-- Classification: Extract only the hidden states corresponding to the last token, and apply softmax.
-- Sentence Pair Scoring: Extract only the hidden states corresponding to the last token, and apply softmax.
-- Reward Modeling: Extract all of the hidden states and return them directly.
+* - Task
+  - Pooling Type
+  - Normalization
+  - Softmax
+* - Embedding (`embed`)
+  - `LAST`
+  - ✅︎
+  - ✗
+* - Classification (`classify`)
+  - `LAST`
+  - ✗
+  - ✅︎
+* - Sentence Pair Scoring (`score`)
+  - \*
+  - \*
+  - \*
+* - Reward Modeling (`reward`)
+  - `ALL`
+  - ✗
+  - ✗
+```
+
+\*The default pooler is always defined by the model.
+
+```{note}
+If the model's implementation in vLLM defines its own pooler, the default pooler is set to that instead of the one specified in this table.
+```
 
 When loading [Sentence Transformers](https://huggingface.co/sentence-transformers) models,
-we attempt to override the default pooler based on its Sentence Transformers configuration file ({code}`modules.json`).
+we attempt to override the default pooler based on its Sentence Transformers configuration file (`modules.json`).
 
-You can customize the model's pooling method via the {code}`override_pooler_config` option,
+```{tip}
+You can customize the model's pooling method via the `--override-pooler-config` option,
 which takes priority over both the model's and Sentence Transformers's defaults.
+```
 
+## Offline Inference
+
+The {class}`~vllm.LLM` class provides various methods for offline inference.
+See [Engine Arguments](#engine-args) for a list of options when initializing the model.
+
 ### `LLM.encode`
 
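
The new table in pooling_models.md condenses the old prose list into three columns: pooling type, normalization and softmax. The sketch below makes those columns concrete in plain Python. It is illustrative only and not vLLM's `Pooler` implementation. It assumes toy hidden states given as lists of floats, one vector per token, standing in for whatever the model's last layer emits. The helper names (`pool_last`, `l2_normalize`, `default_pool`) are hypothetical.

```python
import math

# Toy stand-in for a model's final hidden states: one vector per token.
HiddenStates = list[list[float]]


def pool_last(hidden: HiddenStates) -> list[float]:
    """`LAST` pooling: keep only the vector for the final token."""
    return hidden[-1]


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale the vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(vec: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(vec)
    exps = [math.exp(x - m) for x in vec]
    total = sum(exps)
    return [e / total for e in exps]


def default_pool(task: str, hidden: HiddenStates):
    """Apply the per-task defaults listed in the table (illustrative only)."""
    if task == "embed":
        return l2_normalize(pool_last(hidden))  # LAST + normalization
    if task == "classify":
        return softmax(pool_last(hidden))  # LAST + softmax
    if task == "reward":
        return hidden  # ALL: every token's vector, returned as-is
    # "score" is marked with an asterisk in the table: the model defines its pooler.
    raise ValueError(f"no table default for task {task!r}")


hidden = [[1.0, 2.0], [3.0, 4.0]]
print(default_pool("embed", hidden))  # [0.6, 0.8]
```

The `score` row raises an error here rather than guessing. This matches the table's footnote that its default pooler is always defined by the model.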
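
The pooling_models.md text describes several sources for the pooler, in order of priority:

1. The table default.
2. A pooler defined by the model's vLLM implementation.
3. The Sentence Transformers `modules.json` config.
4. `--override-pooler-config`, which wins over the model's and Sentence Transformers's defaults.

A minimal sketch of that precedence is shown below. It assumes each source can be expressed as a dict and merged key by key. That merge rule is an assumption, and so is ranking the Sentence Transformers config above the model-defined pooler. The dict keys `pooling_type`, `normalize` and `softmax` are illustrative names mirroring the table columns. `resolve_pooler` is a hypothetical helper, not a vLLM API.

```python
from typing import Optional

# Table defaults keyed by task. "score" is absent: its default comes from the model.
TASK_DEFAULTS = {
    "embed": {"pooling_type": "LAST", "normalize": True, "softmax": False},
    "classify": {"pooling_type": "LAST", "normalize": False, "softmax": True},
    "reward": {"pooling_type": "ALL", "normalize": False, "softmax": False},
}


def resolve_pooler(
    task: str,
    model_pooler: Optional[dict] = None,
    sentence_transformers: Optional[dict] = None,
    override: Optional[dict] = None,
) -> dict:
    """Merge pooler settings from lowest to highest priority (hypothetical helper).

    Order: table default < model-defined pooler < Sentence Transformers
    config (modules.json) < user override. Later layers win key by key.
    """
    resolved = dict(TASK_DEFAULTS.get(task, {}))
    for layer in (model_pooler, sentence_transformers, override):
        if layer:
            resolved.update(layer)
    return resolved


print(resolve_pooler("embed"))
# {'pooling_type': 'LAST', 'normalize': True, 'softmax': False}
print(resolve_pooler("embed", sentence_transformers={"pooling_type": "MEAN"},
                     override={"normalize": False}))
# {'pooling_type': 'MEAN', 'normalize': False, 'softmax': False}
```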