OpenDAS / vllm_cscc
Commit 3de2b1ea (Unverified)
Authored Jan 10, 2025 by Cyrus Leung; committed by GitHub, Jan 10, 2025
[Doc] Show default pooling method in a table (#11904)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Parent: b844b99a
Showing 2 changed files with 45 additions and 22 deletions
docs/source/models/generative_models.md (+4, -4)
docs/source/models/pooling_models.md (+41, -18)
docs/source/models/generative_models.md
...
...
@@ -8,14 +8,14 @@ In vLLM, generative models implement the {class}`~vllm.model_executor.models.Vll
Based on the final hidden states of the input, these models output log probabilities of the tokens to generate,
which are then passed through {class}`~vllm.model_executor.layers.Sampler` to obtain the final text.
+For generative models, the only supported `--task` option is `"generate"`.
+Usually, this is automatically inferred so you don't have to specify it.
## Offline Inference
The {class}`~vllm.LLM` class provides various methods for offline inference.
See [Engine Arguments](#engine-args) for a list of options when initializing the model.
-For generative models, the only supported {code}`task` option is {code}`"generate"`.
-Usually, this is automatically inferred so you don't have to specify it.
### `LLM.generate`
The {class}`~vllm.LLM.generate` method is available to all generative models in vLLM.
...
...
@@ -33,7 +33,7 @@ for output in outputs:
```
You can optionally control the language generation by passing {class}`~vllm.SamplingParams`.
-For example, you can use greedy sampling by setting {code}`temperature=0`:
+For example, you can use greedy sampling by setting `temperature=0`:
```python
llm = LLM(model="facebook/opt-125m")
...
...
docs/source/models/pooling_models.md
...
...
@@ -14,30 +14,53 @@ As shown in the [Compatibility Matrix](#compatibility-matrix), most vLLM feature
pooling models as they only work on the generation or decode stage, so performance may not improve as much.
```
-## Offline Inference
-The {class}`~vllm.LLM` class provides various methods for offline inference.
-See [Engine Arguments](#engine-args) for a list of options when initializing the model.
-For pooling models, we support the following {code}`task` options:
-- Embedding ({code}`"embed"` / {code}`"embedding"`)
-- Classification ({code}`"classify"`)
-- Sentence Pair Scoring ({code}`"score"`)
-- Reward Modeling ({code}`"reward"`)
+For pooling models, we support the following `--task` options.
+The selected option sets the default pooler used to extract the final hidden states:
+```{list-table}
+:widths: 50 25 25 25
+:header-rows: 1
+
+* - Task
+  - Pooling Type
+  - Normalization
+  - Softmax
+* - Embedding (`embed`)
+  - `LAST`
+  - ✅︎
+  - ✗
+* - Classification (`classify`)
+  - `LAST`
+  - ✗
+  - ✅︎
+* - Sentence Pair Scoring (`score`)
+  - \*
+  - \*
+  - \*
+* - Reward Modeling (`reward`)
+  - `ALL`
+  - ✗
+  - ✗
+```
-The selected task determines the default {class}`~vllm.model_executor.layers.Pooler` that is used:
+\*The default pooler is always defined by the model.
-- Embedding: Extract only the hidden states corresponding to the last token, and apply normalization.
-- Classification: Extract only the hidden states corresponding to the last token, and apply softmax.
-- Sentence Pair Scoring: Extract only the hidden states corresponding to the last token, and apply softmax.
-- Reward Modeling: Extract all of the hidden states and return them directly.
+```{note}
+If the model's implementation in vLLM defines its own pooler, the default pooler is set to that instead of the one specified in this table.
+```
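To make the table's defaults concrete, here is a minimal pure-Python sketch of what each row implies for a sequence of per-token hidden states. It is not vLLM's implementation: the helper names (`pool`, `TASK_DEFAULTS`, and so on) are hypothetical, normalization is assumed to mean L2 normalization, and `score` is left out because its pooler is defined by the model.

```python
import math

# Defaults copied from the table: (pooling type, normalize, softmax).
# "score" is omitted because its pooler is always defined by the model.
TASK_DEFAULTS = {
    "embed": ("LAST", True, False),
    "classify": ("LAST", False, True),
    "reward": ("ALL", False, False),
}


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (assumed meaning of "Normalization")."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def softmax(vec):
    """Numerically stable softmax over a single vector."""
    peak = max(vec)
    exps = [math.exp(x - peak) for x in vec]
    total = sum(exps)
    return [e / total for e in exps]


def pool(task, hidden_states):
    """Apply the table's default pooling for `task`.

    `hidden_states` is a list of per-token vectors (one list of floats per token).
    """
    pooling_type, normalize, use_softmax = TASK_DEFAULTS[task]
    if pooling_type == "ALL":
        # Reward modeling: return every token's hidden state unchanged.
        return [list(vec) for vec in hidden_states]
    # LAST: keep only the final token's hidden state.
    pooled = list(hidden_states[-1])
    if normalize:
        pooled = l2_normalize(pooled)
    if use_softmax:
        pooled = softmax(pooled)
    return pooled


# Two tokens with hidden size 2.
hidden = [[1.0, 2.0], [3.0, 4.0]]
embedding = pool("embed", hidden)        # unit-length version of [3.0, 4.0]
class_probs = pool("classify", hidden)   # probabilities that sum to 1
reward_states = pool("reward", hidden)   # both token vectors, unchanged
```

Keeping the defaults in a single mapping mirrors the table one row per task, which makes it easy to see that only the pooling type and the two post-processing flags differ between tasks.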
When loading [Sentence Transformers](https://huggingface.co/sentence-transformers) models,
-we attempt to override the default pooler based on its Sentence Transformers configuration file ({code}`modules.json`).
+we attempt to override the default pooler based on its Sentence Transformers configuration file (`modules.json`).
-You can customize the model's pooling method via the {code}`override_pooler_config` option,
+```{tip}
+You can customize the model's pooling method via the `--override-pooler-config` option,
which takes priority over both the model's and Sentence Transformers's defaults.
+```
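The precedence described above can be read as a chain in which each later source replaces the earlier one. The sketch below illustrates that reading only. `resolve_pooler` is a hypothetical helper, not a vLLM function, the config dictionaries and their `pooling_type` values are illustrative, and it assumes a later source replaces the whole config rather than merging individual fields.

```python
def resolve_pooler(task_default, model_default=None, st_config=None, override=None):
    """Pick the pooler config from the highest-priority source that is set.

    Priority, lowest to highest (as read from the text above):
    task default < model-defined pooler < Sentence Transformers modules.json
    < --override-pooler-config.
    """
    candidates = [
        ("task default", task_default),
        ("model-defined pooler", model_default),
        ("Sentence Transformers modules.json", st_config),
        ("--override-pooler-config", override),
    ]
    chosen = None
    for source, config in candidates:
        if config is not None:
            # A later (higher-priority) source replaces any earlier choice.
            chosen = (source, config)
    return chosen


# Illustrative configs only; the values are placeholders.
task_default = {"pooling_type": "LAST"}
st_config = {"pooling_type": "MEAN"}
user_override = {"pooling_type": "CLS"}

print(resolve_pooler(task_default))
print(resolve_pooler(task_default, st_config=st_config))
print(resolve_pooler(task_default, st_config=st_config, override=user_override))
```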
+## Offline Inference
+The {class}`~vllm.LLM` class provides various methods for offline inference.
+See [Engine Arguments](#engine-args) for a list of options when initializing the model.
### `LLM.encode`
...
...