Commit f8ef146f, authored Jan 16, 2025 by Cyrus Leung. Committed by GitHub, Jan 16, 2025.
[Doc] Add documentation for specifying model architecture (#12105)
parent fa0050db
Showing 1 changed file with 53 additions and 0 deletions
docs/source/serving/offline_inference.md
@@ -31,6 +31,59 @@ Please refer to the above pages for more details about each API.
This section lists the most common options for running the vLLM engine.
For a full list, refer to the [Engine Arguments](#engine-args) page.
### Model resolution
vLLM loads HuggingFace-compatible models by inspecting the `architectures` field in `config.json` of the model repository
and finding the corresponding implementation that is registered to vLLM.
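As a rough illustration of this lookup (not vLLM's actual code), the sketch below reads `architectures` from a parsed `config.json` and returns the first name that a registry knows about. `TOY_REGISTRY` and `resolve_architecture` are hypothetical names invented for this example, and the first error message is illustrative only.

```python
from typing import Optional

# Hypothetical stand-in for a model registry: architecture name -> implementation.
# The real vLLM registry is more involved; this only mirrors the idea.
TOY_REGISTRY = {
    "GPT2LMHeadModel": "toy GPT-2 implementation",
    "LlamaForCausalLM": "toy Llama implementation",
}


def resolve_architecture(config: dict, registry: dict = TOY_REGISTRY) -> str:
    """Return the first architecture in the config that the registry knows."""
    architectures: Optional[list] = config.get("architectures")
    if architectures is None:
        # Corresponds to the case where `config.json` lacks the field.
        raise ValueError("config.json has no `architectures` field")
    for arch in architectures:
        if arch in registry:
            return arch
    # Corresponds to the case where no listed name is registered.
    raise ValueError(
        f"Model architectures {architectures} are not supported for now."
    )
```

Under these assumptions, a config that lists `GPT2LMHeadModel` resolves to that name, while an empty config or an unknown name raises `ValueError`.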
Nevertheless, our model resolution may fail for the following reasons:

- The `config.json` of the model repository lacks the `architectures` field.
- Unofficial repositories refer to a model using alternative names which are not recorded in vLLM.
- The same architecture name is used for multiple models, creating ambiguity as to which model should be loaded.

In those cases, vLLM may throw an error like:
```text
Traceback (most recent call last):
...
  File "vllm/model_executor/models/registry.py", line xxx, in inspect_model_cls
    for arch in architectures:
TypeError: 'NoneType' object is not iterable
```
or:
```text
  File "vllm/model_executor/models/registry.py", line xxx, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures ['<arch>'] are not supported for now. Supported architectures: [...]
```
:::{note}
The above error is distinct from the following similar-looking error:
```text
  File "vllm/model_executor/models/registry.py", line xxx, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures ['<arch>'] failed to be inspected. Please check the logs for more details.
```
This error means that vLLM failed to import the model file. Usually, it is related to missing dependencies or outdated
binaries in the vLLM build. Please read the logs carefully to determine the real cause of the error.
:::
To fix a model resolution failure, explicitly specify the model architecture by passing `config.json` overrides to the `hf_overrides` option.
For example:
```python
from vllm import LLM

# Override the `architectures` field that vLLM would otherwise read from config.json.
model = LLM(
    model="cerebras/Cerebras-GPT-1.3B",
    hf_overrides={"architectures": ["GPT2LMHeadModel"]},  # GPT-2
)
```
Our [list of supported models](#supported-models) shows the model architectures that are recognized by vLLM.
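Before reaching for `hf_overrides`, it can help to check whether a locally downloaded `config.json` actually lacks the `architectures` field. The helper below is a minimal sketch using only the standard library; `architecture_override` and its parameters are hypothetical names for this example, and the fallback architecture is something you would pick yourself from the supported models list.

```python
import json
from pathlib import Path


def architecture_override(config_path: str, fallback_arch: str) -> dict:
    """Build an `hf_overrides`-style dict only when `architectures` is missing.

    config_path: path to a local config.json (assumed to be already downloaded).
    fallback_arch: architecture name to use if the field is absent or empty.
    """
    config = json.loads(Path(config_path).read_text())
    if config.get("architectures"):
        # The field is present, so no override is needed.
        return {}
    return {"architectures": [fallback_arch]}
```

The returned dictionary has the same shape as the `hf_overrides` value in the example above, so it could be passed to `LLM(...)` directly. This sketch does not cover the other failure cases, such as an unrecognized or ambiguous architecture name.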
### Reducing memory usage
Large models might cause your machine to run out of memory (OOM). Here are some options that help alleviate this problem.
...
...