norm/vllm

Commit 30fb0956 (unverified)
[Minor] Add more detailed explanation on `quantization` argument (#2145)
Authored Dec 17, 2023 by Woosuk Kwon, committed by GitHub on Dec 17, 2023
Parent: 3a765bd5
Showing 2 changed files with 10 additions and 4 deletions.
vllm/engine/arg_utils.py   +6 -1
vllm/entrypoints/llm.py    +4 -3
vllm/engine/arg_utils.py

@@ -183,7 +183,12 @@ class EngineArgs:
                             type=str,
                             choices=['awq', 'gptq', 'squeezellm', None],
                             default=None,
-                            help='Method used to quantize the weights')
+                            help='Method used to quantize the weights. If '
+                            'None, we first check the `quantization_config` '
+                            'attribute in the model config file. If that is '
+                            'None, we assume the model weights are not '
+                            'quantized and use `dtype` to determine the data '
+                            'type of the weights.')
         parser.add_argument('--enforce-eager',
                             action='store_true',
                             help='Always use eager-mode PyTorch. If False, '
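The new help text describes a three-step fallback: an explicit --quantization value wins, then the `quantization_config` attribute of the model config file is consulted, and only if both are None are the weights treated as unquantized, with `dtype` determining their data type. A minimal sketch of that resolution order, assuming a Hugging Face-style config object whose optional `quantization_config` dict carries a `quant_method` key (the helper name and that key are illustrative assumptions, not vLLM's actual code):

from typing import Optional

def resolve_quantization(cli_arg: Optional[str], hf_config) -> Optional[str]:
    # Hypothetical helper mirroring the fallback order in the help text.
    if cli_arg is not None:
        # 1. An explicit --quantization / quantization= value wins.
        return cli_arg
    quant_cfg = getattr(hf_config, "quantization_config", None)
    if quant_cfg is not None:
        # 2. Otherwise use the method named in the model config file
        #    (the `quant_method` key is an assumption here).
        return quant_cfg.get("quant_method")
    # 3. Weights are treated as unquantized; `dtype` determines their type.
    return None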
vllm/entrypoints/llm.py

@@ -38,9 +38,10 @@ class LLM:
             However, if the `torch_dtype` in the config is `float32`, we will
             use `float16` instead.
         quantization: The method used to quantize the model weights. Currently,
-            we support "awq", "gptq" and "squeezellm". If None, we assume the
-            model weights are not quantized and use `dtype` to determine the
-            data type of the weights.
+            we support "awq", "gptq" and "squeezellm". If None, we first check
+            the `quantization_config` attribute in the model config file. If
+            that is None, we assume the model weights are not quantized and use
+            `dtype` to determine the data type of the weights.
         revision: The specific model version to use. It can be a branch name,
             a tag name, or a commit id.
         tokenizer_revision: The specific tokenizer version to use. It can be a
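Since the same wording now appears in the `LLM` docstring, a typical call through that entrypoint looks like the sketch below; the model names are placeholders, and running it requires a working vLLM install with suitable hardware:

from vllm import LLM

# Explicitly request AWQ-quantized weights (model name is a placeholder).
llm = LLM(model="some-org/some-model-awq", quantization="awq")

# With the default quantization=None, the engine first consults the model
# config's `quantization_config`; if that is also None, the weights are
# treated as unquantized and `dtype` determines their data type.
llm = LLM(model="some-org/some-model")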