OpenDAS / vllm_cscc
Commit 0e39a33c (unverified)
Authored Aug 17, 2024 by Gordon Wong; committed by GitHub on Aug 16, 2024.

[Bugfix][Hardware][AMD][Frontend] add quantization param to embedding checking method (#7513)

Parent: 6fc5b0f2
Showing 1 changed file with 5 additions and 2 deletions.
vllm/entrypoints/openai/api_server.py (viewed at 0e39a33c)

```diff
@@ -60,11 +60,13 @@ logger = init_logger('vllm.entrypoints.openai.api_server')
 _running_tasks: Set[asyncio.Task] = set()
 
 
-def model_is_embedding(model_name: str, trust_remote_code: bool) -> bool:
+def model_is_embedding(model_name: str, trust_remote_code: bool,
+                       quantization: str) -> bool:
     return ModelConfig(model=model_name,
                        tokenizer=model_name,
                        tokenizer_mode="auto",
                        trust_remote_code=trust_remote_code,
+                       quantization=quantization,
                        seed=0,
                        dtype="auto").embedding_mode
 
```
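Before this change, `model_is_embedding` built its probe `ModelConfig` without a `quantization` argument, so the probe was constructed with the constructor's default rather than the value the server was launched with. The sketch below illustrates that forwarding pattern in isolation and does not use vLLM: `StubModelConfig`, `SUPPORTED_QUANTIZATION`, and `EMBEDDING_MODELS` are hypothetical stand-ins, and the validation rule is an assumption chosen only to show that a config whose checks depend on `quantization` should be probed with the same value the engine will use. The sketch types the parameter as `Optional[str]`, assuming a server started without a quantization flag passes `None`; the diff itself annotates it as `str`.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical capability table; NOT vLLM's real list of supported methods.
SUPPORTED_QUANTIZATION = {"awq", "gptq", "fp8"}

# Hypothetical registry used to decide embedding mode in this sketch.
EMBEDDING_MODELS = {"intfloat/e5-mistral-7b-instruct"}


@dataclass
class StubModelConfig:
    """Hypothetical stand-in for a config object that validates its inputs."""
    model: str
    trust_remote_code: bool
    quantization: Optional[str] = None

    def __post_init__(self) -> None:
        # Validation depends on `quantization`, so a probe built without it
        # is not validated under the settings the engine will actually use.
        if (self.quantization is not None
                and self.quantization not in SUPPORTED_QUANTIZATION):
            raise ValueError(
                f"quantization {self.quantization!r} is not supported")

    @property
    def embedding_mode(self) -> bool:
        return self.model in EMBEDDING_MODELS


def model_is_embedding(model_name: str, trust_remote_code: bool,
                       quantization: Optional[str]) -> bool:
    # Forward the caller's quantization setting instead of relying on the
    # config's default, mirroring the change in the diff.
    return StubModelConfig(model=model_name,
                           trust_remote_code=trust_remote_code,
                           quantization=quantization).embedding_mode
```

With this sketch, `model_is_embedding("intfloat/e5-mistral-7b-instruct", False, "fp8")` returns `True`, while an unsupported value such as `"not-a-method"` raises `ValueError` already at probe time, the same way the stub would reject it when the engine's own config is built.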
```diff
@@ -97,7 +99,8 @@ async def build_async_engine_client(args) -> AsyncIterator[AsyncEngineClient]:
 
     # If manually triggered or embedding model, use AsyncLLMEngine in process.
     # TODO: support embedding model via RPC.
-    if (model_is_embedding(args.model, args.trust_remote_code)
+    if (model_is_embedding(args.model, args.trust_remote_code,
+                           args.quantization)
             or args.disable_frontend_multiprocessing):
         async_engine_client = AsyncLLMEngine.from_engine_args(
             engine_args, usage_context=UsageContext.OPENAI_API_SERVER)
```
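At the call site, the only change is that `args.quantization` is now passed alongside `args.model` and `args.trust_remote_code`. A minimal sketch of that decision, assuming a stand-in `argparse` parser that defines just the four options the condition reads (`--model`, `--trust-remote-code`, `--quantization`, `--disable-frontend-multiprocessing`) and a hypothetical placeholder probe that records its arguments instead of building a real model config:

```python
import argparse
from typing import List, Optional, Tuple

# Records every probe call so the forwarded arguments can be inspected.
probe_calls: List[Tuple[str, bool, Optional[str]]] = []


def model_is_embedding(model_name: str, trust_remote_code: bool,
                       quantization: Optional[str]) -> bool:
    # Hypothetical placeholder probe: a real server would build a model
    # config here. The "-embed" suffix rule is invented for this sketch.
    probe_calls.append((model_name, trust_remote_code, quantization))
    return model_name.endswith("-embed")


def use_in_process_engine(args: argparse.Namespace) -> bool:
    # Same shape as the condition in the diff: an embedding model or an
    # explicit opt-out keeps the engine in the API server process.
    return (model_is_embedding(args.model, args.trust_remote_code,
                               args.quantization)
            or args.disable_frontend_multiprocessing)


# Minimal stand-in parser with only the four options the condition reads.
parser = argparse.ArgumentParser()
parser.add_argument("--model", required=True)
parser.add_argument("--trust-remote-code", action="store_true")
parser.add_argument("--quantization", default=None)
parser.add_argument("--disable-frontend-multiprocessing", action="store_true")

args = parser.parse_args(["--model", "example-org/text-embed",
                          "--quantization", "fp8"])
print(use_in_process_engine(args))  # True
print(probe_calls[-1])  # ('example-org/text-embed', False, 'fp8')
```

Because `or` evaluates its left operand first, the probe runs even when `--disable-frontend-multiprocessing` is set; only in that case its result no longer changes the outcome.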