OpenDAS / vllm_cscc
Commit 40468b13
Authored Jul 24, 2024 by Allen.Dou; committed by GitHub, Jul 24, 2024
[Bugfix] Miscalculated latency lead to time_to_first_token_seconds inaccurate. (#6686)
Parent: 2cf0df33
Showing 2 changed files with 3 additions and 2 deletions (+3 -2)
vllm/engine/llm_engine.py (+2 -1)
vllm/spec_decode/spec_decode_worker.py (+1 -1)
vllm/engine/llm_engine.py
@@ -949,8 +949,9 @@ class LLMEngine:
             model_output: Optional[List[SamplerOutput]] = None) -> None:
         """Forced log when no requests active."""
         if self.log_stats:
+            stats = self._get_stats(scheduler_outputs, model_output)
             for logger in self.stat_loggers.values():
-                logger.log(self._get_stats(scheduler_outputs, model_output))
+                logger.log(stats)
 
     def _get_stats(
             self,
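Why hoisting the call matters can be shown with a toy model. The sketch below is not vLLM code: `ToyTracker`, `RecordingLogger`, `log_per_logger`, and `log_once` are hypothetical names, and it assumes that computing stats is stateful (each call reports the time since the previous call and then resets the baseline). Whether `_get_stats` behaves exactly this way is an assumption inferred from the commit title, not something the diff itself shows.

```python
from typing import Dict, List


class ToyTracker:
    """Hypothetical stand-in for stateful latency bookkeeping."""

    def __init__(self, start: float) -> None:
        self.last_time = start

    def get_stats(self, now: float) -> float:
        # Assumed behaviour: report time since the previous call,
        # then move the baseline forward to `now`.
        latency = now - self.last_time
        self.last_time = now
        return latency


class RecordingLogger:
    """Hypothetical logger that just remembers what it was given."""

    def __init__(self) -> None:
        self.seen: List[float] = []

    def log(self, latency: float) -> None:
        self.seen.append(latency)


def log_per_logger(tracker: ToyTracker,
                   loggers: Dict[str, RecordingLogger],
                   now: float) -> None:
    # Pre-fix shape: stats are recomputed for every logger, so only
    # the first logger sees the real latency.
    for logger in loggers.values():
        logger.log(tracker.get_stats(now))


def log_once(tracker: ToyTracker,
             loggers: Dict[str, RecordingLogger],
             now: float) -> None:
    # Post-fix shape: compute once, hand the same value to every logger.
    stats = tracker.get_stats(now)
    for logger in loggers.values():
        logger.log(stats)
```

With two loggers registered, the per-logger variant gives the second logger a latency measured against the baseline the first call just reset, while the compute-once variant gives both loggers the same value.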
vllm/spec_decode/spec_decode_worker.py
@@ -484,7 +484,7 @@ class SpecDecodeWorker(LoraNotSupportedWorkerBase):
         for both speculation cases (num_lookahead_slots>0) and non-speculation
         cases (e.g. prefill).
 
-        Returns True iff there are remaining sequences to process.
+        Returns True if there are remaining sequences to process.
         """
         assert self.rank != self._driver_rank
 