Commits · bb78fb318e69f2e5e42ad2f6cf7dd050330c8643 · OpenDAS / vllm_cscc

22 Feb, 2025 1 commit
- [v1] Support allowed_token_ids in v1 Sampler (#13210) · bb78fb31
  Lu Fang authored Feb 21, 2025
```
Signed-off-by: Lu Fang <lufang@fb.com>
```
19 Feb, 2025 1 commit
- [V1][Core] Generic mechanism for handling engine utility (#13060) · caf7ff44
  Nick Hill authored Feb 19, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
18 Feb, 2025 2 commits
- [V1] Optimize handling of sampling metadata and req_ids list (#13244) · 30172b49
  Nick Hill authored Feb 18, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
- [V1][Tests] Adding additional testing for multimodal models to V1 (#13308) · a4d577b3
  Murali Andoorveedu authored Feb 18, 2025
```
Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>
```
17 Feb, 2025 2 commits
- [V1][Spec decode] Move drafter to model runner (#13363) · cd4a72a2
  Woosuk Kwon authored Feb 17, 2025
```
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
```
- [V1] Get input tokens from scheduler (#13339) · 4c21ce9e
  Woosuk Kwon authored Feb 17, 2025
```
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
```
16 Feb, 2025 1 commit
- [V1][Spec Decode] Ngram Spec Decode (#12193) · 80f63a39
  Lily Liu authored Feb 15, 2025
```
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
```
15 Feb, 2025 2 commits
- [V1][PP] Run engine busy loop with batch queue (#13064) · 9206b3d7
  Cody Yu authored Feb 15, 2025
- [V1][CI] Fix failed v1-test because of min_p (#13316) · e7eea5a5
  Woosuk Kwon authored Feb 14, 2025
```
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
```
14 Feb, 2025 4 commits
- [V1][Core] min_p sampling support (#13191) · a12934d3
  Aoyu authored Feb 15, 2025
```
Signed-off-by: Aoyu <aoyuzhan@amazon.com>
Co-authored-by: Aoyu <aoyuzhan@amazon.com>
```
- Support logit_bias in v1 Sampler (#13079) · 6224a9f6
  Lu Fang authored Feb 14, 2025
- [Bugfix][V1] GPUModelRunner._update_states should return True when there is a finished request in batch (#13126) · b0ccfc56
  Kero Liang authored Feb 14, 2025
- Consolidate Llama model usage in tests (#13094) · f2b20fe4
  Harry Mellor authored Feb 14, 2025
11 Feb, 2025 2 commits
- [V1][Metrics] Add several request timing histograms (#12644) · 75e6e145
  Mark McLoughlin authored Feb 11, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
- [V1][Metrics] Add GPU prefix cache hit rate % gauge (#12592) · 41c5dd45
  Cody Yu authored Feb 11, 2025
08 Feb, 2025 1 commit
- [V1] Move KV block hashes from Request to KVCacheManager (#12922) · 32431583
  Woosuk Kwon authored Feb 07, 2025
```
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
```
07 Feb, 2025 1 commit
- [V1] Logprobs and prompt logprobs support (#9880) · 0630d453
  afeldman-nm authored Feb 07, 2025

This PR adds support for sample logprobs and prompt logprobs to vLLM V1.

New behavior:

- During model execution, the model runner computes sample logprobs (if the user-provided `logprobs` setting is not None) and prompt logprobs (if the user-provided `prompt_logprobs` setting is not None). For both sample and prompt logprobs, the engine core returns three vectors: token IDs, token logprob values, and token ranks. A token's rank is its 1-indexed position in the vocabulary after sorting the vocabulary by log probability in descending order.
- In `scheduler.update_from_output()`, sample and prompt logprobs are incorporated into the `EngineCoreOutput` data structure, which is transferred to the engine client. If multiprocessing is enabled, sample and prompt logprobs are (de)serialized along with `EngineCoreOutput`.
- During output processing, the `LogprobsProcessor` transforms the triplet of token IDs, logprob values, and ranks into the OpenAI-compatible `List[Dict[token_id, Logprob]]` format, separately for sample and prompt logprobs.
- Each `Logprob` instance (sample or prompt) holds a token's log probability, rank, and detokenized string. Logprob detokenization is handled by the `LogprobsProcessor`, not the detokenizer.
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
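
The rank definition and the engine-to-client conversion described above can be illustrated with a small pure-Python sketch. It is not vLLM's implementation: `Logprob`, `logprobs_triplet`, and `to_openai_position` are hypothetical stand-ins, tied logprobs are assumed to share the better rank, and the sampled token is assumed to be listed before the top-`num_logprobs` tokens.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Logprob:
    # Hypothetical mirror of the three fields described above.
    logprob: float
    rank: int
    decoded_token: Optional[str] = None


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_rank(logprobs: Sequence[float], token_id: int) -> int:
    """1-indexed position after a descending sort (assumption: ties share the better rank)."""
    target = logprobs[token_id]
    return 1 + sum(1 for lp in logprobs if lp > target)


def logprobs_triplet(logits: Sequence[float], sampled_id: int,
                     num_logprobs: int) -> Tuple[List[int], List[float], List[int]]:
    """Engine side: token IDs, logprob values, and ranks (sampled token first, then top-k)."""
    lps = log_softmax(logits)
    top = sorted(range(len(lps)), key=lambda i: lps[i], reverse=True)[:num_logprobs]
    ids = [sampled_id] + top
    return ids, [lps[i] for i in ids], [token_rank(lps, i) for i in ids]


def to_openai_position(ids: List[int], lps: List[float], ranks: List[int],
                       vocab: List[str]) -> Dict[int, Logprob]:
    """Client side: one dict per position, detokenizing each token via a toy vocab."""
    return {i: Logprob(lp, r, vocab[i]) for i, lp, r in zip(ids, lps, ranks)}
```

Because each position maps token ID to `Logprob`, a sampled token that also appears among the top-k collapses into a single dict entry.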

06 Feb, 2025 1 commit
- [V1] LoRA Support (#10957) · 467a96a5
  Varun Sundar Rabindranath authored Feb 06, 2025
```
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
```
04 Feb, 2025 1 commit
- [V1] Remove scheduling constraint on partial requests (#12674) · 18a88fcc
  Woosuk Kwon authored Feb 04, 2025
```
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
```
03 Feb, 2025 1 commit
- [V1] Revert `uncache_blocks` and support recaching full blocks (#12415) · 5095e966
  Cody Yu authored Feb 03, 2025
```
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
```
02 Feb, 2025 2 commits
- [Misc] Add SPDX-License-Identifier headers to python source files (#12628) · e489ad7a
  Russell Bryant authored Feb 02, 2025

- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**

commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:18:24 2025 -0500

    Add SPDX license headers to python source files

    This commit adds SPDX license headers to python source files as
    recommended to the project by the Linux Foundation. These headers
    provide a concise way that is both human and machine readable for
    communicating license information for each source file. It helps
    avoid any ambiguity about the license of the code and can also be
    easily used by tools to help manage license compliance.

    The Linux Foundation runs license scans against the codebase to help
    ensure we are in compliance with the licenses of the code we use,
    including dependencies. Having these headers in place helps that tool
    do its job.

    More information can be found on the SPDX site:

    - https://spdx.dev/learn/handling-license-info/

    Signed-off-by: Russell Bryant <rbryant@redhat.com>

commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:36:32 2025 -0500

    Check for SPDX headers using pre-commit

    Signed-off-by: Russell Bryant <rbryant@redhat.com>

---------
Signed-off-by: Russell Bryant <rbryant@redhat.com>
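
A minimal sketch of what such a header and a pre-commit style check could look like. It assumes the `Apache-2.0` identifier (vLLM's license) and a check that only inspects the first few lines of each file; `has_spdx_header`, `add_spdx_header`, and `main` are hypothetical names, not the hook added in this PR.

```python
from pathlib import Path
from typing import List

# Assumed header; the identifier must match the project's actual license.
SPDX_HEADER = "# SPDX-License-Identifier: Apache-2.0"
SPDX_PREFIX = "# SPDX-License-Identifier:"


def has_spdx_header(text: str, max_lines: int = 3) -> bool:
    """Return True if one of the first `max_lines` lines carries an SPDX tag."""
    return any(line.startswith(SPDX_PREFIX)
               for line in text.splitlines()[:max_lines])


def add_spdx_header(text: str) -> str:
    """Insert the header, keeping a leading shebang line on top."""
    if has_spdx_header(text):
        return text
    lines = text.splitlines(keepends=True)
    if lines and lines[0].startswith("#!"):
        first = lines[0] if lines[0].endswith("\n") else lines[0] + "\n"
        return first + SPDX_HEADER + "\n" + "".join(lines[1:])
    return SPDX_HEADER + "\n" + text


def main(paths: List[str]) -> int:
    """Hook-style entry point: report files missing the header, return non-zero if any."""
    missing = [p for p in paths
               if not has_spdx_header(Path(p).read_text(encoding="utf-8"))]
    for p in missing:
        print(f"Missing SPDX header: {p}")
    return 1 if missing else 0
```

By default pre-commit passes the staged filenames as arguments, so a hook script calling `sys.exit(main(sys.argv[1:]))` would block the commit whenever a file lacks the header.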

- [Core][v1] Unify allocating slots in prefill and decode in KV cache manager (#12608) · f8ece6e1
  Shawn Du authored Feb 02, 2025

As mentioned in RFC https://github.com/vllm-project/vllm/issues/12254, this PR combines `allocate_slots` and `append_slots` into a single method.

There should be no functional change, except that decode now also raises an exception when `num_tokens` is zero (as prefill already did); the unit test case is changed accordingly.

@comaniac @rickyyx @WoosukKwon @youkaichao @heheda12345 @simon-mo

---------
Signed-off-by: Shawn Du <shawnd200@outlook.com>
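
The unified allocation path can be sketched with a toy paged KV-cache manager. The class and its fields are hypothetical and omit prefix caching, eviction, and preemption; only the single entry point for prefill and decode and the zero-token check mirror the behavior described above.

```python
import math
from typing import Dict, List, Optional


class ToyKVCacheManager:
    """Hypothetical toy manager with one allocate_slots for prefill and decode."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_block_ids: List[int] = list(range(num_blocks))
        self.req_to_blocks: Dict[str, List[int]] = {}

    def allocate_slots(self, request_id: str, num_computed_tokens: int,
                       num_tokens: int) -> Optional[List[int]]:
        """Reserve room for `num_tokens` new tokens.

        Returns the newly allocated block IDs (empty if existing blocks have
        room), or None if there are not enough free blocks.
        """
        if num_tokens == 0:
            # After unification, decode rejects zero tokens just like prefill.
            raise ValueError("num_tokens must be greater than 0")
        blocks = self.req_to_blocks.get(request_id, [])
        total_needed = math.ceil((num_computed_tokens + num_tokens) / self.block_size)
        num_new = max(total_needed - len(blocks), 0)
        if num_new > len(self.free_block_ids):
            return None
        new_blocks = self.free_block_ids[:num_new]
        del self.free_block_ids[:num_new]
        self.req_to_blocks[request_id] = blocks + new_blocks
        return new_blocks
```

A prefill of 6 tokens with `block_size=4` takes two blocks; a following one-token decode step fits in the existing blocks and allocates nothing new.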

31 Jan, 2025 1 commit
- [v1][Bugfix] Add extra_keys to block_hash for prefix caching (#12603) · 89003c40
  Chen Zhang authored Feb 01, 2025

This PR adds extra keys to the block hash so that two blocks with the same token sequence but different extra_keys in their parent blocks get different hash values. For example, it generates different hash values for the second block of the following two requests:
```python
# `make_request` is a unit-test helper; its definition is not shown here.
# Only the first multimodal item's hash differs between the two requests.
request1 = make_request(
    request_id=0,
    prompt_token_ids=list(range(6)),
    mm_positions=[{
        "offset": 0,
        "length": 3
    }, {
        "offset": 3,
        "length": 3
    }],
    mm_hashes=["hash1", "hash2"],
)
request2 = make_request(
    request_id=1,
    prompt_token_ids=list(range(6)),
    mm_positions=[{
        "offset": 0,
        "length": 3
    }, {
        "offset": 3,
        "length": 3
    }],
    mm_hashes=["hash3", "hash2"],
)
```

---------
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
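
The effect described above follows from chaining each block's hash to its parent's. A minimal sketch, assuming a block size of 3 (so each multimodal item in the example fills exactly one block) and that a block's extra keys are the hashes of the multimodal items overlapping it; `BlockHash`, `block_extra_keys`, and `hash_request_blocks` are hypothetical names, not the PR's code.

```python
from typing import Dict, List, NamedTuple, Optional, Sequence, Tuple


class BlockHash(NamedTuple):
    hash_value: int
    token_ids: Tuple[int, ...]
    extra_keys: Optional[Tuple[str, ...]]


def block_extra_keys(start: int, end: int,
                     mm_positions: Sequence[Dict[str, int]],
                     mm_hashes: Sequence[str]) -> Optional[Tuple[str, ...]]:
    """Hashes of multimodal items whose token span overlaps [start, end)."""
    keys = tuple(h for pos, h in zip(mm_positions, mm_hashes)
                 if pos["offset"] < end and pos["offset"] + pos["length"] > start)
    return keys or None


def hash_request_blocks(token_ids: Sequence[int], block_size: int,
                        mm_positions: Sequence[Dict[str, int]],
                        mm_hashes: Sequence[str]) -> List[BlockHash]:
    """Chain-hash each full block over (parent hash, block tokens, extra keys)."""
    hashes: List[BlockHash] = []
    parent_hash: Optional[int] = None
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        end = start + block_size
        tokens = tuple(token_ids[start:end])
        extra = block_extra_keys(start, end, mm_positions, mm_hashes)
        value = hash((parent_hash, tokens, extra))
        hashes.append(BlockHash(value, tokens, extra))
        parent_hash = value
    return hashes
```

For the two requests above, the second blocks share tokens `(3, 4, 5)` and extra keys `("hash2",)` but still hash differently, because their parents carry `"hash1"` and `"hash3"`. Python's built-in `hash` is used only for illustration: `str` hashing is salted per process (see `PYTHONHASHSEED`), so a cache shared across processes would need a stable hash such as one from `hashlib`.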

27 Jan, 2025 1 commit
- [V1][CI/Test] Do basic test for top-p & top-k sampling (#12469) · 3f1fc742
  Woosuk Kwon authored Jan 27, 2025
```
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
```
24 Jan, 2025 1 commit
- [V1][Frontend] Coalesce bunched `RequestOutput`s (#12298) · 24b0205f
  Nick Hill authored Jan 23, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
```
23 Jan, 2025 1 commit
- [V1] Add `uncache_blocks` (#12333) · f0ef3723
  Cody Yu authored Jan 22, 2025
22 Jan, 2025 1 commit
- [Core] Support `reset_prefix_cache` (#12284) · 7206ce4c
  Cody Yu authored Jan 22, 2025
21 Jan, 2025 1 commit
- [v1][stats][1/n] Add RequestStatsUpdate and RequestStats types (#10907) · 132a1321
  Ricky Xu authored Jan 21, 2025
```
Signed-off-by: rickyx <rickyx@anyscale.com>
```
17 Jan, 2025 1 commit
- [V1] Move more control of kv cache initialization from model_executor to EngineCore (#11960) · 69d765f5
  Chen Zhang authored Jan 17, 2025
```
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
```
15 Jan, 2025 1 commit
- [V1][Prefix Cache] Move the logic of num_computed_tokens into KVCacheManager (#12003) · 994fc655
  Chen Zhang authored Jan 15, 2025
13 Jan, 2025 1 commit
- [V1] [2/n] Logging and Metrics - `OutputProcessor` Abstraction (#11973) · 619ae268
  Robert Shaw authored Jan 12, 2025
```
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
```
12 Jan, 2025 1 commit
- [V1][Core][1/n] Logging and Metrics (#11962) · 9597a095
  Robert Shaw authored Jan 12, 2025
```
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
```
10 Jan, 2025 1 commit
- [torch.compile] Hide KV cache behind torch.compile boundary (#11677) · cf5f000d
  Chen Zhang authored Jan 10, 2025
```
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
```
06 Jan, 2025 2 commits
- [V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (#11685) · 91b361ae
  Roger Wang authored Jan 06, 2025
```
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
- [V1] Refactor get_executor_cls (#11754) · 022c5c69
  Rui Qiao authored Jan 05, 2025
04 Jan, 2025 2 commits
- [Bugfix][V1] Fix test_kv_cache_utils.py (#11738) · 47831430
  Jee Jee Li authored Jan 05, 2025
```
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
```
- [V1] Add kv cache utils tests. (#11513) · d91457d5
  xcnick authored Jan 04, 2025
```
Signed-off-by: xcnick <xcnick0412@gmail.com>
```
03 Jan, 2025 1 commit
- [V1] Simplify Shutdown (#11659) · 80c751e7
  Robert Shaw authored Jan 03, 2025
01 Jan, 2025 1 commit
- [V1] Implement Cascade Attention (#11635) · 73001445
  Woosuk Kwon authored Jan 01, 2025
```
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
```
31 Dec, 2024 1 commit
- [V1] Simplify vision block hash for prefix caching by removing offset from hash (#11646) · 8c3230d8
  Chen Zhang authored Dec 31, 2024