- 16 May, 2024 9 commits
-
-
Simon Mo authored
-
Alexander Matveev authored
Co-authored-by:Robert Shaw <rshaw@neuralmagic.com>
-
Pierre Dulac authored
-
Alex Wu authored
-
Alex Wu authored
-
Jinzhen Lin authored
-
alexm-nm authored
-
Cody Yu authored
Co-authored-by:Cade Daniel <edacih@gmail.com>
Co-authored-by:Cade Daniel <cade@anyscale.com>
-
Aurick Qiao authored
Co-authored-by:Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
- 15 May, 2024 7 commits
-
-
Alex Wu authored
Co-authored-by:Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
-
Cyrus Leung authored
-
Zhuohan Li authored
-
zifeitong authored
-
Cyrus Leung authored
-
SangBin Cho authored
[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681)
This PR combines prepare_prompt and prepare_decode into a single API. It also coalesces the attn metadata for prefill/decode into a single class and allows that class to be sliced when running the attn backend. Finally, it refactors subquery_start_loc, which was not refactored in the previous PR.
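A minimal sketch of the coalesced-metadata idea, assuming the batch lays out all prefill sequences before all decode sequences so that each phase is a contiguous slice of one flattened batch. `MixedBatchMetadata`, `prefill_slice`, `decode_slice` and `query_start_loc` are hypothetical names, not vLLM's API; `query_start_loc` assumes subquery_start_loc holds cumulative start offsets of each sequence's new tokens.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List, Optional


@dataclass
class MixedBatchMetadata:
    """Hypothetical single metadata object for a mixed prefill/decode batch.

    Assumes prefill sequences come first, then decode sequences.
    """

    num_prefills: int         # number of sequences in the prefill phase
    query_lens: List[int]     # new tokens per sequence in this step (1 for decode)
    slot_mapping: List[int]   # KV-cache slot per token, prefill tokens first

    @property
    def num_prefill_tokens(self) -> int:
        return sum(self.query_lens[: self.num_prefills])

    def query_start_loc(self) -> List[int]:
        # Cumulative start offset of each sequence in the flattened token list.
        return [0] + list(accumulate(self.query_lens))

    def prefill_slice(self) -> Optional["MixedBatchMetadata"]:
        # View of the prefill part only, or None if the batch has no prefills.
        if self.num_prefills == 0:
            return None
        n = self.num_prefill_tokens
        return MixedBatchMetadata(
            num_prefills=self.num_prefills,
            query_lens=self.query_lens[: self.num_prefills],
            slot_mapping=self.slot_mapping[:n],
        )

    def decode_slice(self) -> Optional["MixedBatchMetadata"]:
        # View of the decode part only, or None if every sequence is a prefill.
        if self.num_prefills == len(self.query_lens):
            return None
        n = self.num_prefill_tokens
        return MixedBatchMetadata(
            num_prefills=0,
            query_lens=self.query_lens[self.num_prefills :],
            slot_mapping=self.slot_mapping[n:],
        )
```

For example, two prefills of 3 and 2 tokens followed by two decodes give `query_lens=[3, 2, 1, 1]`; the prefill slice covers token slots 0-4 and the decode slice covers slots 5-6, so one object can be prepared once and handed to each attention kernel in pieces.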
-
SangBin Cho authored
LoRA 3 & 4 tests appear to hit an illegal memory access failure after this commit: [2024-05-14 23:51:18,182 E 22 22] logging.cc:101: Unhandled exception: N3c105ErrorE. what(): CUDA error: an illegal memory access was encountered Example: https://buildkite.com/vllm/ci/builds/7382#018f793d-1527-4e1c-ab59-c3a34ec55241 This reverts commit 1356df53.
-
- 14 May, 2024 6 commits
-
-
Simon Mo authored
-
Nick Hill authored
Co-authored-by:SAHIL SUNEJA <suneja@us.ibm.com>
-
Cyrus Leung authored
This PR fixes the CI failure introduced by #4798. The failure originates from having duplicate target names in reST, and is fixed by changing the ref targets to anonymous ones. For more information, see this discussion. I have also changed the format of the links to be more distinct from each other.
-
Kuntai Du authored
[Core][Hash][Automatic Prefix caching] Accelerating the hashing function by avoiding deep copies (#4696)
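The commit title says only that hashing was sped up by avoiding deep copies. As a hypothetical sketch (not the code in #4696), a prefix-caching block hash can be computed from an immutable tuple of the block's own token ids chained with the previous block's hash, so no deep copy of the sequence is needed. `block_hash` and `prefix_block_hashes` are illustrative names.

```python
from typing import List, Optional


def block_hash(prev_hash: Optional[int], token_ids: List[int], start: int, block_size: int) -> int:
    # Slicing copies only this block's token ids; ints are immutable, so a
    # shallow copy is enough and nothing in the sequence is deep-copied.
    block = tuple(token_ids[start : start + block_size])
    # Chaining in the previous block's hash makes the result depend on the
    # whole prefix without re-hashing (or copying) the prefix itself.
    return hash((prev_hash, block))


def prefix_block_hashes(token_ids: List[int], block_size: int) -> List[int]:
    # Hash every full block; a trailing partial block is not hashed.
    hashes: List[int] = []
    prev: Optional[int] = None
    for i in range(len(token_ids) // block_size):
        prev = block_hash(prev, token_ids, i * block_size, block_size)
        hashes.append(prev)
    return hashes
```

Two sequences that share their first block get the same first hash, which is what lets a prefix cache reuse that block.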
-
Zhuohan Li authored
-
Cyrus Leung authored
-
- 13 May, 2024 11 commits
-
-
Zhuohan Li authored
-
Philipp Moritz authored
-
Stephen Krider authored
Co-authored-by:Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by:LiuXiaoxuanPKU <lilyliupku@gmail.com>
-
Cody Yu authored
-
Sanger Steel authored
[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update `tensorizer` to version 2.9.0 (#4208)
-
Woosuk Kwon authored
-
SangBin Cho authored
Co-authored-by:Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
-
Cyrus Leung authored
Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as the one in the test for the OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it has only been repeated twice so far, I will add another similar test suite in #4200, which would duplicate the code a third time.) I have also moved the test utilities file (test_utils.py) into the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file so that tests/utils.py can be imported relatively.
-
youkaichao authored
-
Swapnil Parekh authored
-
Robert Shaw authored
-
- 12 May, 2024 1 commit
-
-
Yikang Shen authored
-
- 11 May, 2024 1 commit
-
-
Chang Su authored
-
- 10 May, 2024 5 commits
-
-
youkaichao authored
-
Robert Shaw authored
-
heeju-kim2 authored
Co-authored-by:Cade Daniel <edacih@gmail.com>
-
Allen.Dou authored
-
SangBin Cho authored
Storing an exception frame is extremely prone to circular references because the frame holds references to objects. When tensorizer is not installed, the stored error frame references various modules, which creates a circular reference that leaks the LLM instance. I also found that spec decoding has a circular reference issue, which I solved using weakref.proxy.
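Both problems can be reproduced in plain Python; the classes below are hypothetical stand-ins, not vLLM code. Keeping a caught exception keeps its traceback, and the traceback keeps every frame on the raising path along with its locals; keeping only the message avoids that. A `weakref.proxy` back-reference avoids a parent/child cycle in the way the commit describes for spec decoding.

```python
import gc
import weakref


class Engine:
    """Hypothetical stand-in for a heavyweight object such as an LLM instance."""


def _load(engine: Engine) -> None:
    # `engine` is a local of this frame at the moment the error is raised.
    raise ImportError("optional dependency is not installed")


class Loader:
    def __init__(self, keep_exception: bool) -> None:
        engine = Engine()
        self.engine_ref = weakref.ref(engine)
        try:
            _load(engine)
        except ImportError as e:
            # Keeping `e` keeps e.__traceback__ alive. The traceback references
            # this frame (whose locals include `self`) and the `_load` frame
            # (whose locals include `engine`), forming a reference cycle.
            # Keeping only the message string avoids the cycle.
            self.error = e if keep_exception else str(e)


class Worker:
    def __init__(self, owner: "Scheduler") -> None:
        # A weak proxy back to the owner leaves Scheduler -> Worker as the
        # only strong edge, so no cycle is formed.
        self.owner = weakref.proxy(owner)


class Scheduler:
    def __init__(self) -> None:
        self.worker = Worker(self)
```

With automatic collection turned off via `gc.disable()`, deleting a `Loader(keep_exception=True)` leaves its `Engine` alive until `gc.collect()` runs, whereas with `keep_exception=False` the `Engine` is freed as soon as `__init__` returns, and a `Scheduler` is freed by reference counting alone.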
-