- 21 Jun, 2024 1 commit
-
-
rohithkrn authored
-
- 20 Jun, 2024 1 commit
-
-
Michael Goin authored
-
- 19 Jun, 2024 2 commits
-
-
zifeitong authored
-
Michael Goin authored
-
- 18 Jun, 2024 1 commit
-
-
Ronen Schaffer authored
This PR adds basic support for OpenTelemetry distributed tracing. It includes changes to enable tracing functionality and improve monitoring capabilities. I've also added a markdown with print-screens to guide users how to use this feature. You can find it here
-
- 17 Jun, 2024 1 commit
-
-
Kunshang Ji authored
Co-authored-by:
Jiang Li <jiang1.li@intel.com> Co-authored-by:
Abhilash Majumder <abhilash.majumder@intel.com> Co-authored-by:
Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
-
- 15 Jun, 2024 1 commit
-
-
Cyrus Leung authored
-
- 14 Jun, 2024 2 commits
-
-
Sanger Steel authored
-
Cyrus Leung authored
-
- 13 Jun, 2024 1 commit
-
-
Cyrus Leung authored
-
- 12 Jun, 2024 1 commit
-
-
Woosuk Kwon authored
-
- 11 Jun, 2024 4 commits
-
-
sasha0552 authored
-
Ali Panahi authored
-
maor-ps authored
Co-authored-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Nick Hill authored
-
- 05 Jun, 2024 3 commits
-
-
Alex Wu authored
-
Michael Goin authored
-
zifeitong authored
-
- 03 Jun, 2024 2 commits
-
-
Kaiyang Chen authored
-
Cyrus Leung authored
-
- 01 Jun, 2024 1 commit
-
-
chenqianfzh authored
-
- 30 May, 2024 1 commit
-
-
Cyrus Leung authored
-
- 29 May, 2024 1 commit
-
-
Junichi Sato authored
-
- 28 May, 2024 2 commits
-
-
Cyrus Leung authored
Co-authored-by:Roger Wang <ywang@roblox.com>
-
Michał Moskal authored
Co-authored-by:Ruth Evans <ruthevans@Ruths-MacBook-Pro.local>
-
- 27 May, 2024 1 commit
-
-
Zhuohan Li authored
Co-authored-by:
rsnm2 <rshaw@neuralmagic.com> Co-authored-by:
Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
-
- 22 May, 2024 3 commits
- 21 May, 2024 1 commit
-
-
Kante Yin authored
Signed-off-by:kerthcet <kerthcet@gmail.com>
-
- 18 May, 2024 1 commit
-
-
SangBin Cho authored
Currently we need to call rotary embedding kernel for each LoRA, which makes it hard to serve multiple long context length LoRA. Add batched rotary embedding kernel and pipe it through. It replaces the rotary embedding layer to the one that is aware of multiple cos-sin-cache per scaling factors. Follow up of https://github.com/vllm-project/vllm/pull/3095/files
-
- 15 May, 2024 2 commits
-
-
zifeitong authored
-
SangBin Cho authored
[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681) This PR combines prepare_prompt and prepare_decode into a single API. This PR also coelsce the attn metadata for prefill/decode to a single class and allow to slice them when running attn backend. It also refactors subquery_start_loc which was not refactored in the previous PR
-
- 14 May, 2024 1 commit
-
-
Nick Hill authored
Co-authored-by:SAHIL SUNEJA <suneja@us.ibm.com>
-
- 13 May, 2024 2 commits
-
-
Sanger Steel authored
[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update `tensorizer` to version 2.9.0 (#4208)
-
SangBin Cho authored
Co-authored-by:Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
-
- 11 May, 2024 1 commit
-
-
Chang Su authored
-
- 09 May, 2024 1 commit
-
-
Cyrus Leung authored
-
- 08 May, 2024 1 commit
-
-
Cody Yu authored
Co-authored-by:Cade Daniel <edacih@gmail.com>
-
- 06 May, 2024 1 commit
-
-
Cyrus Leung authored
-