- 13 Jun, 2024 1 commit
youkaichao authored
[Core][Distributed] add coordinator to reduce code duplication in tp and pp (#5293)
- 12 Jun, 2024 5 commits
Travis Johnson authored
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Sanger Steel <sangersteel@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Cody Yu authored
Inspired by #5146, this PR improves the FP8 quantization kernel by vectorizing data transfers to make better use of memory bandwidth. Microbenchmarks show a 1.0x-1.5x speedup, with the larger gains when the hidden size is large. In detail, three optimizations were applied: (1) use an inverted scale so that most divisions become multiplications; (2) unroll the loop 4 times to improve instruction-level parallelism (ILP); (3) use 4-element vectorized accesses to transfer data between HBM and SRAM.
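The arithmetic behind the first two optimizations can be illustrated with a small pure-Python reference sketch. It is not the kernel from the PR: it assumes the FP8 E4M3 format (largest finite magnitude 448), computes the inverse scale with a single division, then multiplies and clamps four elements per iteration with a scalar tail. It omits the final rounding to an 8-bit float and the vectorized HBM/SRAM accesses, which Python cannot express. The function name `scale_and_clamp` is hypothetical.

```python
# Hypothetical reference sketch, not the vLLM CUDA kernel.
# Assumes the FP8 E4M3 target format; the final cast to 8-bit float is omitted.
FP8_E4M3_MAX = 448.0  # largest finite magnitude in E4M3 (assumed target format)


def scale_and_clamp(values, scale):
    """Scale by a precomputed inverse and clamp to the E4M3 range."""
    inv_scale = 1.0 / scale  # the only division; every element below uses a multiply
    out = [0.0] * len(values)

    def q(x):
        # Multiply instead of divide, then saturate to [-448, 448].
        y = x * inv_scale
        return max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, y))

    n = len(values)
    main = n - n % 4  # elements handled by the 4-way unrolled loop
    for i in range(0, main, 4):
        # Unrolled by 4; in a CUDA kernel these four values would come
        # from one vectorized load and go out through one vectorized store.
        out[i] = q(values[i])
        out[i + 1] = q(values[i + 1])
        out[i + 2] = q(values[i + 2])
        out[i + 3] = q(values[i + 3])
    for i in range(main, n):
        # Scalar tail for lengths that are not a multiple of 4.
        out[i] = q(values[i])
    return out
```

For example, `scale_and_clamp([1.0, 2.0, 3.0, 4.0, 5000.0], 2.0)` returns `[0.5, 1.0, 1.5, 2.0, 448.0]`; the last value saturates at the E4M3 maximum.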
SangBin Cho authored
Simon Mo authored
Revert "[CI/Build] Add `is_quant_method_supported` to control quantization test configurations" (#5463)
Michael Goin authored
- 11 Jun, 2024 5 commits
Nick Hill authored
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
youkaichao authored
sasha0552 authored
Cyrus Leung authored
maor-ps authored
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
- 10 Jun, 2024 4 commits
Itay Etelis authored
Cyrus Leung authored
Co-authored-by: Roger Wang <ywang@roblox.com>
Cyrus Leung authored
Dipika Sikka authored
Co-authored-by: Michael Goin <michael@neuralmagic.com>
- 09 Jun, 2024 2 commits
bnellnm authored
youkaichao authored
[mis][ci/test] fix flaky test in tests/test_sharded_state_loader.py (#5361)
- 08 Jun, 2024 2 commits
youkaichao authored
[CI/Test] improve robustness of test by replacing del with context manager (vllm_runner) (#5357)
youkaichao authored
[CI/Test] improve robustness of test by replacing del with context manager (hf_runner) (#5347)
- 07 Jun, 2024 5 commits
Roger Wang authored
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Dipika Sikka authored
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
youkaichao authored
Itay Etelis authored
Antoni Baum authored
- 06 Jun, 2024 3 commits
Matthew Goldey authored
liuyhwangyh authored
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
Cyrus Leung authored
- 05 Jun, 2024 4 commits
Breno Faria authored
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Breno Faria <breno.faria@intrafind.com>
Nick Hill authored
Woosuk Kwon authored
zifeitong authored
- 04 Jun, 2024 4 commits
Cyrus Leung authored
Cyrus Leung authored
afeldman-nm authored
[Bugfix]: During testing, use pytest monkeypatch for safely overriding the env var that indicates the vLLM backend (#5210)
Toshiki Kataoka authored
- 03 Jun, 2024 5 commits
Breno Faria authored
Kaiyang Chen authored
Yuan authored
Tyler Michael Smith authored
Cyrus Leung authored