- 21 Aug, 2024 2 commits
-
-
Brian Li authored
-
Cyrus Leung authored
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Fei <dfdfcai4@gmail.com>
-
- 16 Aug, 2024 2 commits
-
-
fzyzcjy authored
-
nunjunj authored
Co-authored-by: nunjunj <ray@g-3ff9f30f2ed650001.c.vllm-405802.internal>
Co-authored-by: nunjunj <ray@g-1df6075697c3f0001.c.vllm-405802.internal>
Co-authored-by: nunjunj <ray@g-c5a2c23abc49e0001.c.vllm-405802.internal>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 09 Aug, 2024 1 commit
-
-
Cyrus Leung authored
-
- 06 Aug, 2024 1 commit
-
-
afeldman-nm authored
[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
-
- 04 Aug, 2024 1 commit
-
-
Yihuan Bu authored
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
-
- 22 Jul, 2024 1 commit
-
-
Cyrus Leung authored
Co-authored-by: Roger Wang <ywang@roblox.com>
-
- 18 Jul, 2024 1 commit
-
-
youkaichao authored
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
- 09 Jul, 2024 1 commit
-
-
Swapnil Parekh authored
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
-
- 03 Jul, 2024 1 commit
-
-
xwjiang2010 authored
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
-
- 12 Jun, 2024 1 commit
-
-
Michael Goin authored
-
- 06 Jun, 2024 1 commit
-
-
Matthew Goldey authored
-
- 05 Jun, 2024 1 commit
-
-
DriverSong authored
Co-authored-by: qiujiawei9 <qiujiawei9@jd.com>
-
- 03 Jun, 2024 1 commit
-
-
Cyrus Leung authored
-
- 01 Jun, 2024 1 commit
-
-
Robert Shaw authored
Co-authored-by: mgoin <michael@neuralmagic.com>
-
- 30 May, 2024 1 commit
-
-
Cyrus Leung authored
-
- 28 May, 2024 1 commit
-
-
Cyrus Leung authored
Co-authored-by: Roger Wang <ywang@roblox.com>
-
- 11 May, 2024 1 commit
-
-
Chang Su authored
-
- 09 May, 2024 1 commit
-
-
Mahmoud Ashraf authored
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
- 03 May, 2024 1 commit
-
-
SangBin Cho authored
-
- 21 Apr, 2024 1 commit
-
-
GeauxEric authored
Co-authored-by: Yun Ding <yunding@nvidia.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
-
- 20 Apr, 2024 2 commits
-
-
nunjunj authored
-
Cody Yu authored
Provide initial support for FP8 computation. This PR is inspired by HuggingFace TGI: huggingface/text-generation-inference#1726

This feature can be enabled with --quantization fp8 or -q fp8 when launching an engine.

Algorithm: The model checkpoint is still loaded in FP16/BF16. After the weights are loaded, Fp8LinearMethod calculates a per-tensor scaling factor for the weights and quantizes them accordingly. The scaling factor is then stored for future use. The per-tensor scaling factor for activations, by contrast, is calculated in every forward pass.

Initial results: Tested so far with Mistral-7B on 1xH100, with prompt length ~5 and decoding length 128:
BF16: 1.47s
FP8: 1.66s

I'll try larger models and look for more performance bottlenecks. Meanwhile, you're welcome to try this code.
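The per-tensor scaling described in this commit message can be illustrated with a small sketch. It is not the code from the PR: the helper names (round_to_e4m3, quantize_per_tensor, dequantize) are hypothetical, the target format is assumed to be E4M3 with a maximum finite value of 448, the scale is assumed to be amax / 448, and FP8 rounding is simulated in plain Python floats rather than using a real FP8 dtype.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite E4M3 value (assumed target format)


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value on a simulated E4M3 grid (3 mantissa bits)."""
    x = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))  # saturate instead of overflowing
    _, exp = math.frexp(x)  # x = m * 2**exp with 0.5 <= |m| < 1
    # Normal values are spaced 2**(floor(log2|x|) - 3) apart; below 2**-6 the
    # subnormal spacing is fixed at 2**-9.
    step = 2.0 ** max(exp - 4, -9)
    return round(x / step) * step


def quantize_per_tensor(values):
    """Return (quantized values, scale) using one scale for the whole tensor."""
    amax = max((abs(v) for v in values), default=0.0)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0  # avoid dividing by zero
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Map quantized values back to the original range."""
    return [q * scale for q in quantized]


# Weights: quantized once after loading; the scale is kept for later use.
weights = [0.5, -1.0, 0.25, 2.0]
q_weights, w_scale = quantize_per_tensor(weights)

# Activations: the scale is recomputed on every forward pass.
activations = [0.1, -0.3, 0.7]
q_acts, a_scale = quantize_per_tensor(activations)
```

In this sketch the largest absolute value in a tensor always maps to 448, so the full FP8 range is used. Recomputing the activation scale on every forward pass adds a reduction over the tensor each time, which is one place extra latency could come from.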
-
- 12 Apr, 2024 2 commits
-
-
youkaichao authored
-
SangBin Cho authored
-
- 29 Mar, 2024 1 commit
-
-
yhu422 authored
-
- 25 Mar, 2024 2 commits
-
-
xwjiang2010 authored
-
SangBin Cho authored
-
- 22 Mar, 2024 1 commit
-
-
Hanzhi Zhou authored
-
- 06 Mar, 2024 1 commit
-
-
Chujie Zheng authored
-
- 02 Mar, 2024 1 commit
-
-
Sage Moore authored
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
- 04 Feb, 2024 1 commit
-
-
dancingpipi authored
Co-authored-by: shujunhua1 <shujunhua1@jd.com>
-
- 27 Jan, 2024 1 commit
-
-
Hanzhi Zhou authored
-
- 23 Jan, 2024 1 commit
-
-
Antoni Baum authored
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
-
- 18 Jan, 2024 1 commit
-
-
shiyi.c_98 authored
Co-authored-by: DouHappy <2278958187@qq.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
-
- 17 Dec, 2023 2 commits
-
-
Woosuk Kwon authored
-
Woosuk Kwon authored
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
-
- 15 Dec, 2023 1 commit
-
-
CHU Tianxiang authored
-
- 20 Nov, 2023 1 commit
-
-
Simon Mo authored
-