- 30 May, 2024 1 commit
-
-
Cyrus Leung authored
-
- 28 May, 2024 1 commit
-
-
Cyrus Leung authored
Co-authored-by: Roger Wang <ywang@roblox.com>
-
- 11 May, 2024 1 commit
-
-
Chang Su authored
-
- 09 May, 2024 1 commit
-
-
Mahmoud Ashraf authored
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
- 03 May, 2024 1 commit
-
-
SangBin Cho authored
-
- 21 Apr, 2024 1 commit
-
-
GeauxEric authored
Co-authored-by: Yun Ding <yunding@nvidia.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
-
- 20 Apr, 2024 2 commits
-
-
nunjunj authored
-
Cody Yu authored
Provide initial support for FP8 computation. This PR is inspired by HuggingFace TGI: huggingface/text-generation-inference#1726

This feature can be enabled with --quantization fp8 or -q fp8 when launching an engine.

Algorithm: We still load a model checkpoint in FP16/BF16. After the weights are loaded, Fp8LinearMethod calculates the per-tensor scaling factor of the weights and quantizes the weights accordingly. The scaling factor is then stored for future use. Meanwhile, the per-tensor scaling factor for activations is calculated in every forward pass.

Initial Results: Currently tested Mistral-7B on 1xH100. With prompt length ~5 and decoding length 128:
BF16: 1.47s
FP8: 1.66s

I'll try larger models and look for more performance bottlenecks. Meanwhile, you're welcome to try this code.
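The algorithm described above can be illustrated with a minimal pure-Python sketch. It is not the code from this PR: the class name Fp8LinearSketch and the helpers round_to_e4m3, per_tensor_scale and quantize are hypothetical, the E4M3 format (maximum finite value 448) is an assumed target, and FP8 rounding is only simulated on ordinary Python floats. It shows the weight scale being computed once at load time and the activation scale being recomputed on every forward pass.

```python
import math

# Assumption: E4M3 is the target FP8 format; 448 is its largest finite value.
FP8_E4M3_MAX = 448.0


def round_to_e4m3(x: float) -> float:
    """Simulate saturating round-to-nearest onto the E4M3 grid (3 mantissa bits)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), FP8_E4M3_MAX)      # saturate instead of overflowing
    _, e = math.frexp(a)               # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)               # -6 is the smallest normal exponent
    step = 2.0 ** (exp - 3)            # spacing of representable values
    return sign * min(round(a / step) * step, FP8_E4M3_MAX)


def per_tensor_scale(values):
    """One scale for the whole tensor: map the largest magnitude to FP8 max."""
    amax = max(abs(v) for v in values)
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


def quantize(values, scale):
    """Divide by the scale, then round each element to the simulated FP8 grid."""
    return [round_to_e4m3(v / scale) for v in values]


class Fp8LinearSketch:
    """Hypothetical stand-in for a linear layer with per-tensor FP8 scaling."""

    def __init__(self, weight_rows):
        # Done once after the FP16/BF16 weights are loaded: compute the
        # per-tensor weight scale, quantize, and keep the scale for later.
        flat = [w for row in weight_rows for w in row]
        self.w_scale = per_tensor_scale(flat)
        self.wq = [quantize(row, self.w_scale) for row in weight_rows]

    def forward(self, x):
        # Done on every call: the activation scale is computed dynamically.
        x_scale = per_tensor_scale(x)
        xq = quantize(x, x_scale)
        # Multiply in the quantized domain, then undo both scales.
        return [
            sum(a * b for a, b in zip(xq, row)) * x_scale * self.w_scale
            for row in self.wq
        ]


layer = Fp8LinearSketch([[0.5, -1.0], [2.0, 0.25]])
out = layer.forward([1.0, 2.0])
```

The values in this example happen to be exactly representable after scaling, so the output matches the unquantized result; in general the rounding step introduces a small error. Recomputing the activation scale on every forward pass adds a reduction over the input each time, which is one possible source of per-call overhead.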
-
- 12 Apr, 2024 2 commits
-
-
youkaichao authored
-
SangBin Cho authored
-
- 29 Mar, 2024 1 commit
-
-
yhu422 authored
-
- 25 Mar, 2024 2 commits
-
-
xwjiang2010 authored
-
SangBin Cho authored
-
- 22 Mar, 2024 1 commit
-
-
Hanzhi Zhou authored
-
- 06 Mar, 2024 1 commit
-
-
Chujie Zheng authored
-
- 02 Mar, 2024 1 commit
-
-
Sage Moore authored
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
- 04 Feb, 2024 1 commit
-
-
dancingpipi authored
Co-authored-by: shujunhua1 <shujunhua1@jd.com>
-
- 27 Jan, 2024 1 commit
-
-
Hanzhi Zhou authored
-
- 23 Jan, 2024 1 commit
-
-
Antoni Baum authored
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
-
- 18 Jan, 2024 1 commit
-
-
shiyi.c_98 authored
Co-authored-by: DouHappy <2278958187@qq.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
-
- 17 Dec, 2023 2 commits
-
-
Woosuk Kwon authored
-
Woosuk Kwon authored
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
-
- 15 Dec, 2023 1 commit
-
-
CHU Tianxiang authored
-
- 20 Nov, 2023 1 commit
-
-
Simon Mo authored
-
- 03 Oct, 2023 1 commit
-
-
Federico Cassano authored
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
-
- 20 Sep, 2023 1 commit
-
-
Woosuk Kwon authored
-
- 18 Sep, 2023 1 commit
-
-
orellavie1212 authored
-
- 13 Sep, 2023 1 commit
-
-
Jasmond L authored
Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
-
- 08 Jul, 2023 1 commit
-
-
Woosuk Kwon authored
-
- 07 Jul, 2023 1 commit
-
-
codethazine authored
-
- 03 Jul, 2023 1 commit
-
-
Zhuohan Li authored
-
- 28 Jun, 2023 3 commits
-
-
Woosuk Kwon authored
-
Woosuk Kwon authored
-
Jishnu Ray Chowdhury authored
-
- 22 Jun, 2023 1 commit
-
-
Woosuk Kwon authored
-
- 17 Jun, 2023 2 commits
-
-
Woosuk Kwon authored
-
Zhuohan Li authored
-
- 16 Jun, 2023 1 commit
-
-
Zhuohan Li authored
-
- 07 Jun, 2023 1 commit
-
-
Woosuk Kwon authored
-
- 04 Jun, 2023 1 commit
-
-
Woosuk Kwon authored
-