- 04 May, 2024 1 commit
-
-
Michael Goin authored
[Kernel] Support MoE Fp8 Checkpoints for Mixtral (Static Weights with Dynamic/Static Activations) (#4527)

Follow-on to #4332 to enable FP8 checkpoint loading for Mixtral; supersedes #4436.

This PR enables the following checkpoint loading features for Mixtral:
  * Supports loading FP8 checkpoints for Mixtral, such as the "nm-testing/Mixtral-8x7B-Instruct-v0.1-FP8" test model
  * Supports static or dynamic activation quantization with static weight quantization (all per tensor)
  * Supports different scales for each expert weight
  * Supports FP8 in the QKV layer

Notes:
  * The Expert Gate/Router always runs at half / full precision for now.
  * If the separate Q, K, and V weights in the QKV layer have different weight scales, they are re-quantized using layer.weight_scale.max() so that a single GEMM can be used for performance.
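
The re-quantization note above can be illustrated with a minimal sketch. This is not the code from the PR: the function name requantize_to_shared_scale is hypothetical, plain Python floats stand in for FP8 values, and rounding onto the FP8 grid is deliberately omitted. It assumes per-tensor scales where real_value = quantized_value * scale, and only shows why picking the maximum scale lets all shards share one scale without overflowing.

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format


def requantize_to_shared_scale(shards, scales):
    """Re-express per-shard quantized weights under one shared scale.

    shards: list of lists of quantized values (floats standing in for FP8).
    scales: one per-tensor scale per shard, with real_value = q * scale.
    Returns (new_shards, shared_scale).

    Hypothetical helper for illustration only; rounding to the FP8 grid
    is not simulated here.
    """
    if len(shards) != len(scales):
        raise ValueError("need exactly one scale per shard")
    shared = max(scales)  # the largest scale covers every shard's range
    new_shards = []
    for q_values, scale in zip(shards, scales):
        ratio = scale / shared  # <= 1, so magnitudes never grow
        new_shards.append(
            [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, q * ratio)) for q in q_values]
        )
    return new_shards, shared


# Example: Q, K, V shards that were quantized with different scales.
q, k, v = [448.0, -224.0], [100.0], [-448.0]
new_shards, shared_scale = requantize_to_shared_scale([q, k, v], [0.5, 0.25, 0.125])
print(new_shards, shared_scale)
# prints: [[448.0, -224.0], [50.0], [-112.0]] 0.5
```

After this step every shard dequantizes with the same scale (for example 50.0 * 0.5 equals the original 100.0 * 0.25), so the shards can be concatenated and multiplied in one GEMM. The trade-off is that shards with smaller original scales use less of the FP8 range.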
-
- 03 May, 2024 2 commits
-
-
Lily Liu authored
Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>
-
SangBin Cho authored
-
- 02 May, 2024 1 commit
-
-
Michał Moskal authored
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
-
- 18 Apr, 2024 1 commit
-
-
Michał Moskal authored
-
- 11 Apr, 2024 2 commits
-
-
Antoni Baum authored
-
Kunshang Ji authored
-
- 03 Apr, 2024 1 commit
-
-
Adrian Abeyta authored
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
- 30 Mar, 2024 1 commit
-
-
mawong-amd authored
-
- 27 Mar, 2024 1 commit
-
-
Roger Wang authored
-
- 25 Mar, 2024 2 commits
-
-
SangBin Cho authored
-
Woosuk Kwon authored
-
- 24 Mar, 2024 2 commits
-
-
youkaichao authored
-
Nick Hill authored
-
- 20 Mar, 2024 1 commit
-
-
Antoni Baum authored
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
-
- 16 Mar, 2024 2 commits
- 13 Mar, 2024 2 commits
-
-
Terry authored
-
Woosuk Kwon authored
-
- 11 Mar, 2024 1 commit
-
-
Zhuohan Li authored
-
- 07 Mar, 2024 1 commit
-
-
Woosuk Kwon authored
-
- 27 Feb, 2024 1 commit
-
-
Tao He authored
Signed-off-by: Tao He <sighingnow@gmail.com>
-
- 22 Feb, 2024 1 commit
-
-
Woosuk Kwon authored
-
- 06 Feb, 2024 2 commits
-
-
Lily Liu authored
-
Woosuk Kwon authored
-
- 05 Feb, 2024 1 commit
-
-
Hongxia Yang authored
-
- 01 Feb, 2024 1 commit
-
-
Kunshang Ji authored
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
-
- 31 Jan, 2024 2 commits
-
-
Philipp Moritz authored
-
Philipp Moritz authored
-
- 30 Jan, 2024 2 commits
-
-
Vladimir authored
-
wangding zeng authored
Co-authored-by: roy <jasonailu87@gmail.com>
-
- 29 Jan, 2024 1 commit
-
-
zhaoyang-star authored
Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
-
- 22 Jan, 2024 1 commit
-
-
Jason Zhu authored
Add a 1-line docstring to explain why calling context_attention_fwd twice in test_prefix_prefill.py (#2553)
-
- 18 Jan, 2024 1 commit
-
-
shiyi.c_98 authored
Co-authored-by: DouHappy <2278958187@qq.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
-
- 14 Jan, 2024 1 commit
-
-
Simon Mo authored
-
- 04 Jan, 2024 1 commit
-
-
Woosuk Kwon authored
-
- 03 Jan, 2024 2 commits
-
-
Zhuohan Li authored
-
Jee Li authored
-
- 10 Dec, 2023 1 commit
-
-
wbn authored
Co-authored-by: wangguoya <wangguoya@baidu.com>
Co-authored-by: Yang Zhao <zhaoyangstar@foxmail.com>
-
- 03 Dec, 2023 1 commit
-
-
Woosuk Kwon authored
-