- 02 Feb, 2025 1 commit
-
-
Russell Bryant authored
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**

commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:18:24 2025 -0500

Add SPDX license headers to python source files

This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance.

The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job.

More information can be found on the SPDX site:

- https://spdx.dev/learn/handling-license-info/

Signed-off-by: Russell Bryant <rbryant@redhat.com>

commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:36:32 2025 -0500

Check for SPDX headers using pre-commit

Signed-off-by: Russell Bryant <rbryant@redhat.com>

---------

Signed-off-by: Russell Bryant <rbryant@redhat.com>
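The commit message does not show how the pre-commit check is implemented. As a rough illustration only, a header check of this kind could look like the sketch below. The function names, the five-line search window, and the `Apache-2.0` identifier in the usage note are assumptions made for the example, not details taken from the commit.

```python
from pathlib import Path

# Tag defined by the SPDX convention; the license identifier follows it.
SPDX_TAG = "SPDX-License-Identifier:"


def has_spdx_header(text: str, max_lines: int = 5) -> bool:
    """Return True if a comment line near the top carries an SPDX tag.

    Hypothetical helper: the search window (max_lines) is an assumption.
    """
    for line in text.splitlines()[:max_lines]:
        stripped = line.strip()
        if stripped.startswith("#") and SPDX_TAG in stripped:
            return True
    return False


def main(paths: list[str]) -> int:
    """Report files without a header; a non-zero result fails the hook."""
    missing = [
        p for p in paths
        if not has_spdx_header(Path(p).read_text(encoding="utf-8"))
    ]
    for p in missing:
        print(f"missing SPDX header: {p}")
    return 1 if missing else 0
```

A file beginning with `# SPDX-License-Identifier: Apache-2.0` would pass this check. A hook runner would call `main` with the staged file names and use its return value as the exit status.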
-
- 01 Feb, 2025 1 commit
-
-
Tyler Michael Smith authored
Fixes `is_marlin` not being passed into `get_default_config`.

Also allow `--tensor-parallel-size` in addition to `-tp` and `--tp-size`.

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
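Accepting several spellings of one flag is something `argparse` supports directly by passing multiple option strings to a single `add_argument` call. The sketch below shows that general technique; the parser, the `tp_size` destination name, and the default of 1 are assumptions for illustration and are not taken from the patched script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where three option strings map to one destination."""
    parser = argparse.ArgumentParser(description="alias sketch (hypothetical)")
    parser.add_argument(
        "-tp",
        "--tp-size",
        "--tensor-parallel-size",
        dest="tp_size",  # assumed attribute name
        type=int,
        default=1,       # assumed default
        help="tensor parallel size",
    )
    return parser


if __name__ == "__main__":
    # All three spellings set the same attribute.
    for flag in ("-tp", "--tp-size", "--tensor-parallel-size"):
        args = build_parser().parse_args([flag, "2"])
        print(flag, args.tp_size)
```

Setting `dest` explicitly keeps the attribute name stable regardless of the order in which the aliases are listed.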
-
- 30 Jan, 2025 1 commit
-
-
Divakar Verma authored
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
-
- 23 Jan, 2025 1 commit
-
-
Gregory Shtrasberg authored
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
-
- 21 Jan, 2025 1 commit
-
-
Divakar Verma authored
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
-
- 17 Jan, 2025 1 commit
-
-
Divakar Verma authored
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
-
- 16 Jan, 2025 1 commit
-
-
Varun Sundar Rabindranath authored
-
- 17 Dec, 2024 1 commit
-
-
Roger Wang authored
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Xiaoyu Zhang <BBuf@users.noreply.github.com>
-
- 19 Nov, 2024 1 commit
-
-
ElizaWszola authored
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
-
- 18 Nov, 2024 1 commit
-
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
-
- 06 Nov, 2024 1 commit
-
-
Aaron Pham authored
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
-
- 29 Oct, 2024 1 commit
-
-
wangshuai09 authored
Signed-off-by: wangshuai09 <391746016@qq.com>
-
- 28 Oct, 2024 1 commit
-
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
- 16 Oct, 2024 1 commit
-
-
Cyrus Leung authored
-
- 23 Sep, 2024 1 commit
-
-
Lucas Wilkinson authored
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
-
- 18 Sep, 2024 2 commits
-
-
Aaron Pham authored
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
-
Cyrus Leung authored
-
- 22 Aug, 2024 1 commit
-
-
Luka Govedič authored
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
- 20 Aug, 2024 1 commit
-
-
Lucas Wilkinson authored
-
- 16 Aug, 2024 1 commit
-
-
Mor Zusman authored
-
- 02 Aug, 2024 1 commit
-
-
Lucas Wilkinson authored
-
- 27 Jul, 2024 2 commits
-
-
Alexander Matveev authored
-
Joe authored
-
- 16 Jul, 2024 1 commit
-
-
Michael Goin authored
-
- 11 Jul, 2024 1 commit
-
-
Robert Shaw authored
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
-
- 20 Jun, 2024 1 commit
-
-
Michael Goin authored
-
- 15 Jun, 2024 1 commit
-
-
Cyrus Leung authored
-
- 14 Jun, 2024 1 commit
-
-
Allen.Dou authored
-
- 05 Jun, 2024 1 commit
-
-
Philipp Moritz authored
-
- 04 Jun, 2024 2 commits
-
-
Woosuk Kwon authored
-
Woosuk Kwon authored
-
- 31 May, 2024 2 commits
-
-
Cody Yu authored
-
SnowDist authored
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
-
- 23 May, 2024 1 commit
-
-
Alexander Matveev authored
-
- 22 May, 2024 1 commit
-
-
Cody Yu authored
The 2nd PR for #4532. This PR supports loading FP8 kv-cache scaling factors from an FP8 checkpoint (with the `.kv_scale` parameter).
-
- 16 May, 2024 1 commit
-
-
alexm-nm authored
-
- 03 May, 2024 1 commit
-
-
SangBin Cho authored
-
- 01 May, 2024 1 commit
-
-
Philipp Moritz authored
This PR updates the tuning script for the fused_moe kernel to support FP8 and also adds configurations for TP4. Note that for the configuration I removed num_warps and num_stages for small batch sizes since that improved performance and brought the benchmarks on par with the numbers before in that regime to make sure this is a strict improvement over the status quo.

All the numbers below are for mistralai/Mixtral-8x7B-Instruct-v0.1, 1000 input and 50 output tokens.

Before this PR (with static activation scaling):

qps = 1: 9.8 ms ITL, 0.49s e2e latency
qps = 2: 9.7 ms ITL, 0.49s e2e latency
qps = 4: 10.1 ms ITL, 0.52s e2e latency
qps = 6: 11.9 ms ITL, 0.59s e2e latency
qps = 8: 14.0 ms ITL, 0.70s e2e latency
qps = 10: 15.7 ms ITL, 0.79s e2e latency

After this PR (with static activation scaling):

qps = 1: 9.8 ms ITL, 0.49s e2e latency
qps = 2: 9.7 ms ITL, 0.49s e2e latency
qps = 4: 10.2 ms ITL, 0.53s e2e latency
qps = 6: 11.9 ms ITL, 0.59s e2e latency
qps = 8: 11.9 ms ITL, 0.59s e2e latency
qps = 10: 12.1 ms ITL, 0.61s e2e latency
-
- 25 Apr, 2024 1 commit
-
-
Kunshang Ji authored
-
- 23 Apr, 2024 1 commit
-
-
James Fleming authored
Co-authored-by: mgoin <michael@neuralmagic.com>
-