- 25 Apr, 2025 14 commits
-
Michael Yao authored
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
-
rasmith authored
[Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
-
Sangyeon Cho authored
Signed-off-by: csy1204 <josang1204@gmail.com>
Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com>
-
yexin(叶鑫) authored
[Perf]Optimize rotary_emb implementation to use Triton operator for improved inference performance (#16457)
Signed-off-by: cynthieye <yexin93@qq.com>
Co-authored-by: MagnetoWang <magnetowang@outlook.com>
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
-
Mengqing Cao authored
Signed-off-by: Mengqing Cao <cmq0113@163.com>
-
Lifu Huang authored
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Michael Goin authored
-
Varun Sundar Rabindranath authored
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>
-
Zaida Zhou authored
Co-authored-by: zhouzaida <zhouzaida@msh.team>
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
-
vllmellm authored
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
-
jglaser authored
Signed-off-by: Jens Glaser <glaserj@ornl.gov>
-
- 24 Apr, 2025 26 commits
-
Michael Goin authored
Signed-off-by: mgoin <mgoin64@gmail.com>
-
Rui Qiao authored
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
-
Maximilien de Bayser authored
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
-
Yinghai Lu authored
Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>
-
Russell Bryant authored
Signed-off-by: Russell Bryant <rbryant@redhat.com>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Eyshika Agarwal authored
Signed-off-by: Eyshika Agarwal <eyshikaengineer@gmail.com>
Signed-off-by: eyshika <eyshikaengineer@gmail.com>
-
Atilla authored
-
Aaruni Aggarwal authored
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>
-
Russell Bryant authored
Signed-off-by: Russell Bryant <rbryant@redhat.com>
-
Mark McLoughlin authored
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
-
Reid authored
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
-
Michael Goin authored
Signed-off-by: mgoin <mgoin64@gmail.com>
-
wang.yuqi authored
-
Shanshan Shen authored
[V1][Structured Output] Clear xgrammar compiler object when engine core shut down to avoid nanobind leaked warning (#16954)
Signed-off-by: shen-shanshan <467638484@qq.com>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Isotr0py authored
Signed-off-by: Isotr0py <2037008807@qq.com>
-
Rui Qiao authored
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Michael Goin authored
Signed-off-by: mgoin <mgoin64@gmail.com>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
omer-dayan authored
Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
-
Reid authored
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
-
Reid authored
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
-