- 25 Mar, 2026 1 commit
-
-
mikaylagawarecki authored
Signed-off-by:Mikayla Gawarecki <mikaylagawarecki@gmail.com>
-
- 09 Jan, 2026 1 commit
-
-
Lucas Wilkinson authored
Signed-off-by:Lucas Wilkinson <lwilkins@redhat.com>
-
- 08 Oct, 2025 1 commit
-
-
Wentao Ye authored
Signed-off-by:
nicole-lihui <nicole.li@daocloud.io> Signed-off-by:
yewentao256 <zhyanwentao@126.com> Signed-off-by:
courage17340 <courage17340@163.com> Signed-off-by:
DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by:
Jacob Kahn <jacobkahn1@gmail.com> Signed-off-by:
Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by:
Fadi Arafeh <fadi.arafeh@arm.com> Signed-off-by:
Roger Wang <hey@rogerw.io> Signed-off-by:
Agata Dobrzyniewicz <adobrzyniewicz@habana.ai> Signed-off-by:
Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by:
zxw <1020938856@qq.com> Signed-off-by:
Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by:
wang.yuqi <noooop@126.com> Signed-off-by:
Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by:
Kunshang Ji <kunshang.ji@intel.com> Signed-off-by:
chenlang <chen.lang5@zte.com.cn> Signed-off-by:
youkaichao <youkaichao@gmail.com> Signed-off-by:
Jonas Kuebler <kuebj@amazon.com> Signed-off-by:
jiang1.li <jiang1.li@intel.com> Signed-off-by:
Russell Bryant <rbryant@redhat.com> Signed-off-by:
NickLucche <nlucches@redhat.com> Signed-off-by:
Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by:
AlonKejzman <alonkeizman@gmail.com> Signed-off-by:
Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by:
taohui <taohui3@gmail.com> Signed-off-by:
Tao Hui <taohui3@gmail.com> Signed-off-by:
Matthew Bonanni <mbonanni@redhat.com> Signed-off-by:
Matthew Bonanni <mbonanni001@gmail.com> Signed-off-by:
Jee Jee Li <pandaleefree@gmail.com> Signed-off-by:
Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Signed-off-by:
Zhuohan Li <zhuohan123@gmail.com> Signed-off-by:
Tomer Asida <57313761+tomeras91@users.noreply.github.com> Signed-off-by:
Shu Wang. <shuw@nvidia.com> Signed-off-by:
Nick Hill <nhill@redhat.com> Signed-off-by:
Aleksandr Malyshev <maleksan@amd.com> Signed-off-by:
Eugene Khvedchenia <ekhvedchenia@nvidia.com> Signed-off-by:
Eugene Khvedchenya <ekhvedchenya@gmail.com> Signed-off-by:
yiting.jiang <yiting.jiang@daocloud.io> Signed-off-by:
Andrew Sansom <andrew@protopia.ai> Signed-off-by:
xaguilar <Xavier.AguilarFruto@amd.com> Signed-off-by:
Iceber Gu <caiwei95@hotmail.com> Signed-off-by:
Tao He <linzhu.ht@alibaba-inc.com> Signed-off-by:
Icey <1790571317@qq.com> Signed-off-by:
Sage Moore <sage@neuralmagic.com> Signed-off-by:
许文卿 <xwq391974@alibaba-inc.com> Signed-off-by:
Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Signed-off-by:
chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by:
Seiji Eicher <seiji@anyscale.com> Signed-off-by:
Seiji Eicher <58963096+eicherseiji@users.noreply.github.com> Signed-off-by:
zjy0516 <riverclouds.zhu@qq.com> Signed-off-by:
Kosseila (CloudThrill) <klouddude@gmail.com> Signed-off-by:
frankwang28 <frank.wbb@hotmail.com> Signed-off-by:
Frank Wang <41319051+frankwang28@users.noreply.github.com> Signed-off-by:
mgoin <mgoin64@gmail.com> Signed-off-by:
fhl2000 <63384265+fhl2000@users.noreply.github.com> Signed-off-by:
zixi-qi <qizixi@meta.com> Signed-off-by:
Bram Wasti <bwasti@meta.com> Signed-off-by:
Naman Lalit <nl2688@nyu.edu> Signed-off-by:
Chenheli Hua <huachenheli@outlook.com> Signed-off-by:
Junhong <liujunhong11@huawei.com> Signed-off-by:
Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com> Signed-off-by:
22quinn <33176974+22quinn@users.noreply.github.com> Signed-off-by:
rentianyue-jk <rentianyue-jk@360shuke.com> Signed-off-by:
Peter Pan <Peter.Pan@daocloud.io> Signed-off-by:
Patrick Toulme <ptoulme@meta.com> Signed-off-by:
Patrick Toulme <pctoulme+1@gmail.com> Signed-off-by:
Jiangyun Zhu <riverclouds.zhu@qq.com> Signed-off-by:
Clayton Coleman <smarterclayton@gmail.com> Signed-off-by:
Jialin Ouyang <jialino@meta.com> Signed-off-by:
Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by:
Weiliang Liu <weiliangl@nvidia.com> Signed-off-by:
zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by:
liuye.hj <liuye.hj@alibaba-inc.com> Signed-off-by:
Juechen Liu <jueliu@meta.com> Signed-off-by:
simon-mo <simon.mo@hey.com> Signed-off-by:
Robert Shaw <robshaw@redhat.com> Signed-off-by:
Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by:
isotr0py <2037008807@qq.com> Signed-off-by:
yingjun-mou <renzomou@gmail.com> Signed-off-by:
zhoukz <me@zhoukz.com> Signed-off-by:
Chenxi Yang <cxyang@fb.com> Signed-off-by:
Rahul Tuli <rtuli@redhat.com> Signed-off-by:
Lee Nau <lnau@nvidia.com> Signed-off-by:
adabeyta <aabeyta@redhat.com> Signed-off-by:
Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by:
Wentao Ye <44945378+yewentao256@users.noreply.github.com> Signed-off-by:
simondanielsson <simon.danielsson99@hotmail.com> Signed-off-by:
Chen Zhang <zhangch99@outlook.com> Signed-off-by:
Yongye Zhu <zyy1102000@gmail.com> Signed-off-by:
Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by:
Lucia Fang <fanglu@meta.com> Signed-off-by:
a120092009 <zhaoty0121@gmail.com> Signed-off-by:
sergiopaniego <sergiopaniegoblanco@gmail.com> Signed-off-by:
Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Signed-off-by:
wangyafeng <wangyafeng@baidu.com> Signed-off-by:
Lehua Ding <lehuading@tencent.com> Signed-off-by:
lyd1992 <liuyudong@iscas.ac.cn> Signed-off-by:
ihb2032 <1355790728@qq.com> Signed-off-by:
asafg <39553475+Josephasafg@users.noreply.github.com> Signed-off-by:
anion <1005128408@qq.com> Signed-off-by:
Anion <123177548+Anionex@users.noreply.github.com> Signed-off-by:
Pavani Majety <pmajety@nvidia.com> Signed-off-by:
Bill Nell <bnell@redhat.com> Signed-off-by:
bnellnm <49004751+bnellnm@users.noreply.github.com> Signed-off-by:
Or Ozeri <oro@il.ibm.com> Signed-off-by:
cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by:
David Ben-David <davidb@pliops.com> Signed-off-by:
Andrew Xia <axia@meta.com> Signed-off-by:
Andrew Xia <axia@fb.com> Signed-off-by:
Lu Fang <fanglu@fb.com> Signed-off-by:
Salvatore Cena <cena@cenas.it> Signed-off-by:
padg9912 <phone.and.desktop@gmail.com> Signed-off-by:
nadathurv <work.vnadathur@gmail.com> Signed-off-by:
WorldExplored <srreyansh.sethi@gmail.com> Signed-off-by:
wwl2755 <wangwenlong2755@gmail.com> Signed-off-by:
billishyahao <bill.he@amd.com> Signed-off-by:
Nathan Scott <nathans@redhat.com> Signed-off-by:
Kenichi Maehashi <maehashi@preferred.jp> Signed-off-by:
Johnny <johnnynuca14@gmail.com> Signed-off-by:
johnnynunez <johnnynuca14@gmail.com> Signed-off-by:
Johnny <johnnync13@gmail.com> Signed-off-by:
Huamin Li <3ericli@gmail.com> Signed-off-by:
Hosang Yoon <hosang.yoon@amd.com> Signed-off-by:
Jerry Zhang <jerryzh168@gmail.com> Signed-off-by:
Peter Schuurman <psch@google.com> Signed-off-by:
Huy Do <huydhn@gmail.com> Signed-off-by:
leo-pony <nengjunma@outlook.com> Signed-off-by:
vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by:
Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Signed-off-by:
ElizaWszola <ewszola@redhat.com> Signed-off-by:
ElizaWszola <elizaw.9289@gmail.com> Signed-off-by:
Luka Govedič <lgovedic@redhat.com> Signed-off-by:
Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by:
Michael Goin <mgoin64@gmail.com> Signed-off-by:
Benjamin Chislett <bchislett@nvidia.com> Signed-off-by:
tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by:
zhewenli <zhewenli@meta.com> Signed-off-by:
ahao-anyscale <ahao@anyscale.com> Signed-off-by:
Varun Sundar Rabindranath <vsundarr@redhat.com> Signed-off-by:
huijjj <huijong.jeong@squeezebits.com> Signed-off-by:
Yannick Schnider <yannick.schnider1@ibm.com> Signed-off-by:
kyt <eluban4532@gmail.com> Signed-off-by:
Egor <e.a.krivov@gmail.com> Signed-off-by:
Yang <lymailforjob@gmail.com> Signed-off-by:
Paul Pak <paulpak58@gmail.com> Signed-off-by:
whx-sjtu <2952154980@qq.com> Signed-off-by:
Xiang Si <sixiang@google.com> Signed-off-by:
Aleksandr Samarin <astrlrd@nebius.com> Signed-off-by:
Jun Jiang <jasl9187@hotmail.com> Signed-off-by:
Chendi Xue <Chendi.Xue@intel.com> Signed-off-by:
Chendi.Xue <chendi.xue@intel.com> Signed-off-by:
Nikhil Ghosh <nikhil@anyscale.com> Co-authored-by: Nicole LiHui
🥜 <nicolelihui@outlook.com> Co-authored-by:courage17340 <courage17340@users.noreply.github.com> Co-authored-by:
Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by:
Jacob Kahn <jacobkahn1@gmail.com> Co-authored-by:
Roger Wang <hey@rogerw.io> Co-authored-by: Nicole LiHui
🥜 <nicole.li@daocloud.io> Co-authored-by:Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by:
Fadi Arafeh <115173828+fadara01@users.noreply.github.com> Co-authored-by:
Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com> Co-authored-by:
Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by:
yyzxw <34639446+yyzxw@users.noreply.github.com> Co-authored-by:
Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by:
wang.yuqi <noooop@126.com> Co-authored-by:
Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by:
Kunshang Ji <kunshang.ji@intel.com> Co-authored-by:
chenlang <chen.lang5@zte.com.cn> Co-authored-by:
chenlang <10346245@zte.com.cn> Co-authored-by:
youkaichao <youkaichao@gmail.com> Co-authored-by:
Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com> Co-authored-by:
Li, Jiang <jiang1.li@intel.com> Co-authored-by:
Russell Bryant <rbryant@redhat.com> Co-authored-by:
Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by:
AlonKejzman <alonkeizman@gmail.com> Co-authored-by:
Michael Goin <mgoin64@gmail.com> Co-authored-by:
Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by:
Tao Hui <taohui3@gmail.com> Co-authored-by:
gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by:
Matthew Bonanni <mbonanni@redhat.com> Co-authored-by:
Jee Jee Li <pandaleefree@gmail.com> Co-authored-by:
Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by:
Nick Hill <nhill@redhat.com> Co-authored-by:
Zhuohan Li <zhuohan123@gmail.com> Co-authored-by:
Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by:
tomeras91 <57313761+tomeras91@users.noreply.github.com> Co-authored-by:
Shu Wang <shuw@nvidia.com> Co-authored-by:
Aleksandr Malyshev <164964928+maleksan85@users.noreply.github.com> Co-authored-by:
Aleksandr Malyshev <maleksan@amd.com> Co-authored-by:
Doug Lehr <douglehr@amd.com> Co-authored-by:
Eugene Khvedchenya <ekhvedchenya@gmail.com> Co-authored-by:
yitingdc <59356937+yitingdc@users.noreply.github.com> Co-authored-by:
Andrew Sansom <andrew@protopia.ai> Co-authored-by:
xaguilar-amd <xavier.aguilarfruto@amd.com> Co-authored-by:
Iceber Gu <caiwei95@hotmail.com> Co-authored-by:
Tao He <linzhu.ht@alibaba-inc.com> Co-authored-by:
Icey <1790571317@qq.com> Co-authored-by:
Sage Moore <sage@neuralmagic.com> Co-authored-by:
Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by:
Xu Wenqing <121550081+Xu-Wenqing@users.noreply.github.com> Co-authored-by:
Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by:
RishiAstra <40644327+RishiAstra@users.noreply.github.com> Co-authored-by:
Chauncey <chaunceyjiang@gmail.com> Co-authored-by:
Seiji Eicher <58963096+eicherseiji@users.noreply.github.com> Co-authored-by:
Rui Qiao <161574667+ruisearch42@users.noreply.github.com> Co-authored-by:
Jiangyun Zhu <riverclouds.zhu@qq.com> Co-authored-by:
Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by:
阿丹(adan) <47373076+LDLINGLINGLING@users.noreply.github.com> Co-authored-by:
liudan <adan@minicpm.com> Co-authored-by:
liudan <liudan@qq.com> Co-authored-by:
Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by:
Clouddude <kouss.hd@gmail.com> Co-authored-by:
Frank Wang <41319051+frankwang28@users.noreply.github.com> Co-authored-by:
fhl2000 <63384265+fhl2000@users.noreply.github.com> Co-authored-by:
qizixi <22851944+zixi-qi@users.noreply.github.com> Co-authored-by:
Bram Wasti <bwasti@fb.com> Co-authored-by:
Naman Lalit <nl2688@nyu.edu> Co-authored-by:
Chenheli Hua <huachenheli@outlook.com> Co-authored-by:
WeiQing Chen <40507679+david6666666@users.noreply.github.com> Co-authored-by:
Junhong <liujunhong11@huawei.com> Co-authored-by:
LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com> Co-authored-by:
22quinn <33176974+22quinn@users.noreply.github.com> Co-authored-by:
Xiaohan Zou <renovamenzxh@gmail.com> Co-authored-by:
rentianyue-jk <rentianyue-jk@360shuke.com> Co-authored-by:
Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by:
Peter Pan <peter.pan@daocloud.io> Co-authored-by:
Patrick C. Toulme <135739773+patrick-toulme@users.noreply.github.com> Co-authored-by:
Clayton Coleman <smarterclayton@gmail.com> Co-authored-by:
Jialin Ouyang <Jialin.Ouyang@gmail.com> Co-authored-by:
Jialin Ouyang <jialino@meta.com> Co-authored-by:
weiliang <weiliangl@nvidia.com> Co-authored-by:
Yuxuan Zhang <2448370773@qq.com> Co-authored-by:
JJJYmmm <92386084+JJJYmmm@users.noreply.github.com> Co-authored-by:
liuye.hj <liuye.hj@alibaba-inc.com> Co-authored-by:
Juechen Liu <grinchcoder@gmail.com> Co-authored-by:
Robert Shaw <robshaw@redhat.com> Co-authored-by:
Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by:
Yingjun Mou <renzomou@gmail.com> Co-authored-by:
Zhou Jiahao <me@zhoukz.com> Co-authored-by:
Chenxi Yang <cxyang@cs.utexas.edu> Co-authored-by:
Chenxi Yang <cxyang@fb.com> Co-authored-by:
Rahul Tuli <rtuli@redhat.com> Co-authored-by:
Lee Nau <lee.nau@gmail.com> Co-authored-by:
Adrian Abeyta <aabeyta@redhat.com> Co-authored-by:
Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> Co-authored-by:
Aaron Pham <contact@aarnphm.xyz> Co-authored-by:
acisseJZhong <40467976+acisseJZhong@users.noreply.github.com> Co-authored-by:
Simon Danielsson <70206058+simondanielsson@users.noreply.github.com> Co-authored-by:
Yongye Zhu <zyy1102000@gmail.com> Co-authored-by:
Chen Zhang <zhangch99@outlook.com> Co-authored-by:
Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by:
Lucia Fang <fanglu@meta.com> Co-authored-by:
Siyuan Fu <siyuanf@nvidia.com> Co-authored-by:
Xiaozhu Meng <mxz297@gmail.com> Co-authored-by:
Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Co-authored-by:
a120092009 <33205509+a120092009@users.noreply.github.com> Co-authored-by:
Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Co-authored-by:
CSWYF3634076 <wangyafeng@baidu.com> Co-authored-by:
Lehua Ding <lehuading@tencent.com> Co-authored-by:
Reza Barazesh <3146276+rzabarazesh@users.noreply.github.com> Co-authored-by:
ihb2032 <40718643+ihb2032@users.noreply.github.com> Co-authored-by:
Asaf Joseph Gardin <39553475+Josephasafg@users.noreply.github.com> Co-authored-by:
Anion <123177548+Anionex@users.noreply.github.com> Co-authored-by:
Pavani Majety <pmajety@nvidia.com> Co-authored-by:
bnellnm <49004751+bnellnm@users.noreply.github.com> Co-authored-by:
Or Ozeri <oro@il.ibm.com> Co-authored-by:
cjackal <44624812+cjackal@users.noreply.github.com> Co-authored-by:
David Ben-David <sdavidbd@gmail.com> Co-authored-by:
David Ben-David <davidb@pliops.com> Co-authored-by:
Andrew Xia <axia@mit.edu> Co-authored-by:
Andrew Xia <axia@fb.com> Co-authored-by:
Salvatore Cena <cena@cenas.it> Co-authored-by:
Param <psch@cs.unc.edu> Co-authored-by:
Zhewen Li <zhewenli@meta.com> Co-authored-by:
nadathurv <work.vnadathur@gmail.com> Co-authored-by:
Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com> Co-authored-by:
Wenlong Wang <wangwenlong2755@gmail.com> Co-authored-by:
billishyahao <bill.he@amd.com> Co-authored-by:
Nathan Scott <natoscott@users.noreply.github.com> Co-authored-by:
Kenichi Maehashi <939877+kmaehashi@users.noreply.github.com> Co-authored-by:
Johnny <johnnync13@gmail.com> Co-authored-by:
Aidyn-A <31858918+Aidyn-A@users.noreply.github.com> Co-authored-by:
Huamin Li <3ericli@gmail.com> Co-authored-by:
rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by:
Hosang <156028780+hyoon1@users.noreply.github.com> Co-authored-by:
Jerry Zhang <jerryzh168@gmail.com> Co-authored-by:
pwschuurman <psch@google.com> Co-authored-by:
Huy Do <huydhn@gmail.com> Co-authored-by:
leo-pony <nengjunma@outlook.com> Co-authored-by:
vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by:
ElizaWszola <ewszola@redhat.com> Co-authored-by:
Luka Govedič <lgovedic@redhat.com> Co-authored-by:
Benjamin Chislett <bchislett@nvidia.com> Co-authored-by:
Andrew Xia <axia@meta.com> Co-authored-by:
Simon Mo <simon.mo@hey.com> Co-authored-by:
TJian <tunjian.tan@embeddedllm.com> Co-authored-by:
ahao-anyscale <ahao@anyscale.com> Co-authored-by:
Varun Sundar Rabindranath <varunsundar08@gmail.com> Co-authored-by:
Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by:
Liu-congo <1502632128@qq.com> Co-authored-by:
HUIJONG JEONG <64083281+huijjj@users.noreply.github.com> Co-authored-by:
Yannick Schnider <Yannick.Schnider1@ibm.com> Co-authored-by:
kyt <eluban4532@gmail.com> Co-authored-by:
Egor <e.a.krivov@gmail.com> Co-authored-by:
Yang Liu <127183760+KKSK-DON@users.noreply.github.com> Co-authored-by:
Paul Pak <52512091+paulpak58@users.noreply.github.com> Co-authored-by:
whx <56632993+whx-sjtu@users.noreply.github.com> Co-authored-by:
Xiang Si <sixiang@google.com> Co-authored-by:
Aleksandr Samarin <samarin_ad@mail.ru> Co-authored-by:
Jun Jiang <jasl9187@hotmail.com> Co-authored-by:
Chendi.Xue <chendi.xue@intel.com> Co-authored-by:
Nikhil G <nrghosh@users.noreply.github.com>
-
- 17 Sep, 2025 1 commit
-
-
Aidyn-A authored
Signed-off-by:Aidyn-A <aidyn.b.aitzhan@gmail.com>
-
- 05 Aug, 2025 1 commit
-
-
Wentao Ye authored
Signed-off-by:
yewentao256 <zhyanwentao@126.com> Co-authored-by:
mgoin <mgoin64@gmail.com>
-
- 22 Jul, 2025 1 commit
-
-
Mickaël Seznec authored
Signed-off-by:
Mickael Seznec <mickael@mistral.ai> Co-authored-by:
mgoin <mgoin64@gmail.com>
-
- 03 Jun, 2025 1 commit
-
-
Michael Goin authored
Signed-off-by:mgoin <mgoin64@gmail.com>
-
- 31 Mar, 2025 1 commit
-
-
Charlie Fu authored
Signed-off-by:charlifu <charlifu@amd.com>
-
- 11 Mar, 2025 1 commit
-
-
Jeff Daily authored
Signed-off-by:Jeff Daily <jeff.daily@amd.com>
-
- 08 Nov, 2024 1 commit
-
-
Luka Govedič authored
Signed-off-by:
luka <luka@neuralmagic.com> Co-authored-by:
youkaichao <youkaichao@126.com>
-
- 16 Oct, 2024 1 commit
-
-
Tyler Michael Smith authored
-
- 22 Aug, 2024 1 commit
-
-
Luka Govedič authored
Co-authored-by:Michael Goin <michael@neuralmagic.com>
-
- 16 Aug, 2024 1 commit
-
-
Charlie Fu authored
-
- 26 Jul, 2024 1 commit
-
-
Tyler Michael Smith authored
-
- 22 Jul, 2024 1 commit
-
-
Tyler Michael Smith authored
-
- 20 Jul, 2024 1 commit
-
-
Varun Sundar Rabindranath authored
Co-authored-by:Varun Sundar Rabindranth <varun@neuralmagic.com>
-
- 18 Jul, 2024 1 commit
-
-
Varun Sundar Rabindranath authored
Co-authored-by:Varun Sundar Rabindranath <varun@neuralmagic.com>
-
- 12 Jun, 2024 1 commit
-
-
Cody Yu authored
Inspired by #5146, this PR improves the FP8 quantize kernel by vectorizing data transfer to better utilize memory bandwidth. Microbenchmarks show that the improved kernel achieves a 1.0x-1.5x speedup (especially when the hidden size is large). In detail, we applied 3 optimizations:
- Use an inverted scale so that most divisions become multiplications.
- Unroll the loop 4 times to improve ILP.
- Use 4-wide vectorized accesses to transfer data between HBM and SRAM.
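The inverted-scale and 4-wide ideas from this commit message can be illustrated with a minimal pure-Python sketch. It is not the CUDA kernel from the PR: the function names and `VEC_WIDTH` are hypothetical, rounding to FP8 is omitted, and the grouping by four only mimics the shape of a vectorized, unrolled loop.

```python
import math

VEC_WIDTH = 4  # hypothetical: mirrors the 4-wide vectors / 4x unroll named in the commit message


def scale_by_division(values, scale):
    """Baseline: one division per element."""
    return [v / scale for v in values]


def scale_by_inverted_scale(values, scale):
    """Compute 1/scale once, then multiply; walk the input in groups of four."""
    inv_scale = 1.0 / scale  # the only division
    out = []
    n_vec = len(values) - len(values) % VEC_WIDTH
    # Main loop: four elements per iteration, like one vectorized load/store.
    for i in range(0, n_vec, VEC_WIDTH):
        a, b, c, d = values[i:i + VEC_WIDTH]
        out.extend((a * inv_scale, b * inv_scale, c * inv_scale, d * inv_scale))
    # Tail loop: leftover elements when the length is not a multiple of four.
    for v in values[n_vec:]:
        out.append(v * inv_scale)
    return out


values = [0.1 * i for i in range(10)]
baseline = scale_by_division(values, 0.37)
fast = scale_by_inverted_scale(values, 0.37)
print(all(math.isclose(x, y, rel_tol=1e-12) for x, y in zip(baseline, fast)))
```

Multiplying by a precomputed reciprocal can differ from a true division in the last bit, which is why the comparison uses a tolerance rather than exact equality.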
-
- 09 Jun, 2024 1 commit
-
-
bnellnm authored
-
- 22 May, 2024 1 commit
-
-
Michael Goin authored
-
- 10 May, 2024 1 commit
-
-
Cody Yu authored
-
- 07 May, 2024 1 commit
-
-
Philipp Moritz authored
Previously, FP8 static scaling worked only if the scales overestimated the maxima of all activation tensors during computation. However, this will not always be the case, even if the scales were calibrated very carefully. For example, with the activations in my checkpoint https://huggingface.co/pcmoritz/Mixtral-8x7B-v0.1-fp8-act-scale (which was calibrated on https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k), I'm getting the following mostly random performance on MMLU:

| Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu |N/A |none | 0|acc |0.2295|± |0.0035|
| - humanities |N/A |none | 5|acc |0.2421|± |0.0062|
| - other |N/A |none | 5|acc |0.2398|± |0.0076|
| - social_sciences|N/A |none | 5|acc |0.2171|± |0.0074|
| - stem |N/A |none | 5|acc |0.2125|± |0.0073|

With the fix in this PR, where the scaled activations are clamped to [-std::numeric_limits<c10::Float8_e4m3fn>::max(), std::numeric_limits<c10::Float8_e4m3fn>::max()] to make sure there are no NaNs, the performance is:

| Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu |N/A |none | 0|acc |0.7008|± |0.0036|
| - humanities |N/A |none | 5|acc |0.6453|± |0.0065|
| - other |N/A |none | 5|acc |0.7692|± |0.0072|
| - social_sciences|N/A |none | 5|acc |0.8083|± |0.0070|
| - stem |N/A |none | 5|acc |0.6115|± |0.0083|

This is not perfect yet but is getting very close to the FP16 / dynamic activation scale performance.
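A minimal sketch of the clamping step described above, in plain Python rather than the kernel's C++. The function name and sample values are hypothetical; the constant 448 is the largest finite value of the float8 e4m3fn format, and the actual rounding to FP8 is left out.

```python
FP8_E4M3FN_MAX = 448.0  # largest finite float8 e4m3fn value


def scale_and_clamp(values, scale):
    """Divide by a static scale, then clamp into the representable FP8 range."""
    out = []
    for v in values:
        scaled = v / scale
        # Without this clamp, an underestimated scale would push values out of range.
        out.append(max(-FP8_E4M3FN_MAX, min(FP8_E4M3FN_MAX, scaled)))
    return out


# Hypothetical activations: the last two exceed what the static scale anticipated.
activations = [0.5, -3.0, 1000.0, -2000.0]
static_scale = 2.0
clamped = scale_and_clamp(activations, static_scale)
print(clamped)
```

In-range values pass through unchanged, while the two outliers saturate at the format's maximum magnitude instead of overflowing.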
-
- 27 Apr, 2024 1 commit
-
-
Philipp Moritz authored
Co-authored-by:Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
- 24 Apr, 2024 1 commit
-
-
Philipp Moritz authored
This PR is the first step towards fixing https://github.com/vllm-project/vllm/pull/3208

It implements dynamic per-tensor scaling (see https://github.com/vllm-project/vllm/pull/4118), so users do not need to compute activation scales on a calibration dataset, and they also don't need to convert their model checkpoints. It is enough to specify the `quantization="fp8"` argument. You can try out the PR like this:

```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1", tensor_parallel_size=2, quantization="fp8")
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

**Performance**: For this PR, the focus is on making the code clean (while still trying to get reasonable performance). There are a number of optimizations that we will submit as a follow-up PR that significantly improve the performance (similar to the numbers in https://github.com/vllm-project/vllm/pull/3954).

With this PR, the results are as follows:

<img width="725" alt="Screenshot 2024-04-21 at 1 31 50 PM" src="https://github.com/vllm-project/vllm/assets/113316/d8fe1118-07a0-4d4e-8530-37a77d465a03">

**Accuracy**: The accuracy with this PR on MMLU on `mistralai/Mixtral-8x7B-v0.1` is as follows:

```
| Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu |N/A |none | 0|acc |0.7018|± |0.0036|
| - humanities |N/A |none | 5|acc |0.6472|± |0.0065|
| - other |N/A |none | 5|acc |0.7673|± |0.0072|
| - social_sciences|N/A |none | 5|acc |0.8099|± |0.0070|
| - stem |N/A |none | 5|acc |0.6131|± |0.0083|
```

This compares favorably with the fp16 results, which are:

```
| Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu |N/A |none | 0|acc |0.7020|± |0.1313|
| - humanities |N/A |none | 5|acc |0.6425|± |0.1349|
| - other |N/A |none | 5|acc |0.7744|± |0.1038|
| - social_sciences|N/A |none | 5|acc |0.8131|± |0.0695|
| - stem |N/A |none | 5|acc |0.6108|± |0.1383|
```

Happy hacking!
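For readers unfamiliar with the term, dynamic per-tensor scaling can be sketched as follows. This is a generic illustration in plain Python, not the code from the PR: the function names and sample values are hypothetical, the scale is assumed to be the tensor's absolute maximum divided by the float8 e4m3fn maximum (448), and rounding to FP8 is omitted.

```python
FP8_E4M3FN_MAX = 448.0  # largest finite float8 e4m3fn value


def dynamic_per_tensor_scale(values):
    """Derive one scale from the tensor itself, so no calibration data is needed."""
    amax = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero (or empty) tensor to avoid a zero scale.
    return amax / FP8_E4M3FN_MAX if amax > 0.0 else 1.0


def scale_tensor(values):
    """Return the scaled values and the scale needed to undo the scaling."""
    scale = dynamic_per_tensor_scale(values)
    return [v / scale for v in values], scale


# Hypothetical activations; the largest magnitude maps onto the FP8 maximum.
activations = [0.5, -3.0, 7.0, -14.0]
scaled, scale = scale_tensor(activations)
print(scale, scaled)
```

Because the scale is recomputed from each tensor at run time, the largest element always lands exactly at the edge of the representable range, at the cost of an extra reduction over the tensor.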
-