Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Aaron Dou <yzdou@amazon.com>
Co-authored-by: Shashwat Srijan <sssrijan@amazon.com>
Co-authored-by: Chongming Ni <chongmni@amazon.com>
Co-authored-by: Amulya Ballakur <amulyaab@amazon.com>
Co-authored-by: Patrick Lange <patlange@amazon.com>
Co-authored-by: Elaine Zhao <elaineyz@amazon.com>
Co-authored-by: Lin Lin Pan <tailinpa@amazon.com>
Co-authored-by: Navyadhara Gogineni <navyadha@amazon.com>
Co-authored-by: Yishan McNabb <yishanm@amazon.com>
Co-authored-by: Mrinal Shukla <181322398+mrinalks@users.noreply.github.com>