[V1][Spec Decode] Share input embedding of target model with EAGLE draft model...
[V1][Spec Decode] Share input embedding of target model with EAGLE draft model to free ~1GB for llama 3 model (#17326) Co-authored-by:root <root@ekagra-8xh100.us-east5-a.c.serving-efficiency-poc.internal> Co-authored-by:
Woosuk Kwon <woosuk.kwon@berkeley.edu>
Showing
Please register or sign in to comment