Tied embeddings in MLP speculator. (#2473)
* Tied embeddings in MLP speculator.
* Fixing the scale_weight when users decide to not use the speculation as much as defined in the config.
* Adding scaling support and optimizing some ops.