Handle models with divergent layer sizes
The recent refactoring of the memory prediction assumed all layers are the same size, but for some models (like deepseek-coder-v2) this is not the case, so our predictions were significantly off.
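To make the failure mode concrete, here is a minimal Go sketch under stated assumptions. Each layer's memory footprint is a plain byte count. The functions `naiveTotal`, `actualTotal`, and `layersThatFit` are hypothetical illustrations, not the project's actual code. The sketch contrasts the flawed "every layer is as large as the first" estimate with one that sums each layer's own size.

```go
package main

import "fmt"

// naiveTotal reproduces the flawed assumption: every layer is treated as
// being the same size as the first one. (Hypothetical helper.)
func naiveTotal(layers []uint64) uint64 {
	if len(layers) == 0 {
		return 0
	}
	return layers[0] * uint64(len(layers))
}

// actualTotal sums each layer's own size, so layers of divergent sizes
// are accounted for individually. (Hypothetical helper.)
func actualTotal(layers []uint64) uint64 {
	var total uint64
	for _, size := range layers {
		total += size
	}
	return total
}

// layersThatFit returns how many layers, taken in order, fit within a
// memory budget of `budget` bytes when each layer's own size is used.
// (Hypothetical helper.)
func layersThatFit(layers []uint64, budget uint64) int {
	var used uint64
	for i, size := range layers {
		if used+size > budget {
			return i
		}
		used += size
	}
	return len(layers)
}

func main() {
	// Made-up sizes: a small first layer followed by much larger ones,
	// i.e. a model whose layers are not uniform.
	layers := []uint64{100, 400, 400, 400}
	fmt.Println("naive:", naiveTotal(layers))
	fmt.Println("actual:", actualTotal(layers))
	fmt.Println("layers fitting in 1000:", layersThatFit(layers, 1000))
}
```

Running it prints `naive: 400`, `actual: 1300`, and `layers fitting in 1000: 3`. Extrapolating from the first layer underestimates the total by more than 3x here, which is the kind of large misprediction described above.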