    llm: Use first layer as memory buffer in estimation · 3fe74fba
    Jesse Gross authored
    This is a partial revert of 0478d440 "Fixed over vram allcation dure to
    small initial layer sizes."
    
    Previously we used the size of the first layer as an extra reserved
    amount of space to buffer our memory estimates. The above commit
    changed this to use the largest layer. However, this had performance
    impacts on more models than the original commit was trying to fix.
    
    This is just a heuristic without an ideal solution, so this goes back
    to the historic behavior.
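
    The trade-off described above can be illustrated with a minimal sketch.
    It is not the code in memory.go: the function names, the layer sizes,
    and the greedy fit loop are all hypothetical, chosen only to show why
    reserving the largest layer leaves room for fewer layers than reserving
    the first one.

```go
package main

import "fmt"

// bufferFirstLayer returns the size of the first layer, standing in for the
// historic behavior. Hypothetical helper, not taken from memory.go.
func bufferFirstLayer(layerSizes []uint64) uint64 {
	if len(layerSizes) == 0 {
		return 0
	}
	return layerSizes[0]
}

// bufferLargestLayer returns the size of the largest layer, standing in for
// the behavior being reverted. Hypothetical helper.
func bufferLargestLayer(layerSizes []uint64) uint64 {
	var largest uint64
	for _, s := range layerSizes {
		if s > largest {
			largest = s
		}
	}
	return largest
}

// layersThatFit reserves buffer bytes out of free, then counts how many
// layers fit in order. This greedy loop is an assumption made for
// illustration only.
func layersThatFit(free, buffer uint64, layerSizes []uint64) int {
	if buffer >= free {
		return 0
	}
	avail := free - buffer
	n := 0
	for _, s := range layerSizes {
		if s > avail {
			break
		}
		avail -= s
		n++
	}
	return n
}

func main() {
	// Made-up sizes: a small first layer followed by larger ones.
	layers := []uint64{100, 400, 400, 400}
	var free uint64 = 1000

	fmt.Println("first-layer buffer:  ", layersThatFit(free, bufferFirstLayer(layers), layers))
	fmt.Println("largest-layer buffer:", layersThatFit(free, bufferLargestLayer(layers), layers))
}
```

    With these made-up numbers the larger buffer leaves room for one layer
    fewer, which is the kind of offloading loss the message refers to.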
    
    Fixes: #10765, #10756, #10752, #10726
memory.go 12.3 KB