    llm: Use Ollama engine memory layouts for both old and new engines · f560bd07
    Jesse Gross authored
    Currently for both the old and new engines, there is code to
    calculate how much memory is required for a model and lay out
    the layers onto GPUs. This reuses the new engine's layout code
    for the old engine as well, bringing them closer together. The
    old engine continues to use its current method of estimating
    required memory.
    
    This reduces maintenance effort and improves consistency, as new
    features only need to be implemented in one place. The newer code
    is also more accurate, especially with multiple GPUs.
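
    To illustrate what "laying out the layers onto GPUs" means, here is a minimal sketch, not the code from this commit. It assumes each layer has a known size estimate and each GPU reports its free memory, and it fills GPUs greedily in order. The names `gpu` and `assignLayers` are hypothetical and do not come from server.go.

```go
package main

import "fmt"

// gpu is a hypothetical device description; only free memory matters here.
type gpu struct {
	name string
	free uint64 // bytes available for model layers
}

// assignLayers places layers, in order, onto GPUs in order. Each GPU is
// filled until the next layer no longer fits, then the next GPU is used.
// Layers that fit on no remaining GPU are returned as overflow (for
// example, to be kept on the CPU).
func assignLayers(layerSizes []uint64, gpus []gpu) (placement [][]int, overflow []int) {
	placement = make([][]int, len(gpus))
	remaining := make([]uint64, len(gpus))
	for i, g := range gpus {
		remaining[i] = g.free
	}

	g := 0 // index of the GPU currently being filled
	for i, size := range layerSizes {
		// Advance to the next GPU with room for this layer.
		for g < len(gpus) && remaining[g] < size {
			g++
		}
		if g == len(gpus) {
			overflow = append(overflow, i)
			continue
		}
		remaining[g] -= size
		placement[g] = append(placement[g], i)
	}
	return placement, overflow
}

func main() {
	layers := []uint64{4, 4, 4, 4, 4} // assumed per-layer sizes, arbitrary units
	gpus := []gpu{{"gpu0", 10}, {"gpu1", 5}}
	placement, overflow := assignLayers(layers, gpus)
	fmt.Println(placement, overflow)
}
```

    Keeping a single function like this for both engines is the kind of sharing the commit describes: the memory estimate can differ per engine, while the placement step stays in one place.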
server.go 53.1 KB