Adjust mmap logic for CUDA on Windows for faster model load
On Windows, recent llama.cpp changes make mmap slower in most cases, so default it to off. This also makes use_mmap a tri-state so we can distinguish a user-provided value of true or false from an unspecified one.
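For illustration, a minimal sketch of how a tri-state use_mmap could be resolved, assuming it is modeled as a nil-able *bool. The names resolveUseMmap and boolPtr and the goos/cudaInUse parameters are hypothetical and are not taken from this change; the exact condition that turns mmap off is also an assumption based on the description above.

```go
package main

import "fmt"

// resolveUseMmap is a hypothetical helper, not the code from this change.
// requested is the tri-state value:
//   nil   -> the user did not specify use_mmap
//   &true -> the user explicitly asked for mmap
//   &false-> the user explicitly disabled mmap
// goos and cudaInUse are assumed inputs describing the platform.
func resolveUseMmap(requested *bool, goos string, cudaInUse bool) bool {
	// An explicit user choice always wins over any platform default.
	if requested != nil {
		return *requested
	}
	// Unspecified: assume mmap defaults to off for CUDA on Windows.
	if goos == "windows" && cudaInUse {
		return false
	}
	// Everywhere else, keep mmap enabled by default.
	return true
}

// boolPtr returns a pointer to b, for building explicit tri-state values.
func boolPtr(b bool) *bool { return &b }

func main() {
	fmt.Println(resolveUseMmap(nil, "windows", true))           // unspecified on Windows with CUDA
	fmt.Println(resolveUseMmap(boolPtr(true), "windows", true)) // explicit true is honored
	fmt.Println(resolveUseMmap(nil, "linux", true))             // unspecified elsewhere
}
```

A plain bool cannot express "unspecified", because its zero value is indistinguishable from an explicit false; a pointer (or an equivalent three-valued enum) keeps that distinction, so an explicit user setting is never overridden by the platform default.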