partial offloading: allow flash attention and disable mmap (#4734)
* partial offloading: allow flash attention and disable mmap
* allow mmap with num_gpu=0