Feat: Clear cache during weight loading to prevent OOM on GPUs with <=8GB VRAM
This change explicitly clears the CUDA cache during weight loading to mitigate memory fragmentation, which is particularly beneficial for low-VRAM GPUs (8 GB or less).
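The diff itself is not shown here, so the following is only a minimal sketch of the general idea, assuming a PyTorch-based loader. The names `release_cached_memory`, `load_weights`, `assign`, and `clear_every` are hypothetical and are not taken from this change. `torch.cuda.empty_cache()` returns unused cached blocks to the driver; it does not free tensors that are still referenced.

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def release_cached_memory():
    """Drop unreachable Python objects, then release unused cached CUDA blocks."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()


def load_weights(named_weights, assign, clear_every=1):
    """Hypothetical loader: copy weights one by one, clearing the cache as it goes.

    named_weights: iterable of (name, tensor) pairs.
    assign:        callable that stores one weight, e.g. copies it into a model.
    clear_every:   clear the cache after this many weights (assumed tunable).
    """
    loaded = 0
    for name, tensor in named_weights:
        assign(name, tensor)
        loaded += 1
        if loaded % clear_every == 0:
            release_cached_memory()
    release_cached_memory()  # final cleanup once loading has finished
    return loaded
```

Clearing after every weight keeps peak reserved memory lowest but adds synchronization overhead; a larger `clear_every` trades some headroom for faster loading.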