Feat: Clear cache during weight loading to prevent OOM on GPUs with <=8GB VRAM
This change explicitly clears the CUDA cache during weight loading to mitigate memory fragmentation, which is particularly beneficial for low-VRAM GPUs.
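The diff itself is not shown here, so the following is only a minimal sketch of the general idea, not the code in this commit. It assumes weights are moved to the device one tensor at a time. The names `load_weights`, `place`, and `clear_every` are hypothetical and chosen for illustration. The only real library calls are `torch.cuda.is_available()` and `torch.cuda.empty_cache()`, which returns cached, unused blocks held by PyTorch's allocator to the driver. It does not free memory still referenced by live tensors.

```python
import gc
from typing import Any, Callable, Dict, Iterable, Tuple

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def _default_empty_cache() -> None:
    """Release cached, unused CUDA blocks if PyTorch and CUDA are available."""
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()


def load_weights(
    named_tensors: Iterable[Tuple[str, Any]],
    place: Callable[[Any], Any],
    clear_every: int = 1,
    empty_cache: Callable[[], None] = _default_empty_cache,
) -> Dict[str, Any]:
    """Hypothetical loader: place each tensor, then clear the cache periodically.

    `place` stands in for whatever moves a tensor to its final device/dtype,
    e.g. `lambda t: t.to("cuda")`. It is an assumption, not the project's API.
    """
    loaded: Dict[str, Any] = {}
    for i, (name, tensor) in enumerate(named_tensors, start=1):
        loaded[name] = place(tensor)
        # Drop the local reference to the source tensor so it can be collected.
        del tensor
        if i % clear_every == 0:
            # Collect unreachable Python objects first, then hand cached
            # blocks back to the driver to reduce fragmentation.
            gc.collect()
            empty_cache()
    return loaded
```

Clearing after every tensor keeps peak reserved memory lowest but adds overhead, since each `empty_cache()` call forces later allocations to go back to the driver. A larger `clear_every` trades some headroom for faster loading.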