Reduce peak VRAM by releasing large attention tensors as soon as they're no longer needed (#3463)
Release large tensors in attention as soon as they're no longer required. This reduces peak VRAM by nearly 2 GB for 1024x1024 images (even with slicing enabled), and the savings grow with image size.
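A minimal sketch of the technique described above, not the actual diff from this commit: in a standard attention computation, the raw score matrix and the softmax probabilities are the largest intermediates, and dropping each reference as soon as the next tensor has been produced lets the allocator reuse that memory, lowering the peak. The function name and NumPy stand-in (in place of a GPU tensor library) are illustrative assumptions.

```python
import numpy as np

def attention(q, k, v):
    # q, k, v: (batch, seq_len, head_dim). The (batch, seq_len, seq_len)
    # score matrix dominates memory for large inputs.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])

    # Numerically stable softmax over the last axis.
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    del scores  # release the raw scores before allocating the output

    out = probs @ v
    del probs   # release the probabilities once the output exists
    return out
```

On a framework with a caching GPU allocator, dropping these references early means the output allocation can reuse the freed blocks instead of growing the peak; the arithmetic is unchanged.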