- 05 Dec, 2025 1 commit
swappy authored
* fix: group offloading to support standalone computational layers in block-level offloading
* test: for models with standalone and deeply nested layers in block-level offloading
* feat: support for block-level offloading in group offloading config
* fix: group offload block modules to AutoencoderKL and AutoencoderKLWan
* fix: update group offloading tests to use AutoencoderKL and adjust input dimensions
* refactor: streamline block offloading logic
* Apply style fixes
* update tests
* update
* fix for failing tests
* clean up
* revert to use skip_keys
* clean up

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
- 28 Aug, 2025 1 commit
Dhruv Nair authored
* update
* update
* update
* update
* update
* merge main
* Revert "merge main"

This reverts commit 65efbcead58644b31596ed2d714f7cee0e0238d3.
- 06 Aug, 2025 1 commit
Aryan authored
* update
* update
* refactor
* add test
* address review comment
* nit
- 18 Jun, 2025 1 commit
Sayak Paul authored
change to 2025 licensing for remaining
- 30 May, 2025 1 commit
Yao Matrix authored
* enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU, all passed
  Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix style
  Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix
  Signed-off-by: Matrix YAO <matrix.yao@intel.com>

---------

Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Co-authored-by: Aryan <aryan@huggingface.co>
- 23 Apr, 2025 1 commit
Aryan authored
* fix
* add tests
* add message check
- 14 Feb, 2025 1 commit
Aryan authored
* update
* fix
* non_blocking; handle parameters and buffers
* update
* Group offloading with cuda stream prefetching (#10516)
* cuda stream prefetch
* remove breakpoints
* update
* copy model hook implementation from pab
* update; ~very workaround based implementation but it seems to work as expected; needs cleanup and rewrite
* more workarounds to make it actually work
* cleanup
* rewrite
* update
* make sure to sync current stream before overwriting with pinned params; not doing so will lead to erroneous computations on the GPU and cause bad results
* better check
* update
* remove hook implementation to not deal with merge conflict
* re-add hook changes
* why use more memory when less memory do trick
* why still use slightly more memory when less memory do trick
* optimise
* add model tests
* add pipeline tests
* update docs
* add layernorm and groupnorm
* address review comments
* improve tests; add docs
* improve docs
* Apply suggestions from code review
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* apply suggestions from code review
* update tests
* apply suggestions from review
* enable_group_offloading -> enable_group_offload for naming consistency
* raise errors if multiple offloading strategies used; add relevant tests
* handle .to() when group offload applied
* refactor some repeated code
* remove unintentional change from merge conflict
* handle .cuda()

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
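The hook-based group offloading this commit series introduces can be sketched as follows. This is a simplified illustration under stated assumptions, not the diffusers implementation: `GroupOffloadHook` and `apply_block_level_offload` are hypothetical names, both devices are set to `"cpu"` so the demo runs anywhere, and the real code additionally uses non_blocking transfers, pinned memory, and CUDA stream prefetching as the messages above describe. The core idea is that blocks are partitioned into groups, each group is moved to the onload device just before its first block runs, and moved back to the offload device after its last block finishes.

```python
# Minimal sketch of block-level group offloading via forward hooks
# (hypothetical names; not the diffusers implementation).
import torch
import torch.nn as nn


class GroupOffloadHook:
    """Moves a group of modules onto/off a device around its forward pass."""

    def __init__(self, group, onload_device, offload_device, log):
        self.group = group
        self.onload_device = onload_device
        self.offload_device = offload_device
        self.log = log  # shared event log, used here to make the behavior visible

    def pre_forward(self, module, args):
        # Onload the whole group before the group's first block executes.
        for m in self.group:
            m.to(self.onload_device)
        self.log.append("onload")
        return args

    def post_forward(self, module, args, output):
        # Offload the whole group after the group's last block executes.
        for m in self.group:
            m.to(self.offload_device)
        self.log.append("offload")
        return output


def apply_block_level_offload(blocks, num_blocks_per_group, onload_device, offload_device):
    """Partition `blocks` into groups and attach onload/offload hooks.

    Returns the shared event log so callers can observe the transfer order.
    """
    log = []
    for i in range(0, len(blocks), num_blocks_per_group):
        group = blocks[i:i + num_blocks_per_group]
        hook = GroupOffloadHook(group, onload_device, offload_device, log)
        # First block in the group triggers onload; last triggers offload.
        group[0].register_forward_pre_hook(hook.pre_forward)
        group[-1].register_forward_hook(hook.post_forward)
    return log


# Demo: four blocks, grouped two at a time.
blocks = [nn.Linear(4, 4) for _ in range(4)]
log = apply_block_level_offload(blocks, num_blocks_per_group=2,
                                onload_device="cpu", offload_device="cpu")
x = torch.randn(1, 4)
for block in blocks:
    x = block(x)
```

With two blocks per group, the log alternates one onload and one offload per group, which is the memory-saving property the feature targets: at most one group of blocks is resident on the onload device at a time. The "standalone computational layers" fix in the newest entry above handles layers that do not belong to any block group and would otherwise be skipped by this partitioning.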