- 16 Dec, 2024 1 commit
-
-
Aryan authored
* copy transformer * copy vae * copy pipeline * make fix-copies * refactor; make original code work with diffusers; test latents for comparison generated with this commit * move rope into pipeline; remove flash attention; refactor * begin conversion script * make style * refactor attention * refactor * refactor final layer * their mlp -> our feedforward * make style * add docs * refactor layer names * refactor modulation * cleanup * refactor norms * refactor activations * refactor single blocks attention * refactor attention processor * make style * cleanup a bit * refactor double transformer block attention * update mochi attn proc * use diffusers attention implementation in all modules; checkpoint for all values matching original * remove helper functions in vae * refactor upsample * refactor causal conv * refactor resnet * refactor * refactor * refactor * grad checkpointing * autoencoder test * fix scaling factor * refactor clip * refactor llama text encoding * add coauthor Co-Authored-By:
"Gregory D. Hunkins" <greg@ollano.com> * refactor rope; diff: 0.14990234375; reason and fix: create rope grid on cpu and move to device Note: The following line diverges from original behaviour. We create the grid on the device, whereas original implementation creates it on CPU and then moves it to device. This results in numerical differences in layerwise debugging outputs, but visually it is the same. * use diffusers timesteps embedding; diff: 0.10205078125 * rename * convert * update * add tests for transformer * add pipeline tests; text encoder 2 is not optional * fix attention implementation for torch * add example * update docs * update docs * apply suggestions from review * refactor vae * update * Apply suggestions from code review Co-authored-by:
hlky <hlky@hlky.ac> * Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py Co-authored-by:
hlky <hlky@hlky.ac> * Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py Co-authored-by:
hlky <hlky@hlky.ac> * make fix-copies * update --------- Co-authored-by:
"Gregory D. Hunkins" <greg@ollano.com> Co-authored-by:
hlky <hlky@hlky.ac>
-
- 17 Nov, 2024 1 commit
-
-
_ authored
Correct pipeline_output.py
-
- 05 Nov, 2024 1 commit
-
-
Aryan authored
* update * udpate * update transformer * make style * fix * add conversion script * update * fix * update * fix * update * fixes * make style * update * update * update * init * update * update * add * up * up * up * update * mochi transformer * remove original implementation * make style * update inits * update conversion script * docs * Update src/diffusers/pipelines/mochi/pipeline_mochi.py Co-authored-by:
Dhruv Nair <dhruv.nair@gmail.com> * Update src/diffusers/pipelines/mochi/pipeline_mochi.py Co-authored-by:
Dhruv Nair <dhruv.nair@gmail.com> * fix docs * pipeline fixes * make style * invert sigmas in scheduler; fix pipeline * fix pipeline num_frames * flip proj and gate in swiglu * make style * fix * make style * fix tests * latent mean and std fix * update * cherry-pick 1069d210e1b9e84a366cdc7a13965626ea258178 * remove additional sigma already handled by flow match scheduler * fix * remove hardcoded value * replace conv1x1 with linear * Update src/diffusers/pipelines/mochi/pipeline_mochi.py Co-authored-by:
Dhruv Nair <dhruv.nair@gmail.com> * framewise decoding and conv_cache * make style * Apply suggestions from code review * mochi vae encoder changes * rebase correctly * Update scripts/convert_mochi_to_diffusers.py * fix tests * fixes * make style * update * make style * update * add framewise and tiled encoding * make style * make original vae implementation behaviour the default; note: framewise encoding does not work * remove framewise encoding implementation due to presence of attn layers * fight test 1 * fight test 2 --------- Co-authored-by:
Dhruv Nair <dhruv.nair@gmail.com> Co-authored-by:
yiyixuxu <yixu310@gmail.com>
-
- 02 Sep, 2024 1 commit
-
-
Aryan authored
* add vid2vid pipeline for cogvideox * make fix-copies * update docs * fake context parallel cache, vae encode tiling * add test for cog vid2vid * use video link from HF docs repo * add copied from comments; correctly rename test class
-