Unverified Commit 468ae09e authored by Tolga Cangöz, committed by GitHub

Errata - Trim trailing white space in the whole repo (#8575)



* Trim all the trailing white space in the whole repo

* Remove unnecessary empty places

* make style && make quality

* Trim trailing white space

* trim

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
parent 3fca5202
@@ -57,50 +57,50 @@ body:
description: |
Your issue will be replied to more quickly if you can figure out the right person to tag with @.
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
All issues are read by one of the core maintainers, so if you don't know who to tag, just leave this blank and
a core maintainer will ping the right person.
Please tag a maximum of 2 people.
Questions on DiffusionPipeline (Saving, Loading, From pretrained, ...):
Questions on pipelines:
- Stable Diffusion @yiyixuxu @DN6 @sayakpaul
- Stable Diffusion XL @yiyixuxu @sayakpaul @DN6
- Kandinsky @yiyixuxu
- ControlNet @sayakpaul @yiyixuxu @DN6
- T2I Adapter @sayakpaul @yiyixuxu @DN6
- IF @DN6
- Text-to-Video / Video-to-Video @DN6 @sayakpaul
- Wuerstchen @DN6
- Other: @yiyixuxu @DN6
Questions on models:
- UNet @DN6 @yiyixuxu @sayakpaul
- VAE @sayakpaul @DN6 @yiyixuxu
- Transformers/Attention @DN6 @yiyixuxu @sayakpaul
Questions on Schedulers: @yiyixuxu
Questions on LoRA: @sayakpaul
Questions on Textual Inversion: @sayakpaul
Questions on Training:
- DreamBooth @sayakpaul
- Text-to-Image Fine-tuning @sayakpaul
- Textual Inversion @sayakpaul
- ControlNet @sayakpaul
Questions on Tests: @DN6 @sayakpaul @yiyixuxu
Questions on Documentation: @stevhliu
Questions on JAX- and MPS-related things: @pcuenca
Questions on audio pipelines: @DN6
placeholder: "@Username ..."
@@ -38,9 +38,9 @@ members/contributors who may be interested in your PR.
Core library:
- Schedulers: @yiyixuxu
- Pipelines: @sayakpaul @yiyixuxu @DN6
- Training examples: @sayakpaul
- Docs: @stevhliu and @sayakpaul
- JAX and MPS: @pcuenca
- Audio: @sanchit-gandhi
...
@@ -36,7 +36,7 @@ jobs:
# If ref is 'refs/heads/main' => set 'main'
# Else it must be a tag => set {tag}
- name: Set checkout_ref and path_in_repo
run: |
if [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
if [ -z "${{ github.event.inputs.ref }}" ]; then
echo "Error: Missing ref input"
...
@@ -11,12 +11,12 @@ jobs:
steps:
- uses: actions/checkout@v3
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: '3.8'
- name: Notify Slack about the release
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
...
@@ -33,4 +33,3 @@ jobs:
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
pytest tests/others/test_dependencies.py
\ No newline at end of file
@@ -115,14 +115,14 @@ jobs:
-s -v \
--make-reports=tests_models_lora_${{ matrix.config.report }} \
tests/models/ -k "lora"
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_${{ matrix.config.report }}_failures_short.txt
cat reports/tests_models_lora_${{ matrix.config.report }}_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
...
@@ -29,7 +29,7 @@ jobs:
LATEST_BRANCH=$(python utils/fetch_latest_release_branch.py)
echo "Latest branch: $LATEST_BRANCH"
echo "latest_branch=$LATEST_BRANCH" >> $GITHUB_ENV
- name: Set latest branch output
id: set_latest_branch
run: echo "::set-output name=latest_branch::${{ env.latest_branch }}"
@@ -43,27 +43,27 @@ jobs:
uses: actions/checkout@v3
with:
ref: ${{ needs.find-and-checkout-latest-branch.outputs.latest_branch }}
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: "3.8"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -U setuptools wheel twine
pip install -U torch --index-url https://download.pytorch.org/whl/cpu
pip install -U transformers
- name: Build the dist files
run: python setup.py bdist_wheel && python setup.py sdist
- name: Publish to the test PyPI
env:
TWINE_USERNAME: ${{ secrets.TEST_PYPI_USERNAME }}
TWINE_PASSWORD: ${{ secrets.TEST_PYPI_PASSWORD }}
run: twine upload dist/* -r pypitest --repository-url=https://test.pypi.org/legacy/
- name: Test installing diffusers and importing
run: |
...
@@ -7,7 +7,7 @@ on:
default: 'diffusers/diffusers-pytorch-cuda'
description: 'Name of the Docker image'
required: true
branch:
description: 'PR Branch to test on'
required: true
test:
@@ -34,19 +34,19 @@ jobs:
steps:
- name: Validate test files input
id: validate_test_files
env:
PY_TEST: ${{ github.event.inputs.test }}
run: |
if [[ ! "$PY_TEST" =~ ^tests/ ]]; then
echo "Error: The input string must start with 'tests/'."
exit 1
fi
if [[ ! "$PY_TEST" =~ ^tests/(models|pipelines) ]]; then
echo "Error: The input string must contain either 'models' or 'pipelines' after 'tests/'."
exit 1
fi
if [[ "$PY_TEST" == *";"* ]]; then
echo "Error: The input string must not contain ';'."
exit 1
@@ -60,14 +60,14 @@ jobs:
repository: ${{ github.event.pull_request.head.repo.full_name }}
- name: Install pytest
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install peft
- name: Run tests
env:
PY_TEST: ${{ github.event.inputs.test }}
run: |
pytest "$PY_TEST"
\ No newline at end of file
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# PixArtTransformer2DModel
A Transformer model for image-like data from [PixArt-Alpha](https://huggingface.co/papers/2310.00426) and [PixArt-Sigma](https://huggingface.co/papers/2403.04692).
## PixArtTransformer2DModel
...
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# SD3 Transformer Model
The Transformer model introduced in [Stable Diffusion 3](https://hf.co/papers/2403.03206). Its novelty lies in the MMDiT transformer block.
## SD3Transformer2DModel
...
@@ -78,7 +78,6 @@ output = pipe(
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
Here are some sample outputs:
@@ -303,7 +302,6 @@ output = pipe(
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
<table>
@@ -378,7 +376,6 @@ output = pipe(
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
<table>
...
@@ -20,8 +20,8 @@ The abstract of the paper is the following:
*Although audio generation shares commonalities across different types of audio, such as speech, music, and sound effects, designing models for each type requires careful consideration of specific objectives and biases that can significantly differ from those of other types. To bring us closer to a unified perspective of audio generation, this paper proposes a framework that utilizes the same learning method for speech, music, and sound effect generation. Our framework introduces a general representation of audio, called "language of audio" (LOA). Any audio can be translated into LOA based on AudioMAE, a self-supervised pre-trained representation learning model. In the generation process, we translate any modalities into LOA by using a GPT-2 model, and we perform self-supervised audio generation learning with a latent diffusion model conditioned on LOA. The proposed framework naturally brings advantages such as in-context learning abilities and reusable self-supervised pretrained AudioMAE and latent diffusion models. Experiments on the major benchmarks of text-to-audio, text-to-music, and text-to-speech demonstrate state-of-the-art or competitive performance against previous approaches. Our code, pretrained model, and demo are available at [this https URL](https://audioldm.github.io/audioldm2).*
This pipeline was contributed by [sanchit-gandhi](https://huggingface.co/sanchit-gandhi) and [Nguyễn Công Tú Anh](https://github.com/tuanh123789). The original codebase can be
found at [haoheliu/audioldm2](https://github.com/haoheliu/audioldm2).
## Tips
...
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# BLIP-Diffusion
BLIP-Diffusion was proposed in [BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing](https://arxiv.org/abs/2305.14720). It enables zero-shot subject-driven generation and control-guided zero-shot generation.
The abstract from the paper is:
...
@@ -36,7 +36,7 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.m
## Optimization
You can optimize the pipeline's runtime and memory consumption with torch.compile and feed-forward chunking. To learn about other optimization methods, check out the [Speed up inference](../../optimization/fp16) and [Reduce memory usage](../../optimization/memory) guides.
### Inference
@@ -46,7 +46,7 @@ First, load the pipeline:
```python
from diffusers import HunyuanDiTPipeline
import torch
pipeline = HunyuanDiTPipeline.from_pretrained(
"Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
@@ -78,7 +78,7 @@ Without torch.compile(): Average inference time: 20.570 seconds.
### Memory optimization
By loading the T5 text encoder in 8 bits, you can run the pipeline in just under 6 GBs of GPU VRAM. Refer to [this script](https://gist.github.com/sayakpaul/3154605f6af05b98a41081aaba5ca43e) for details.
Furthermore, you can use the [`~HunyuanDiT2DModel.enable_forward_chunking`] method to reduce memory usage. Feed-forward chunking runs the feed-forward layers in a transformer block in a loop instead of all at once. This gives you a trade-off between memory consumption and inference runtime.
@@ -92,4 +92,4 @@ Furthermore, you can use the [`~HunyuanDiT2DModel.enable_forward_chunking`] meth
[[autodoc]] HunyuanDiTPipeline
- all
- __call__
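For context on the feed-forward chunking mentioned in the hunk above, here is a minimal sketch of how it can be enabled on the pipeline's transformer; the `chunk_size=1, dim=1` values and the prompt are illustrative assumptions, not values taken from this diff.

```python
# Sketch: enable feed-forward chunking on HunyuanDiT to trade speed for lower memory.
import torch
from diffusers import HunyuanDiTPipeline

pipeline = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
).to("cuda")

# Run the transformer's feed-forward layers in chunks along the sequence dimension.
# chunk_size=1 and dim=1 are assumed values for this sketch.
pipeline.transformer.enable_forward_chunking(chunk_size=1, dim=1)

image = pipeline(prompt="a small cat sitting on a windowsill").images[0]
```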
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# K-Diffusion
[k-diffusion](https://github.com/crowsonkb/k-diffusion) is a popular library created by [Katherine Crowson](https://github.com/crowsonkb/). We provide `StableDiffusionKDiffusionPipeline` and `StableDiffusionXLKDiffusionPipeline` that allow you to run Stable Diffusion with samplers from k-diffusion.
Note that most of the samplers from k-diffusion are implemented in Diffusers, and we recommend using the existing schedulers. You can find a mapping between k-diffusion samplers and schedulers in Diffusers [here](https://huggingface.co/docs/diffusers/api/schedulers/overview).
...
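As a hedged sketch of the pipeline named in the hunk above, the following assumes the pipeline's `set_scheduler` interface for picking a k-diffusion sampler by name; the checkpoint, sampler name, and prompt are illustrative assumptions.

```python
# Sketch: run Stable Diffusion with a k-diffusion sampler (assumed interface).
import torch
from diffusers import StableDiffusionKDiffusionPipeline

pipe = StableDiffusionKDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
).to("cuda")

# Select one of k-diffusion's samplers by its k-diffusion name.
pipe.set_scheduler("sample_dpmpp_2m")

image = pipe("an astronaut riding a horse on mars", num_inference_steps=25).images[0]
```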
@@ -12,11 +12,11 @@ specific language governing permissions and limitations under the License.
# Text-to-(RGB, depth)
LDM3D was proposed in [LDM3D: Latent Diffusion Model for 3D](https://huggingface.co/papers/2305.10853) by Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias Muller, and Vasudev Lal. Unlike existing text-to-image diffusion models such as [Stable Diffusion](./overview), which only generate an image, LDM3D generates both an image and a depth map from a given text prompt. With almost the same number of parameters, LDM3D creates a latent space that can compress both the RGB images and the depth maps.
Two checkpoints are available for use:
- [ldm3d-original](https://huggingface.co/Intel/ldm3d). The original checkpoint used in the [paper](https://arxiv.org/pdf/2305.10853.pdf)
- [ldm3d-4c](https://huggingface.co/Intel/ldm3d-4c). The new version of LDM3D, which uses 4-channel inputs instead of 6-channel inputs and is finetuned on higher-resolution images.
The abstract from the paper is:
@@ -44,7 +44,7 @@ Make sure to check out the Stable Diffusion [Tips](overview#tips) section to lea
# Upscaler
[LDM3D-VR](https://arxiv.org/pdf/2311.03226.pdf) is an extended version of LDM3D.
The abstract from the paper is:
*Latent diffusion models have proven to be state-of-the-art in the creation and manipulation of visual outputs. However, as far as we know, the generation of depth maps jointly with RGB is still limited. We introduce LDM3D-VR, a suite of diffusion models targeting virtual reality development that includes LDM3D-pano and LDM3D-SR. These models enable the generation of panoramic RGBD based on textual prompts and the upscaling of low-resolution inputs to high-resolution RGBD, respectively. Our models are fine-tuned from existing pretrained models on datasets containing panoramic/high-resolution RGB images, depth maps and captions. Both models are evaluated in comparison to existing related methods*
...
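As context for the checkpoints listed in the hunk above, a minimal sketch of generating an RGB image plus depth map with the LDM3D pipeline; the prompt and output file names are illustrative assumptions.

```python
# Sketch: text-to-(RGB, depth) generation with the ldm3d-4c checkpoint.
from diffusers import StableDiffusionLDM3DPipeline

pipe = StableDiffusionLDM3DPipeline.from_pretrained("Intel/ldm3d-4c").to("cuda")

output = pipe("A picture of some lemons on a table")
rgb_image, depth_image = output.rgb[0], output.depth[0]  # paired RGB and depth outputs
rgb_image.save("lemons_rgb.jpg")
depth_image.save("lemons_depth.png")
```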
@@ -155,28 +155,28 @@ To generate a video from prompt with additional pose control
imageio.mimsave("video.mp4", result, fps=4)
```
- #### SDXL Support
Since our attention processor also works with SDXL, it can be utilized to generate a video from prompt using ControlNet models powered by SDXL:
```python
import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
from diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_zero import CrossFrameAttnProcessor
controlnet_model_id = 'thibaud/controlnet-openpose-sdxl-1.0'
model_id = 'stabilityai/stable-diffusion-xl-base-1.0'
controlnet = ControlNetModel.from_pretrained(controlnet_model_id, torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
model_id, controlnet=controlnet, torch_dtype=torch.float16
).to('cuda')
# Set the attention processor
pipe.unet.set_attn_processor(CrossFrameAttnProcessor(batch_size=2))
pipe.controlnet.set_attn_processor(CrossFrameAttnProcessor(batch_size=2))
# fix latents for all frames
latents = torch.randn((1, 4, 128, 128), device="cuda", dtype=torch.float16).repeat(len(pose_images), 1, 1, 1)
prompt = "Darth Vader dancing in a desert"
result = pipe(prompt=[prompt] * len(pose_images), image=pose_images, latents=latents).images
imageio.mimsave("video.mp4", result, fps=4)
...
@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
# TCDScheduler
[Trajectory Consistency Distillation](https://huggingface.co/papers/2402.19159) by Jianbin Zheng, Minghui Hu, Zhongyi Fan, Chaoyue Wang, Changxing Ding, Dacheng Tao, and Tat-Jen Cham introduced Strategic Stochastic Sampling (Algorithm 4), which is capable of generating good samples in a small number of steps. An advanced iteration of the multistep scheduler (Algorithm 1) in [Consistency Models](https://huggingface.co/papers/2303.01469), Strategic Stochastic Sampling is specifically tailored to the trajectory consistency function.
...
@@ -26,7 +26,7 @@ pipeline.unet.config["in_channels"]
9
```
To adapt your text-to-image model for inpainting, you'll need to change the number of `in_channels` from 4 to 9.
Initialize a [`UNet2DConditionModel`] with the pretrained text-to-image model weights, and change `in_channels` to 9. Changing the number of `in_channels` means you need to set `ignore_mismatched_sizes=True` and `low_cpu_mem_usage=False` to avoid a size mismatch error because the shape is different now.
...
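A minimal sketch of the step described in the hunk above; the base checkpoint name is an illustrative assumption.

```python
# Sketch: load text-to-image UNet weights into a 9-channel inpainting UNet.
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # assumed base checkpoint for illustration
    subfolder="unet",
    in_channels=9,                      # 4 latent + 4 masked-image latent + 1 mask channel
    low_cpu_mem_usage=False,            # load weights eagerly so mismatched shapes can be handled
    ignore_mismatched_sizes=True,       # allow conv_in to be re-initialized with the new shape
)
```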
@@ -9,7 +9,7 @@ This guide will show you two ways to create a dataset to finetune on:
<Tip>
💡 Learn more about how to create an image dataset for training in the [Create an image dataset](https://huggingface.co/docs/datasets/image_dataset) guide.
</Tip>
@@ -39,7 +39,7 @@ accelerate launch train_unconditional.py \
</Tip>
Start by creating a dataset with the [`ImageFolder`](https://huggingface.co/docs/datasets/image_load#imagefolder) feature, which creates an `image` column containing the PIL-encoded images.
You can use the `data_dir` or `data_files` parameters to specify the location of the dataset. The `data_files` parameter supports mapping specific files to dataset splits like `train` or `test`:
...
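A short sketch of the two loading options described in the hunk above; the directory and glob patterns are placeholder assumptions.

```python
# Sketch: create a dataset with the ImageFolder loader.
from datasets import load_dataset

# Option 1: point data_dir at a folder of images (placeholder path).
dataset = load_dataset("imagefolder", data_dir="path/to/images", split="train")

# Option 2: map specific files to splits with data_files (placeholder patterns).
dataset = load_dataset(
    "imagefolder",
    data_files={"train": "path/to/train/*.png", "test": "path/to/test/*.png"},
)
print(dataset["train"][0]["image"])  # a PIL-encoded image in the `image` column
```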