Commit 5a38033d authored by Patrick von Platen, committed by GitHub

[Docs] Let's go (#385)

parent 7bd50cab
- sections:
  - local: index
    title: "🧨 Diffusers"
  - local: quicktour
    title: "Quicktour"
  - local: installation
    title: "Installation"
  title: "Get started"
- sections:
  - sections:
    - local: using-diffusers/loading
      title: "Loading Pipelines, Models, and Schedulers"
    - local: using-diffusers/configuration
      title: "Configuring Pipelines, Models, and Schedulers"
    title: "Loading"
  - sections:
    - local: using-diffusers/unconditional_image_generation
      title: "Unconditional Image Generation"
    - local: using-diffusers/conditional_image_generation
      title: "Text-to-Image Generation"
    - local: using-diffusers/img2img
      title: "Text-Guided Image-to-Image"
    - local: using-diffusers/inpaint
      title: "Text-Guided Image-Inpainting"
    - local: using-diffusers/custom
      title: "Create a custom pipeline"
    title: "Pipelines for Inference"
  title: "Using Diffusers"
- sections:
  - local: optimization/fp16
    title: "Torch Float16"
  - local: optimization/onnx
    title: "ONNX"
  - local: optimization/open_vino
    title: "Open Vino"
  - local: optimization/mps
    title: "MPS"
  - local: optimization/other
    title: "Other"
  title: "Optimization/Special Hardware"
- sections:
  - local: training/unconditional_training
    title: "Unconditional Image Generation"
  - local: training/text_inversion
    title: "Text Inversion"
  - local: training/text2image
    title: "Text-to-image"
  title: "Training"
- sections:
  - local: conceptual/stable_diffusion
    title: "Stable Diffusion"
  - local: conceptual/philosophy
    title: "Philosophy"
  - local: conceptual/contribution
    title: "How to contribute?"
  title: "Conceptual Guides"
- sections:
  - sections:
    - local: api/models
      title: "Models"
    - local: api/schedulers
      title: "Schedulers"
    - local: api/diffusion_pipeline
      title: "Diffusion Pipeline"
    - local: api/logging
      title: "Logging"
    - local: api/configuration
      title: "Configuration"
    - local: api/outputs
      title: "Outputs"
    title: "Main Classes"
  - sections:
    - local: api/pipelines/ddim
      title: "DDIM"
    - local: api/pipelines/ddpm
      title: "DDPM"
    - local: api/pipelines/latent_diffusion
      title: "Latent Diffusion"
    - local: api/pipelines/latent_diffusion_uncond
      title: "Unconditional Latent Diffusion"
    - local: api/pipelines/pndm
      title: "PNDM"
    - local: api/pipelines/score_sde_ve
      title: "Score SDE VE"
    - local: api/pipelines/stable_diffusion
      title: "Stable Diffusion"
    - local: api/pipelines/stochastic_karras_ve
      title: "Stochastic Karras VE"
    title: "Pipelines"
  title: "API"
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Models
Diffusers provides pretrained models for popular diffusion algorithms, as well as the modular components needed to build new diffusion models.
The primary function of these models is to denoise an input sample by modeling the distribution $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$.
All models are built on the base class [`ModelMixin`], a `torch.nn.Module` that adds basic functionality for saving and loading models, both locally and from the Hugging Face Hub.
## API
Each model class provides its own initialization and `forward` method; all saving, loading, and utility logic lives in the base [`ModelMixin`] class, as sketched below.
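The following is a minimal, hedged sketch of that shared API. The `UNet2DModel` class and the `google/ddpm-cat-256` checkpoint are assumptions chosen for illustration; any [`ModelMixin`] subclass exposes the same `from_pretrained`/`save_pretrained` methods.

```python
# A minimal sketch of the shared ModelMixin API. The class name and the
# checkpoint are assumptions for illustration; any ModelMixin subclass
# exposes the same save/load methods.
import torch
from diffusers import UNet2DModel

# download a pretrained denoising model from the Hugging Face Hub
model = UNet2DModel.from_pretrained("google/ddpm-cat-256")

# the model's own forward pass: predict the noise residual at a given timestep
noise = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)
with torch.no_grad():
    noise_pred = model(noise, timestep=10).sample

# save the weights locally; UNet2DModel.from_pretrained("./my-unet") reloads them
model.save_pretrained("./my-unet")
```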
## Examples
- The [`UNetModel`] was proposed in [TODO](https://arxiv.org/) and has been used in paper1, paper2, paper3.
- Extensions of the [`UNetModel`] include the [`UNetGlideModel`], which uses attention and timestep embeddings for the [GLIDE](https://arxiv.org/abs/2112.10741) paper; the [`UNetGradTTS`] model from this [paper](https://arxiv.org/abs/2105.06337) for text-to-speech; the [`UNetLDMModel`] for the latent-diffusion models in this [paper](https://arxiv.org/abs/2112.10752); and the [`TemporalUNet`] used for time-series prediction in this reinforcement-learning [paper](https://arxiv.org/abs/2205.09991).
- TODO: mention VAE / SDE score estimation
# GLIDE MODEL
# Diffusers for audio
# Quicktour
Start using 🧨 Diffusers quickly!
The [`DiffusionPipeline`] is the easiest way to run inference and generate samples. First, install the library:
```
pip install diffusers
```
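As a hedged sketch of what end-to-end inference looks like (the checkpoint name and the `.images` output field are assumptions for illustration):

```python
from diffusers import DiffusionPipeline

# load a full denoising system (model + scheduler) from the Hub;
# "google/ddpm-cat-256" is an assumed example checkpoint
pipeline = DiffusionPipeline.from_pretrained("google/ddpm-cat-256")

# sample random noise and denoise it step by step into an image
image = pipeline().images[0]
image.save("generated_image.png")
```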
## Main classes
### Models
### Schedulers
### Pipelines
# Philosophy
- Readability and clarity are preferred over highly optimized code. Strong emphasis is placed on readable, intuitive, and elementary code design. *E.g.*, the provided [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers) are separated from the provided [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and consist of well-commented code that can be read alongside the original papers.
- Diffusers is **modality independent** and focuses on providing pretrained models and tools to build systems that generate **continuous outputs**, *e.g.* vision and audio.
- Diffusion models and schedulers are provided as concise, elementary building blocks, whereas diffusion pipelines are collections of end-to-end diffusion systems that can be used out of the box, should stay as close as possible to their original implementations, and can include components from other libraries, such as text encoders. Examples of diffusion pipelines are [Glide](https://github.com/openai/glide-text2im) and [Latent Diffusion](https://github.com/CompVis/latent-diffusion). A short sketch of the building-block idea follows this list.
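To make the distinction concrete, here is a hedged sketch of driving a model and a scheduler by hand instead of calling a ready-made pipeline. The class names `UNet2DModel` and `DDPMScheduler` and the `google/ddpm-cat-256` checkpoint are assumptions for illustration based on the current library:

```python
# A sketch of composing the elementary building blocks yourself:
# the model predicts the noise residual, the scheduler removes it.
import torch
from diffusers import UNet2DModel, DDPMScheduler

model = UNet2DModel.from_pretrained("google/ddpm-cat-256")       # assumed checkpoint
scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")

sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)
scheduler.set_timesteps(50)  # fewer steps than training, for a quick sketch

for t in scheduler.timesteps:
    with torch.no_grad():
        noise_pred = model(sample, t).sample                     # model: predict the noise
    sample = scheduler.step(noise_pred, t, sample).prev_sample   # scheduler: denoise one step
```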
# Diffusers for other modalities
Diffusers also supports modalities beyond vision and audio.
Currently, the examples include:
- [Diffuser](https://diffusion-planning.github.io/) for planning in reinforcement learning (currently inference only): [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1TmBmlYeKUZSkUZoJqfBmaicVTKx6nN1R?usp=sharing)

If you are interested in contributing to examples that are still under construction, you can explore:
- [GeoDiff](https://github.com/MinkaiXu/GeoDiff) for generating 3D configurations of molecules: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1pLYYWQhdLuv1q-JtEHGZybxp2RBF8gPs?usp=sharing)
# Diffusers for vision
## Direct image generation
#### **Example image generation with PNDM**
```python
from diffusers import PNDM, UNetModel, PNDMScheduler
import PIL.Image
import numpy as np
import torch
model_id = "fusing/ddim-celeba-hq"

# load model and scheduler
model = UNetModel.from_pretrained(model_id)
scheduler = PNDMScheduler()

# compose model and scheduler into a pipeline
pndm = PNDM(unet=model, noise_scheduler=scheduler)
# run pipeline in inference (sample random noise and denoise)
with torch.no_grad():
image = pndm()
# process image to PIL
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) / 2
image_processed = torch.clamp(image_processed, 0.0, 1.0)
image_processed = image_processed * 255
image_processed = image_processed.numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])
# save image
image_pil.save("test.png")
```
#### **Example 1024x1024 image generation with SDE VE**
See [paper](https://arxiv.org/abs/2011.13456) for more information on SDE VE.
```python
from diffusers import DiffusionPipeline
import torch
import PIL.Image
import numpy as np
torch.manual_seed(32)
score_sde_ve = DiffusionPipeline.from_pretrained("fusing/ffhq_ncsnpp")
# Note: this can take up to 3 minutes on a GPU
image = score_sde_ve(num_inference_steps=2000)
image = image.permute(0, 2, 3, 1).cpu().numpy()
image = np.clip(image * 255, 0, 255).astype(np.uint8)
image_pil = PIL.Image.fromarray(image[0])
# save image
image_pil.save("test.png")
```
#### **Example 32x32 image generation with SDE VP**
See [paper](https://arxiv.org/abs/2011.13456) for more information on SDE VP.
```python
from diffusers import DiffusionPipeline
import torch
import PIL.Image
import numpy as np
torch.manual_seed(32)
score_sde_vp = DiffusionPipeline.from_pretrained("fusing/cifar10-ddpmpp-deep-vp")
# Note: this can take up to 3 minutes on a GPU
image = score_sde_vp(num_inference_steps=1000)
image = image.permute(0, 2, 3, 1).cpu().numpy()
image = np.clip(image * 255, 0, 255).astype(np.uint8)
image_pil = PIL.Image.fromarray(image[0])
# save image
image_pil.save("test.png")
```
#### **Text to Image generation with Latent Diffusion**
_Note: To use latent diffusion install transformers from [this branch](https://github.com/patil-suraj/transformers/tree/ldm-bert)._
```python
import torch
import numpy as np
import PIL.Image

from diffusers import DiffusionPipeline

ldm = DiffusionPipeline.from_pretrained("fusing/latent-diffusion-text2im-large")
generator = torch.manual_seed(42)
prompt = "A painting of a squirrel eating a burger"
image = ldm([prompt], generator=generator, eta=0.3, guidance_scale=6.0, num_inference_steps=50)
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = image_processed * 255.0
image_processed = image_processed.numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])
# save image
image_pil.save("test.png")
```
## Text-to-speech generation
```python
import torch
from diffusers import BDDMPipeline, GradTTSPipeline
torch_device = "cuda"
# load grad tts and bddm pipelines
grad_tts = GradTTSPipeline.from_pretrained("fusing/grad-tts-libri-tts")
bddm = BDDMPipeline.from_pretrained("fusing/diffwave-vocoder-ljspeech")
text = "Hello world, I missed you so much."
# generate mel spectrograms from the text
mel_spec = grad_tts(text, torch_device=torch_device)
# generate speech by passing the mel spectrograms to the BDDM vocoder pipeline
generator = torch.manual_seed(42)
audio = bddm(mel_spec, generator, torch_device=torch_device)
# save generated audio
from scipy.io.wavfile import write as wavwrite
sampling_rate = 22050
wavwrite("generated_audio.wav", sampling_rate, audio.squeeze().cpu().numpy())
```