Docs (#45)

* first pass at docs structure
* minor reformatting, add GitHub Actions for docs
* populate docs (primarily from README, some writing)

Commit c3d78cd3 authored Jul 13, 2022 by Nathan Lambert, committed by GitHub on Jul 13, 2022
16 changed files
--- a/.github/workflows/build_documentation.yml
+++ b/.github/workflows/build_documentation.yml
+name: Build documentation
+on:
+  push:
+    branches:
+      - main
+      - doc-builder*
+      - v*-release
+jobs:
+  build:
+    uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
+    with:
+      commit_sha: ${{ github.sha }}
+      package: diffusers
+    secrets:
+      token: ${{ secrets.HUGGINGFACE_PUSH }}
--- a/.github/workflows/build_pr_documentation.yml
+++ b/.github/workflows/build_pr_documentation.yml
+name: Build PR Documentation
+on:
+  pull_request:
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+jobs:
+  build:
+    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
+    with:
+      commit_sha: ${{ github.event.pull_request.head.sha }}
+      pr_number: ${{ github.event.number }}
+      package: diffusers
--- a/.github/workflows/delete_doc_comment.yml
+++ b/.github/workflows/delete_doc_comment.yml
+name: Delete dev documentation
+on:
+  pull_request:
+    types: [ closed ]
+jobs:
+  delete:
+    uses: huggingface/doc-builder/.github/workflows/delete_doc_comment.yml@main
+    with:
+      pr_number: ${{ github.event.number }}
+      package: diffusers
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
+- sections:
+  - local: index
+    title: 🧨 Diffusers
+  - local: quicktour
+    title: Quicktour
+  - local: philosophy
+    title: Philosophy
+  title: Get started
+- sections:
+  - sections:
+    - local: examples/diffusers_for_vision
+      title: Diffusers for Vision
+    - local: examples/diffusers_for_audio
+      title: Diffusers for Audio
+    - local: examples/diffusers_for_other
+      title: Diffusers for Other Modalities
+    title: Examples
+  title: Using Diffusers
+- sections:
+  - sections:
+    - local: pipelines
+      title: Pipelines
+    - local: schedulers
+      title: Schedulers
+    - local: models
+      title: Models
+    title: Main Classes
+  - sections:
+    - local: pipelines/glide
+      title: "Glide"
+    title: Pipelines
+  - sections:
+    - local: schedulers/ddpm
+      title: "DDPM"
+    title: Schedulers
+  - sections:
+    - local: models/unet
+      title: "UNet"
+    title: Models
+  title: API
--- a/docs/source/examples/diffusers_for_audio.mdx
+++ b/docs/source/examples/diffusers_for_audio.mdx
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+# Diffusers for audio
\ No newline at end of file
--- a/docs/source/examples/diffusers_for_other.mdx
+++ b/docs/source/examples/diffusers_for_other.mdx
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+# Diffusers for other modalities
+Diffusers also supports modalities other than vision and audio.
+Current examples include:
+- [Diffuser](https://diffusion-planning.github.io/) for planning in reinforcement learning (currently inference only): [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1TmBmlYeKUZSkUZoJqfBmaicVTKx6nN1R?usp=sharing)
+If you are interested in contributing to under-construction examples, you can explore:
+- [GeoDiff](https://github.com/MinkaiXu/GeoDiff) for generating 3D molecular conformations from molecular graphs: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1pLYYWQhdLuv1q-JtEHGZybxp2RBF8gPs?usp=sharing)
\ No newline at end of file
--- a/docs/source/examples/diffusers_for_vision.mdx
+++ b/docs/source/examples/diffusers_for_vision.mdx
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+# Diffusers for vision
+## Direct image generation
+#### **Example image generation with PNDM**
+```python
+from diffusers import PNDM, UNetModel, PNDMScheduler
+import PIL.Image
+import numpy as np
+import torch
+# load model and scheduler
+model_id = "fusing/ddim-celeba-hq"
+model = UNetModel.from_pretrained(model_id)
+scheduler = PNDMScheduler()
+# build the PNDM pipeline from the model and scheduler
+pndm = PNDM(unet=model, noise_scheduler=scheduler)
+# run pipeline in inference (sample random noise and denoise)
+with torch.no_grad():
+    image = pndm()
+# process image to PIL
+image_processed = image.cpu().permute(0, 2, 3, 1)
+image_processed = (image_processed + 1.0) / 2
+image_processed = torch.clamp(image_processed, 0.0, 1.0)
+image_processed = image_processed * 255
+image_processed = image_processed.numpy().astype(np.uint8)
+image_pil = PIL.Image.fromarray(image_processed[0])
+# save image
+image_pil.save("test.png")
+```
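The post-processing lines in the example above map the pipeline output from the range [-1, 1] to 8-bit pixel values. The stdlib-only sketch below applies the same per-value arithmetic to a flat list of floats, so the mapping can be checked without PyTorch. The helper name `to_uint8` is hypothetical and not part of 🤗 Diffusers.

```python
def to_uint8(values):
    """Map floats in [-1, 1] to integers in [0, 255].

    Mirrors the tensor steps in the example: shift to [0, 1],
    clamp, scale by 255, then truncate like ``astype(np.uint8)``.
    """
    pixels = []
    for v in values:
        v = (v + 1.0) / 2            # [-1, 1] -> [0, 1]
        v = min(max(v, 0.0), 1.0)    # clamp out-of-range values
        pixels.append(int(v * 255))  # truncate toward zero, like astype(np.uint8)
    return pixels


print(to_uint8([-1.0, 0.0, 1.0, 2.0]))  # [0, 127, 255, 255]
```

Note that the SDE examples below skip the shift and clamp steps, presumably because those pipelines already return values in [0, 1].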
+#### **Example 1024x1024 image generation with SDE VE**
+See [paper](https://arxiv.org/abs/2011.13456) for more information on SDE VE.
+```python
+from diffusers import DiffusionPipeline
+import torch
+import PIL.Image
+import numpy as np
+torch.manual_seed(32)
+score_sde_sv = DiffusionPipeline.from_pretrained("fusing/ffhq_ncsnpp")
+# Note this might take up to 3 minutes on a GPU
+image = score_sde_sv(num_inference_steps=2000)
+image = image.permute(0, 2, 3, 1).cpu().numpy()
+image = np.clip(image * 255, 0, 255).astype(np.uint8)
+image_pil = PIL.Image.fromarray(image[0])
+# save image
+image_pil.save("test.png")
+```
+#### **Example 32x32 image generation with SDE VP**
+See [paper](https://arxiv.org/abs/2011.13456) for more information on SDE VP.
+```python
+from diffusers import DiffusionPipeline
+import torch
+import PIL.Image
+import numpy as np
+torch.manual_seed(32)
+score_sde_sv = DiffusionPipeline.from_pretrained("fusing/cifar10-ddpmpp-deep-vp")
+# Note this might take up to 3 minutes on a GPU
+image = score_sde_sv(num_inference_steps=1000)
+image = image.permute(0, 2, 3, 1).cpu().numpy()
+image = np.clip(image * 255, 0, 255).astype(np.uint8)
+image_pil = PIL.Image.fromarray(image[0])
+# save image
+image_pil.save("test.png")
+```
+#### **Text to Image generation with Latent Diffusion**
+_Note: To use latent diffusion install transformers from [this branch](https://github.com/patil-suraj/transformers/tree/ldm-bert)._
+```python
+import numpy as np
+import PIL.Image
+import torch
+from diffusers import DiffusionPipeline
+ldm = DiffusionPipeline.from_pretrained("fusing/latent-diffusion-text2im-large")
+generator = torch.manual_seed(42)
+prompt = "A painting of a squirrel eating a burger"
+image = ldm([prompt], generator=generator, eta=0.3, guidance_scale=6.0, num_inference_steps=50)
+image_processed = image.cpu().permute(0, 2, 3, 1)
+image_processed = image_processed * 255.
+image_processed = image_processed.numpy().astype(np.uint8)
+image_pil = PIL.Image.fromarray(image_processed[0])
+# save image
+image_pil.save("test.png")
+```
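The `guidance_scale` argument in the example above controls classifier-free guidance. Assuming the pipeline follows the standard classifier-free guidance formulation (an assumption; check `pipeline_latent_diffusion.py` for the exact code), each denoising step combines an unconditional and a text-conditioned noise prediction as sketched below. The helper `apply_guidance` is hypothetical and works on plain lists instead of tensors.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance on flat lists of noise predictions.

    guidance_scale = 1.0 returns the text-conditioned prediction;
    larger values extrapolate further along (noise_text - noise_uncond).
    """
    return [
        u + guidance_scale * (t - u)
        for u, t in zip(noise_uncond, noise_text)
    ]


guided = apply_guidance([0.0, 1.0], [1.0, 1.0], guidance_scale=6.0)
print(guided)  # [6.0, 1.0]
```

Where the two predictions agree (the second element), guidance has no effect; where they differ, the difference is amplified by `guidance_scale`.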
+## Text to speech generation
+```python
+import torch
+from diffusers import BDDMPipeline, GradTTSPipeline
+torch_device = "cuda"
+# load grad tts and bddm pipelines
+grad_tts = GradTTSPipeline.from_pretrained("fusing/grad-tts-libri-tts")
+bddm = BDDMPipeline.from_pretrained("fusing/diffwave-vocoder-ljspeech")
+text = "Hello world, I missed you so much."
+# generate mel spectrograms from text
+mel_spec = grad_tts(text, torch_device=torch_device)
+# generate the speech by passing the mel spectrograms to the BDDMPipeline
+generator = torch.manual_seed(42)
+audio = bddm(mel_spec, generator, torch_device=torch_device)
+# save generated audio
+from scipy.io.wavfile import write as wavwrite
+sampling_rate = 22050
+wavwrite("generated_audio.wav", sampling_rate, audio.squeeze().cpu().numpy())
+```
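The example above writes the waveform with `scipy.io.wavfile.write`. If SciPy is not available, Python's standard-library `wave` module can write the same data as 16-bit PCM. The sketch below is an alternative to the SciPy call, not what the pipeline itself uses; it assumes the samples are floats in [-1, 1] (for example, `audio.squeeze().cpu().numpy().tolist()`), and the helper name `write_wav_16bit` is hypothetical.

```python
import io
import struct
import wave


def write_wav_16bit(file, samples, sampling_rate=22050):
    """Write mono float samples in [-1, 1] as 16-bit PCM WAV.

    `file` may be a path or a seekable binary file-like object.
    """
    # Clamp each sample and scale it to the signed 16-bit range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    frames = struct.pack("<%dh" % len(ints), *ints)  # little-endian int16
    with wave.open(file, "wb") as wav:
        wav.setnchannels(1)              # mono
        wav.setsampwidth(2)              # 2 bytes = 16 bits per sample
        wav.setframerate(sampling_rate)
        wav.writeframes(frames)


# Usage: write a short buffer into memory and read its header back.
buffer = io.BytesIO()
write_wav_16bit(buffer, [0.0, 0.5, -0.5, 1.0], sampling_rate=22050)
buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    print(wav.getframerate(), wav.getnframes())  # 22050 4
```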
--- a/docs/source/index.mdx
+++ b/docs/source/index.mdx
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+<p align="center">
+    <br>
+    <img src="https://raw.githubusercontent.com/huggingface/diffusers/77aadfee6a891ab9fcfb780f87c693f7a5beeb8e/docs/source/imgs/diffusers_library.jpg" width="400"/>
+    <br>
+</p>
+# 🧨 Diffusers
+🤗 Diffusers provides pretrained diffusion models across multiple modalities, such as vision and audio, and serves
+as a modular toolbox for inference and training of diffusion models.
+More precisely, 🤗 Diffusers offers:
+- State-of-the-art diffusion pipelines that can be run in inference with just a couple of lines of code (see [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines)).
+- Various noise schedulers that can be used interchangeably to find the preferred speed vs. quality trade-off in inference (see [src/diffusers/schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers)).
+- Multiple types of models, such as UNet, that can be used as building blocks in an end-to-end diffusion system (see [src/diffusers/models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models)).
+- Training examples to show how to train the most popular diffusion models (see [examples](https://github.com/huggingface/diffusers/tree/main/examples)).
+# Installation
+Install 🤗 Diffusers for use with PyTorch. Support for other frameworks will come in the future.
+🤗 Diffusers is tested on Python 3.6+ and PyTorch 1.4.0+.
+## Install with pip
+You should install 🤗 Diffusers in a [virtual environment](https://docs.python.org/3/library/venv.html).
+If you're unfamiliar with Python virtual environments, take a look at this [guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
+A virtual environment makes it easier to manage different projects and to avoid compatibility issues between dependencies.
+Start by creating a virtual environment in your project directory:
+```bash
+python -m venv .env
+```
+Activate the virtual environment:
+```bash
+source .env/bin/activate
+```
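To confirm from Python that the virtual environment is active, you can compare `sys.prefix` with `sys.base_prefix`: inside an environment created with `venv`, the two differ (the mechanism described in PEP 405). The helper name below is hypothetical.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # In a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(in_virtual_environment())
```

Run it with the `python` from the activated environment; it should print `True`.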
+Now you're ready to install 🤗 Diffusers with the following command:
+```bash
+pip install diffusers
+```
+## Install from source
+Install 🤗 Diffusers from source with the following command:
+```bash
+pip install git+https://github.com/huggingface/diffusers
+```
+This command installs the bleeding-edge `main` version rather than the latest `stable` version.
+The `main` version is useful for staying up-to-date with the latest developments.
+For instance, a bug may have been fixed since the last official release while a new release hasn't been rolled out yet.
+However, this means the `main` version may not always be stable.
+We strive to keep the `main` version operational, and most issues are usually resolved within a few hours or a day.
+If you run into a problem, please open an [Issue](https://github.com/huggingface/diffusers/issues) so we can fix it even sooner!
+## Editable install
+You will need an editable install if you'd like to:
+* Use the `main` version of the source code.
+* Contribute to 🤗 Diffusers and need to test changes in the code.
+Clone the repository and install 🤗 Diffusers with the following commands:
+```bash
+git clone https://github.com/huggingface/diffusers.git
+cd diffusers
+pip install -e .
+```
+These commands will link the folder you cloned the repository to and your Python library paths.
+Python will now look inside the folder you cloned to in addition to the normal library paths.
+For example, if your Python packages are typically installed in `~/anaconda3/envs/main/lib/python3.7/site-packages/`, Python will also search the folder you cloned to: `~/diffusers/`.
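To check which folder Python will import a package from, for example to confirm that an editable install resolves to your clone, you can ask the import system for the module's location with `importlib.util.find_spec`. The helper name below is hypothetical; after an editable install of `diffusers`, the printed path would be expected to point inside the cloned folder.

```python
import importlib.util


def package_location(name):
    """Return the file Python would import for `name`, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# After `pip install -e .`, this is expected to print a path inside your clone
# (or None if diffusers is not installed in the current environment).
print(package_location("diffusers"))
```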
+<Tip warning={true}>
+You must keep the `diffusers` folder if you want to keep using the library.
+</Tip>
+Now you can easily update your clone to the latest version of 🤗 Diffusers with the following command:
+```bash
+cd ~/diffusers/
+git pull
+```
+Your Python environment will find the `main` version of 🤗 Diffusers on the next run.
--- a/docs/source/models.mdx
+++ b/docs/source/models.mdx
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+# Models
+Diffusers contains pretrained models for popular algorithms and modules for creating the next set of diffusion models.
+The primary function of these models is to denoise an input sample, by modeling the distribution $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$.
+The models are built on the base class [`ModelMixin`], a `torch.nn.Module` with basic functionality for saving and loading models both locally and from the Hugging Face Hub.
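For reference, the DDPM formulation ([Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239), Ho et al., 2020) writes this reverse distribution as a Gaussian whose mean is computed from a noise prediction $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$. This is one common parameterization; not every model in the library necessarily predicts noise.

$$
p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = \mathcal{N}\big(\mathbf{x}_{t-1};\ \boldsymbol{\mu}_\theta(\mathbf{x}_t, t),\ \sigma_t^2 \mathbf{I}\big),
\qquad
\boldsymbol{\mu}_\theta(\mathbf{x}_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right)
$$

Here $\beta_t$ is the noise schedule, $\alpha_t = 1 - \beta_t$, and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$.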
+## API
+Models should implement the model initialization and the `forward` function.
+All saving, loading, and shared utilities belong in the base [`ModelMixin`] class.
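The split described above, where subclasses only define initialization and `forward` while the base class owns saving and loading, can be illustrated with a framework-free toy. The classes `ToyModelMixin` and `ScaleModel` are hypothetical stand-ins, not the real [`ModelMixin`] API (which builds on `torch.nn.Module` and the Hugging Face Hub); here the "weights" are a JSON dictionary, and only the `from_pretrained` name mirrors the examples in these docs.

```python
import json
import os
import tempfile


class ToyModelMixin:
    """Hypothetical base class: owns all saving and loading logic."""

    def state_dict(self):
        return dict(self.__dict__)

    def save_pretrained(self, directory):
        os.makedirs(directory, exist_ok=True)
        with open(os.path.join(directory, "weights.json"), "w") as f:
            json.dump(self.state_dict(), f)

    @classmethod
    def from_pretrained(cls, directory):
        with open(os.path.join(directory, "weights.json")) as f:
            state = json.load(f)
        return cls(**state)  # re-run the subclass initializer with saved values


class ScaleModel(ToyModelMixin):
    """Hypothetical model: only defines __init__ and forward."""

    def __init__(self, scale=0.5):
        self.scale = scale

    def forward(self, sample, timestep):
        # A real denoiser would predict noise from (sample, timestep);
        # this toy just scales the input and ignores the timestep.
        return [self.scale * x for x in sample]


with tempfile.TemporaryDirectory() as tmp:
    ScaleModel(scale=0.25).save_pretrained(tmp)
    restored = ScaleModel.from_pretrained(tmp)
    print(restored.forward([4.0, 8.0], timestep=10))  # [1.0, 2.0]
```

Adding a new model then means writing only a new subclass; persistence is inherited unchanged.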
+## Examples
+- The [`UNetModel`] was proposed in [TODO](https://arxiv.org/) and has been used in paper1, paper2, paper3.
+- Extensions of the [`UNetModel`] include the [`UNetGlideModel`] that uses attention and timestep embeddings for the [GLIDE](https://arxiv.org/abs/2112.10741) paper, the [`UNetGradTTS`] model from this [paper](https://arxiv.org/abs/2105.06337) for text-to-speech, [`UNetLDMModel`] for latent-diffusion models in this [paper](https://arxiv.org/abs/2112.10752), and the [`TemporalUNet`] used for time-series prediction in this reinforcement learning [paper](https://arxiv.org/abs/2205.09991).
+- TODO: mention VAE / SDE score estimation
\ No newline at end of file
--- a/docs/source/models/unet.mdx
+++ b/docs/source/models/unet.mdx
+# UNet
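+The UNet is a convolutional encoder-decoder architecture with skip connections that is often used as the denoising network in diffusion models.
+It was originally proposed in [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597).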
\ No newline at end of file
--- a/docs/source/philosophy.mdx
+++ b/docs/source/philosophy.mdx
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+# Philosophy
+- Readability and clarity are preferred over highly optimized code. Strong emphasis is placed on readable, intuitive, and elementary code design. *E.g.*, the provided [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers) are separated from the provided [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and provide well-commented code that can be read alongside the original paper.
+- Diffusers is **modality independent** and focuses on providing pretrained models and tools to build systems that generate **continuous outputs**, *e.g.* vision and audio.
+- Diffusion models and schedulers are provided as concise, elementary building blocks. Diffusion pipelines, in contrast, are a collection of end-to-end diffusion systems that can be used out-of-the-box; they should stay as close as possible to their original implementation and can include components of other libraries, such as text encoders. Examples of diffusion pipelines are [Glide](https://github.com/openai/glide-text2im) and [Latent Diffusion](https://github.com/CompVis/latent-diffusion).
--- a/docs/source/pipelines.mdx
+++ b/docs/source/pipelines.mdx
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+# Pipelines
+- Pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box.
+- Pipelines should stay as close as possible to their original implementation.
+- Pipelines can include components of other libraries, such as text encoders.
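The sketch below shows the general shape of such an end-to-end system: a pipeline owns a model and a scheduler, samples random noise, and repeatedly lets the model predict a residual that the scheduler turns into a less noisy sample. Everything here (`ToyPipeline`, `ToyModel`, `ToyScheduler`, and their method signatures) is hypothetical and framework-free; the real pipelines in `src/diffusers/pipelines` use PyTorch and their own argument names.

```python
import random


class ToyModel:
    """Hypothetical denoiser: predicts the 'noise' as a fixed fraction of the sample."""

    def __call__(self, sample, t):
        return [0.1 * x for x in sample]


class ToyScheduler:
    """Hypothetical scheduler: subtracts the predicted residual each step."""

    def __init__(self, num_train_timesteps=50):
        self.timesteps = list(range(num_train_timesteps))

    def step(self, residual, t, sample):
        return [x - r for x, r in zip(sample, residual)]


class ToyPipeline:
    """Hypothetical end-to-end pipeline wiring a model to a scheduler."""

    def __init__(self, model, scheduler):
        self.model = model
        self.scheduler = scheduler

    def __call__(self, size=4, seed=0):
        rng = random.Random(seed)
        sample = [rng.gauss(0.0, 1.0) for _ in range(size)]  # start from pure noise
        for t in reversed(self.scheduler.timesteps):          # denoise from t = T-1 down to 0
            residual = self.model(sample, t)
            sample = self.scheduler.step(residual, t, sample)
        return sample


pipeline = ToyPipeline(ToyModel(), ToyScheduler())
print(pipeline(size=4, seed=0))
```

Because the model and scheduler are separate objects, either one can be swapped without touching the loop in `__call__`.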
+## API
+TODO(Patrick, Anton, Suraj)
+## Examples
+- DDPM for unconditional image generation in [pipeline_ddpm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_ddpm.py).
+- DDIM for unconditional image generation in [pipeline_ddim](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_ddim.py).
+- PNDM for unconditional image generation in [pipeline_pndm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_pndm.py).
+- Latent diffusion for text to image generation / conditional image generation in [pipeline_latent_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_latent_diffusion.py).
+- Glide for text to image generation / conditional image generation in [pipeline_glide](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_glide.py).
+- BDDMPipeline for spectrogram-to-sound vocoding in [pipeline_bddm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_bddm.py).
+- Grad-TTS for text to audio generation / conditional audio generation in [pipeline_grad_tts](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_grad_tts.py).
--- a/docs/source/pipelines/glide.mdx
+++ b/docs/source/pipelines/glide.mdx
+# GLIDE
\ No newline at end of file
--- a/docs/source/quicktour.mdx
+++ b/docs/source/quicktour.mdx
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+# Quicktour
+Start using 🧨 Diffusers quickly!
+To start, use the [`DiffusionPipeline`] for quick inference and sample generations!
+```bash
+pip install diffusers
+```
+## Main classes
+### Models
+### Schedulers
+### Pipelines
--- a/docs/source/schedulers.mdx
+++ b/docs/source/schedulers.mdx
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+# Schedulers
+The base class [`SchedulerMixin`] implements low-level utilities used by multiple schedulers.
+At a high level:
+- Schedulers are the algorithms used to run diffusion models in inference as well as in training. They include the noise schedules and define algorithm-specific diffusion steps.
+- Schedulers can be used interchangeably across diffusion models in inference to find the preferred trade-off between speed and generation quality.
+- Schedulers are implemented in NumPy but can easily be converted to PyTorch.
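As an illustration of what "noise schedule" means here, the sketch below builds a linear beta schedule and uses it for the training-time forward process $q(\mathbf{x}_t \mid \mathbf{x}_0)$, which mixes a clean sample with Gaussian noise in closed form. The schedule endpoints (`1e-4` to `0.02` over 1000 steps) follow the defaults in the DDPM paper; the function names are hypothetical, and plain Python lists stand in for NumPy arrays.

```python
import math
import random


def linear_beta_schedule(num_train_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced betas from beta_start to beta_end (inclusive)."""
    step = (beta_end - beta_start) / (num_train_timesteps - 1)
    return [beta_start + i * step for i in range(num_train_timesteps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alphas_cumprod, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    return alphas_cumprod


def add_noise(x0, t, alphas_cumprod, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I)."""
    a_bar = alphas_cumprod[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * n for x, n in zip(x0, noise)]
    return x_t, noise  # the noise is the usual training target for the model


betas = linear_beta_schedule()
alphas_cumprod = cumulative_alphas(betas)
x_t, noise = add_noise([1.0, -1.0], t=999, alphas_cumprod=alphas_cumprod, rng=random.Random(0))
print(alphas_cumprod[-1] < 1e-3)  # True: x_999 is almost pure noise
```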
+## API
+- Schedulers should provide one or more `def step(...)` functions that are called iteratively to unroll the diffusion loop during the forward pass.
+- Schedulers should be framework-agnostic but provide simple functionality to convert the scheduler into a specific framework, such as PyTorch, with a `set_format(...)` method.
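A `step(...)` function in this style can be sketched for the DDPM case: given the model's noise prediction at timestep `t`, it computes the posterior mean from the DDPM paper and adds noise with variance $\beta_t$ (one of the two variance choices in that paper). The class and method signature below are hypothetical and framework-free (plain Python floats and lists), so they are not the actual `DDPMScheduler` interface, and the `set_format(...)` conversion is omitted.

```python
import math
import random


class ToyDDPMScheduler:
    """Hypothetical DDPM scheduler with a linear beta schedule."""

    def __init__(self, num_train_timesteps=1000, beta_start=1e-4, beta_end=0.02):
        step = (beta_end - beta_start) / (num_train_timesteps - 1)
        self.betas = [beta_start + i * step for i in range(num_train_timesteps)]
        self.alphas = [1.0 - b for b in self.betas]
        self.alphas_cumprod = []
        running = 1.0
        for a in self.alphas:
            running *= a
            self.alphas_cumprod.append(running)

    def step(self, noise_pred, t, sample, rng):
        """Return x_{t-1} given the predicted noise and x_t."""
        beta, alpha, a_bar = self.betas[t], self.alphas[t], self.alphas_cumprod[t]
        coef = beta / math.sqrt(1.0 - a_bar)
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = [(x - coef * e) / math.sqrt(alpha) for x, e in zip(sample, noise_pred)]
        if t == 0:
            return mean  # no noise is added at the final step
        sigma = math.sqrt(beta)
        return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


# Unroll the loop by calling step() once per timestep, from T-1 down to 0.
scheduler = ToyDDPMScheduler(num_train_timesteps=10)
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(3)]
for t in reversed(range(10)):
    noise_pred = [0.0] * len(sample)  # stand-in for a real model's output
    sample = scheduler.step(noise_pred, t, sample, rng)
print(sample)
```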
+## Examples
+- The [`DDPMScheduler`] was proposed in [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) and can be found in [scheduling_ddpm.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_ddpm.py). An example of how to use this scheduler can be found in [pipeline_ddpm.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_ddpm.py).
+- The [`DDIMScheduler`] was proposed in [Denoising Diffusion Implicit Models](https://arxiv.org/abs/2010.02502) and can be found in [scheduling_ddim.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_ddim.py). An example of how to use this scheduler can be found in [pipeline_ddim.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_ddim.py).
+- The [`PNDMScheduler`] was proposed in [Pseudo Numerical Methods for Diffusion Models on Manifolds](https://arxiv.org/abs/2202.09778) and can be found in [scheduling_pndm.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_pndm.py). An example of how to use this scheduler can be found in [pipeline_pndm.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_pndm.py).
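To illustrate how a DDIM update differs from DDPM, the sketch below implements the deterministic DDIM step (the $\eta = 0$ case from the DDIM paper): the model's noise prediction is used to estimate $\mathbf{x}_0$, which is then moved to the previous timestep's noise level without adding fresh noise, so timesteps can be skipped. The function name, the plain-float interface, and the toy `alphas_cumprod` values are hypothetical; the general $\eta > 0$ case, which re-injects noise, is omitted.

```python
import math


def ddim_step(noise_pred, sample, alpha_bar_t, alpha_bar_prev):
    """Deterministic DDIM update (eta = 0) from x_t to the previous timestep."""
    # Estimate the clean sample x_0 from x_t and the predicted noise.
    x0_pred = [
        (x - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
        for x, e in zip(sample, noise_pred)
    ]
    # Move the estimate to the previous noise level, reusing the predicted noise direction.
    return [
        math.sqrt(alpha_bar_prev) * x0 + math.sqrt(1.0 - alpha_bar_prev) * e
        for x0, e in zip(x0_pred, noise_pred)
    ]


# Hypothetical cumulative alphas for a 5-step schedule; DDIM may skip timesteps.
alphas_cumprod = [0.99, 0.9, 0.7, 0.4, 0.1]
timesteps = [4, 2, 0]  # visit only every other timestep
sample = [0.3, -1.2]   # stand-in for initial Gaussian noise
for i, t in enumerate(timesteps):
    is_last = i + 1 == len(timesteps)
    # After the last step, treat alpha_bar_prev as 1.0 (a fully clean sample).
    alpha_bar_prev = 1.0 if is_last else alphas_cumprod[timesteps[i + 1]]
    noise_pred = [0.1, -0.2]  # stand-in for a real model's output
    sample = ddim_step(noise_pred, sample, alphas_cumprod[t], alpha_bar_prev)
print(sample)
```

If the model predicted exactly the noise that produced $\mathbf{x}_t$, a single step with `alpha_bar_prev = 1.0` would recover $\mathbf{x}_0$ exactly.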
\ No newline at end of file
--- a/docs/source/schedulers/ddpm.mdx
+++ b/docs/source/schedulers/ddpm.mdx
+# DDPM
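+The DDPM scheduler implements the sampling procedure proposed in [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239).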
\ No newline at end of file