Unverified commit c375903d authored by Tolga Cangöz, committed by GitHub

Errata - Fix typos & improve contributing page (#8572)



* Fix typos & improve contributing page

* `make style && make quality`

* fix typos

* Fix typo

---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
parent b9d52fca
@@ -245,7 +245,7 @@ The official training examples are maintained by the Diffusers' core maintainers
 This is because of the same reasons put forward in [6. Contribute a community pipeline](#6-contribute-a-community-pipeline) for official pipelines vs. community pipelines: It is not feasible for the core maintainers to maintain all possible training methods for diffusion models.
 If the Diffusers core maintainers and the community consider a certain training paradigm to be too experimental or not popular enough, the corresponding training code should be put in the `research_projects` folder and maintained by the author.
-Both official training and research examples consist of a directory that contains one or more training scripts, a requirements.txt file, and a README.md file. In order for the user to make use of the
+Both official training and research examples consist of a directory that contains one or more training scripts, a `requirements.txt` file, and a `README.md` file. In order for the user to make use of the
 training examples, it is required to clone the repository:
 ```bash
@@ -255,7 +255,8 @@ git clone https://github.com/huggingface/diffusers
 as well as to install all additional dependencies required for training:
 ```bash
-pip install -r /examples/<your-example-folder>/requirements.txt
+cd diffusers
+pip install -r examples/<your-example-folder>/requirements.txt
 ```
 Therefore when adding an example, the `requirements.txt` file shall define all pip dependencies required for your training example so that once all those are installed, the user can run the example's training script. See, for example, the [DreamBooth `requirements.txt` file](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/requirements.txt).
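Concretely, once an example's requirements are installed, training is typically launched from the repository root. A hedged sketch for DreamBooth (the flags and checkpoint below are illustrative; the example's `README.md` documents the exact arguments):

```bash
accelerate launch examples/dreambooth/train_dreambooth.py \
  --pretrained_model_name_or_path="stable-diffusion-v1-5/stable-diffusion-v1-5" \
  --instance_data_dir="./my-images" \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --max_train_steps=400 \
  --output_dir="./dreambooth-model"
```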
@@ -22,14 +22,13 @@ We enormously value feedback from the community, so please do not be afraid to s
 ## Overview
-You can contribute in many ways ranging from answering questions on issues to adding new diffusion models to
-the core library.
+You can contribute in many ways ranging from answering questions on issues and discussions to adding new diffusion models to the core library.
 In the following, we give an overview of different ways to contribute, ranked by difficulty in ascending order. All of them are valuable to the community.
 * 1. Asking and answering questions on [the Diffusers discussion forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers) or on [Discord](https://discord.gg/G7tWnz98XR).
-* 2. Opening new issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues/new/choose).
+* 2. Opening new issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues/new/choose) or new discussions on [the GitHub Discussions tab](https://github.com/huggingface/diffusers/discussions/new/choose).
-* 3. Answering issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues).
+* 3. Answering issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues) or discussions on [the GitHub Discussions tab](https://github.com/huggingface/diffusers/discussions).
 * 4. Fix a simple issue, marked by the "Good first issue" label, see [here](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
 * 5. Contribute to the [documentation](https://github.com/huggingface/diffusers/tree/main/docs/source).
 * 6. Contribute a [Community Pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3Acommunity-examples).
@@ -63,7 +62,7 @@ In the same spirit, you are of immense help to the community by answering such q
 **Please** keep in mind that the more effort you put into asking or answering a question, the higher
 the quality of the publicly documented knowledge. In the same way, well-posed and well-answered questions create a high-quality knowledge database accessible to everybody, while badly posed questions or answers reduce the overall quality of the public knowledge database.
-In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formated/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
+In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formatted/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
 **NOTE about channels**:
 [*The forum*](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) is much better indexed by search engines, such as Google. Posts are ranked by popularity rather than chronologically. Hence, it's easier to look up questions and answers that we posted some time ago.
@@ -99,7 +98,7 @@ This means in more detail:
 - Format your code.
 - Do not include any external libraries except for Diffusers depending on them.
 - **Always** provide all necessary information about your environment; for this, you can run: `diffusers-cli env` in your shell and copy-paste the displayed information to the issue.
-- Explain the issue. If the reader doesn't know what the issue is and why it is an issue, she cannot solve it.
+- Explain the issue. If the reader doesn't know what the issue is and why it is an issue, (s)he cannot solve it.
 - **Always** make sure the reader can reproduce your issue with as little effort as possible. If your code snippet cannot be run because of missing libraries or undefined variables, the reader cannot help you. Make sure your reproducible code snippet is as minimal as possible and can be copy-pasted into a simple Python shell.
 - If in order to reproduce your issue a model and/or dataset is required, make sure the reader has access to that model or dataset. You can always upload your model or dataset to the [Hub](https://huggingface.co) to make it easily downloadable. Try to keep your model and dataset as small as possible, to make the reproduction of your issue as effortless as possible.
@@ -288,7 +287,7 @@ The official training examples are maintained by the Diffusers' core maintainers
 This is because of the same reasons put forward in [6. Contribute a community pipeline](#6-contribute-a-community-pipeline) for official pipelines vs. community pipelines: It is not feasible for the core maintainers to maintain all possible training methods for diffusion models.
 If the Diffusers core maintainers and the community consider a certain training paradigm to be too experimental or not popular enough, the corresponding training code should be put in the `research_projects` folder and maintained by the author.
-Both official training and research examples consist of a directory that contains one or more training scripts, a requirements.txt file, and a README.md file. In order for the user to make use of the
+Both official training and research examples consist of a directory that contains one or more training scripts, a `requirements.txt` file, and a `README.md` file. In order for the user to make use of the
 training examples, it is required to clone the repository:
 ```bash
@@ -298,7 +297,8 @@ git clone https://github.com/huggingface/diffusers
 as well as to install all additional dependencies required for training:
 ```bash
-pip install -r /examples/<your-example-folder>/requirements.txt
+cd diffusers
+pip install -r examples/<your-example-folder>/requirements.txt
 ```
 Therefore when adding an example, the `requirements.txt` file shall define all pip dependencies required for your training example so that once all those are installed, the user can run the example's training script. See, for example, the [DreamBooth `requirements.txt` file](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/requirements.txt).
@@ -316,7 +316,7 @@ Once an example script works, please make sure to add a comprehensive `README.md
 - A link to some training results (logs, models, etc.) that show what the user can expect as shown [here](https://api.wandb.ai/report/patrickvonplaten/xm6cd5q5).
 - If you are adding a non-official/research training example, **please don't forget** to add a sentence that you are maintaining this training example which includes your git handle as shown [here](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/intel_opts#diffusers-examples-with-intel-optimizations).
-If you are contributing to the official training examples, please also make sure to add a test to [examples/test_examples.py](https://github.com/huggingface/diffusers/blob/main/examples/test_examples.py). This is not necessary for non-official training examples.
+If you are contributing to the official training examples, please also make sure to add a test to its folder such as [examples/dreambooth/test_dreambooth.py](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/test_dreambooth.py). This is not necessary for non-official training examples.
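For orientation, such an example test is usually a small smoke test: it runs the training script for a couple of steps on tiny assets and checks that weights were written. A minimal sketch, assuming a tiny test checkpoint and a small image folder (the repo ID, paths, and flags are illustrative, not the exact helpers used in the diffusers test suite):

```py
import os
import subprocess
import sys
import tempfile
import unittest


class DreamBoothSmokeTest(unittest.TestCase):
    def test_train_dreambooth_writes_weights(self):
        with tempfile.TemporaryDirectory() as tmpdir:
            # Run the training script for two steps on a tiny test model.
            subprocess.run(
                [
                    sys.executable,
                    "examples/dreambooth/train_dreambooth.py",
                    "--pretrained_model_name_or_path=hf-internal-testing/tiny-stable-diffusion-pipe",
                    "--instance_data_dir=path/to/a-few-images",
                    "--instance_prompt=a photo of sks dog",
                    "--resolution=64",
                    "--train_batch_size=1",
                    "--max_train_steps=2",
                    f"--output_dir={tmpdir}",
                ],
                check=True,
            )
            # The script should have saved something loadable into output_dir.
            self.assertTrue(os.listdir(tmpdir))
```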
 ### 8. Fixing a "Good second issue"
@@ -418,7 +418,7 @@ You will need basic `git` proficiency to be able to contribute to
 manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro
 Git](https://git-scm.com/book/en/v2) is a very good reference.
-Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/main/setup.py#L244)):
+Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/83bc6c94eaeb6f7704a2a428931cf2d9ad973ae9/setup.py#L270)):
 1. Fork the [repository](https://github.com/huggingface/diffusers) by
 clicking on the 'Fork' button on the repository's page. This creates a copy of the code
@@ -114,7 +114,7 @@ Now we'll simply specify the name of the dataset and caption column (in this cas
 ```
 You can also load a dataset straight from the Hub by specifying its name in `dataset_name`.
-Look [here](https://huggingface.co/blog/sdxl_lora_advanced_script#custom-captioning) for more info on creating/loadin your own caption dataset.
+Look [here](https://huggingface.co/blog/sdxl_lora_advanced_script#custom-captioning) for more info on creating/loading your own caption dataset.
 - **optimizer**: for this example, we'll use [prodigy](https://huggingface.co/blog/sdxl_lora_advanced_script#adaptive-optimizers) - an adaptive optimizer
 - **pivotal tuning**
@@ -393,7 +393,7 @@ The advanced script now supports custom choice of U-net blocks to train during D
 > In light of this, we're introducing a new feature to the advanced script to allow for configurable U-net learned blocks.
 **Usage**
-Configure LoRA learned U-net blocks adding a `lora_unet_blocks` flag, with a comma seperated string specifying the targeted blocks.
+Configure LoRA learned U-net blocks by adding a `lora_unet_blocks` flag, with a comma-separated string specifying the targeted blocks.
 e.g.:
 ```bash
 --lora_unet_blocks="unet.up_blocks.0.attentions.0,unet.up_blocks.0.attentions.1"
@@ -436,7 +436,7 @@ lora_path = "lora-library/B-LoRA-pen_sketch"
 state_dict = lora_lora_unet_blocks(content_B_lora_path, alpha=1, target_blocks=["unet.up_blocks.0.attentions.0"])
-# Load traine dlora layers into the unet
+# Load trained lora layers into the unet
 pipeline.load_lora_into_unet(state_dict, None, pipeline.unet)
 # generate
@@ -326,7 +326,7 @@ def parse_args(input_args=None):
         type=str,
         default="TOK",
         help="identifier specifying the instance (or instances) as used in instance_prompt, validation prompt, "
-        "captions - e.g. TOK. To use multiple identifiers, please specify them in a comma seperated string - e.g. "
+        "captions - e.g. TOK. To use multiple identifiers, please specify them in a comma separated string - e.g. "
         "'TOK,TOK2,TOK3' etc.",
     )
@@ -559,7 +559,7 @@ def parse_args(input_args=None):
         "--prodigy_beta3",
         type=float,
         default=None,
-        help="coefficients for computing the Prodidy stepsize using running averages. If set to None, "
+        help="coefficients for computing the Prodigy stepsize using running averages. If set to None, "
         "uses the value of square root of beta2. Ignored if optimizer is adamW",
     )
     parser.add_argument("--prodigy_decouple", type=bool, default=True, help="Use AdamW style decoupled weight decay")
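For reference, flags like `--prodigy_beta3` and `--prodigy_decouple` map directly onto the constructor of the Prodigy optimizer. A minimal sketch, assuming the `prodigyopt` package (the wiring below is illustrative, not the script's exact code):

```py
import torch
from prodigyopt import Prodigy  # pip install prodigyopt

model = torch.nn.Linear(8, 8)  # stand-in for the trainable LoRA parameters
optimizer = Prodigy(
    model.parameters(),
    lr=1.0,             # Prodigy adapts the step size itself; lr=1.0 is the usual setting
    betas=(0.9, 0.999),
    beta3=None,         # None falls back to sqrt(beta2), matching the help text above
    weight_decay=1e-2,
    decouple=True,      # AdamW-style decoupled weight decay (--prodigy_decouple)
)
```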
@@ -736,7 +736,7 @@ class TokenEmbeddingsHandler:
         # random initialization of new tokens
         std_token_embedding = text_encoder.text_model.embeddings.token_embedding.weight.data.std()
-        print(f"{idx} text encodedr's std_token_embedding: {std_token_embedding}")
+        print(f"{idx} text encoder's std_token_embedding: {std_token_embedding}")
         text_encoder.text_model.embeddings.token_embedding.weight.data[self.train_ids] = (
             torch.randn(len(self.train_ids), text_encoder.text_model.config.hidden_size)
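The visible context computes the embedding matrix's standard deviation and starts filling the rows of the newly added tokens with `torch.randn`. A minimal self-contained sketch of that initialization pattern (scaling the random rows by `std_token_embedding` is an assumption inferred from the computed statistic):

```py
import torch

hidden_size = 768
token_embedding = torch.nn.Embedding(49410, hidden_size)
train_ids = torch.tensor([49408, 49409])  # rows of the newly inserted tokens (illustrative)

std_token_embedding = token_embedding.weight.data.std()
# Scale the random rows so new tokens start with the same spread as pretrained ones.
token_embedding.weight.data[train_ids] = (
    torch.randn(len(train_ids), hidden_size) * std_token_embedding
)
```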
@@ -948,7 +948,7 @@ class DreamBoothDataset(Dataset):
             else:
                 example["instance_prompt"] = self.instance_prompt
-        else:  # costum prompts were provided, but length does not match size of image dataset
+        else:  # custom prompts were provided, but length does not match size of image dataset
             example["instance_prompt"] = self.instance_prompt
         if self.class_data_root:
@@ -1967,7 +1967,7 @@ def main(args):
             }
         )
-        # Conver to WebUI format
+        # Convert to WebUI format
         lora_state_dict = load_file(f"{args.output_dir}/pytorch_lora_weights.safetensors")
         peft_state_dict = convert_all_state_dict_to_peft(lora_state_dict)
         kohya_state_dict = convert_state_dict_to_kohya(peft_state_dict)
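For context, the converted kohya-format weights are then typically written back out with safetensors; a hedged sketch continuing the snippet above (the output filename is illustrative):

```py
from safetensors.torch import save_file

# Assumed follow-up to the conversion above: persist the WebUI/kohya-format weights.
save_file(kohya_state_dict, f"{args.output_dir}/pytorch_lora_weights_kohya.safetensors")
```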
@@ -348,7 +348,7 @@ def parse_args(input_args=None):
         type=str,
         default="TOK",
         help="identifier specifying the instance (or instances) as used in instance_prompt, validation prompt, "
-        "captions - e.g. TOK. To use multiple identifiers, please specify them in a comma seperated string - e.g. "
+        "captions - e.g. TOK. To use multiple identifiers, please specify them in a comma separated string - e.g. "
        "'TOK,TOK2,TOK3' etc.",
     )
@@ -591,7 +591,7 @@ def parse_args(input_args=None):
         "--prodigy_beta3",
         type=float,
         default=None,
-        help="coefficients for computing the Prodidy stepsize using running averages. If set to None, "
+        help="coefficients for computing the Prodigy stepsize using running averages. If set to None, "
         "uses the value of square root of beta2. Ignored if optimizer is adamW",
     )
     parser.add_argument("--prodigy_decouple", type=bool, default=True, help="Use AdamW style decoupled weight decay")
@@ -824,7 +824,7 @@ class TokenEmbeddingsHandler:
         # random initialization of new tokens
         std_token_embedding = text_encoder.text_model.embeddings.token_embedding.weight.data.std()
-        print(f"{idx} text encodedr's std_token_embedding: {std_token_embedding}")
+        print(f"{idx} text encoder's std_token_embedding: {std_token_embedding}")
         text_encoder.text_model.embeddings.token_embedding.weight.data[self.train_ids] = (
             torch.randn(len(self.train_ids), text_encoder.text_model.config.hidden_size)
@@ -1097,7 +1097,7 @@ class DreamBoothDataset(Dataset):
             else:
                 example["instance_prompt"] = self.instance_prompt
-        else:  # costum prompts were provided, but length does not match size of image dataset
+        else:  # custom prompts were provided, but length does not match size of image dataset
             example["instance_prompt"] = self.instance_prompt
         if self.class_data_root:
@@ -1794,7 +1794,7 @@ def main(args):
             if args.with_prior_preservation:
                 prompt_embeds = torch.cat([prompt_embeds, class_prompt_hidden_states], dim=0)
                 unet_add_text_embeds = torch.cat([unet_add_text_embeds, class_pooled_prompt_embeds], dim=0)
-        # if we're optmizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the
+        # if we're optimizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the
         # batch prompts on all training steps
         else:
             tokens_one = tokenize_prompt(tokenizer_one, args.instance_prompt, add_special_tokens)
@@ -2411,7 +2411,7 @@ def main(args):
             }
         )
-        # Conver to WebUI format
+        # Convert to WebUI format
         lora_state_dict = load_file(f"{args.output_dir}/pytorch_lora_weights.safetensors")
         peft_state_dict = convert_all_state_dict_to_peft(lora_state_dict)
         kohya_state_dict = convert_state_dict_to_kohya(peft_state_dict)
@@ -3595,7 +3595,7 @@ This pipeline provides drag-and-drop image editing using stochastic differential
 ![SDE Drag Image](https://github.com/huggingface/diffusers/assets/75928535/bd54f52f-f002-4951-9934-b2a4592771a5)
-See [paper](https://arxiv.org/abs/2311.01410), [paper page](https://ml-gsai.github.io/SDE-Drag-demo/), [original repo](https://github.com/ML-GSAI/SDE-Drag) for more infomation.
+See [paper](https://arxiv.org/abs/2311.01410), [paper page](https://ml-gsai.github.io/SDE-Drag-demo/), [original repo](https://github.com/ML-GSAI/SDE-Drag) for more information.
 ```py
 import PIL
@@ -795,10 +795,10 @@ class DemoFusionSDXLPipeline(
                 Control the strength of dilated sampling. For specific impacts, please refer to Appendix C
                 in the DemoFusion paper.
             cosine_scale_3 (`float`, defaults to 1):
-                Control the strength of the gaussion filter. For specific impacts, please refer to Appendix C
+                Control the strength of the Gaussian filter. For specific impacts, please refer to Appendix C
                 in the DemoFusion paper.
             sigma (`float`, defaults to 1):
-                The standerd value of the gaussian filter.
+                The standard deviation of the Gaussian filter.
             show_image (`bool`, defaults to False):
                 Determine whether to show intermediate results during generation.
@@ -517,7 +517,7 @@ def parse_args(input_args=None):
         "--prodigy_beta3",
         type=float,
         default=None,
-        help="coefficients for computing the Prodidy stepsize using running averages. If set to None, "
+        help="coefficients for computing the Prodigy stepsize using running averages. If set to None, "
         "uses the value of square root of beta2. Ignored if optimizer is adamW",
     )
     parser.add_argument("--prodigy_decouple", type=bool, default=True, help="Use AdamW style decoupled weight decay")
@@ -788,7 +788,7 @@ class DreamBoothDataset(Dataset):
             else:
                 example["instance_prompt"] = self.instance_prompt
-        else:  # costum prompts were provided, but length does not match size of image dataset
+        else:  # custom prompts were provided, but length does not match size of image dataset
             example["instance_prompt"] = self.instance_prompt
         if self.class_data_root:
@@ -1359,7 +1359,7 @@ def main(args):
             if args.with_prior_preservation:
                 prompt_embeds = torch.cat([prompt_embeds, class_prompt_hidden_states], dim=0)
                 pooled_prompt_embeds = torch.cat([pooled_prompt_embeds, class_pooled_prompt_embeds], dim=0)
-        # if we're optmizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the
+        # if we're optimizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the
         # batch prompts on all training steps
         else:
             tokens_one = tokenize_prompt(tokenizer_one, args.instance_prompt)
@@ -562,7 +562,7 @@ def parse_args(input_args=None):
         "--prodigy_beta3",
         type=float,
         default=None,
-        help="coefficients for computing the Prodidy stepsize using running averages. If set to None, "
+        help="coefficients for computing the Prodigy stepsize using running averages. If set to None, "
         "uses the value of square root of beta2. Ignored if optimizer is adamW",
     )
     parser.add_argument("--prodigy_decouple", type=bool, default=True, help="Use AdamW style decoupled weight decay")
@@ -861,7 +861,7 @@ class DreamBoothDataset(Dataset):
             else:
                 example["instance_prompt"] = self.instance_prompt
-        else:  # costum prompts were provided, but length does not match size of image dataset
+        else:  # custom prompts were provided, but length does not match size of image dataset
             example["instance_prompt"] = self.instance_prompt
         if self.class_data_root:
@@ -1488,7 +1488,7 @@ def main(args):
             if args.with_prior_preservation:
                 prompt_embeds = torch.cat([prompt_embeds, class_prompt_hidden_states], dim=0)
                 unet_add_text_embeds = torch.cat([unet_add_text_embeds, class_pooled_prompt_embeds], dim=0)
-        # if we're optmizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the
+        # if we're optimizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the
         # batch prompts on all training steps
         else:
             tokens_one = tokenize_prompt(tokenizer_one, args.instance_prompt)
@@ -512,7 +512,7 @@ def parse_args(input_args=None):
         "--prodigy_beta3",
         type=float,
         default=None,
-        help="coefficients for computing the Prodidy stepsize using running averages. If set to None, "
+        help="coefficients for computing the Prodigy stepsize using running averages. If set to None, "
         "uses the value of square root of beta2. Ignored if optimizer is adamW",
     )
     parser.add_argument("--prodigy_decouple", type=bool, default=True, help="Use AdamW style decoupled weight decay")
@@ -783,7 +783,7 @@ class DreamBoothDataset(Dataset):
             else:
                 example["instance_prompt"] = self.instance_prompt
-        else:  # costum prompts were provided, but length does not match size of image dataset
+        else:  # custom prompts were provided, but length does not match size of image dataset
             example["instance_prompt"] = self.instance_prompt
         if self.class_data_root:
@@ -1388,7 +1388,7 @@ def main(args):
             if args.with_prior_preservation:
                 prompt_embeds = torch.cat([prompt_embeds, class_prompt_hidden_states], dim=0)
                 pooled_prompt_embeds = torch.cat([pooled_prompt_embeds, class_pooled_prompt_embeds], dim=0)
-        # if we're optmizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the
+        # if we're optimizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the
         # batch prompts on all training steps
         else:
             tokens_one = tokenize_prompt(tokenizer_one, args.instance_prompt)
@@ -561,7 +561,7 @@ def parse_args(input_args=None):
         "--prodigy_beta3",
         type=float,
         default=None,
-        help="coefficients for computing the Prodidy stepsize using running averages. If set to None, "
+        help="coefficients for computing the Prodigy stepsize using running averages. If set to None, "
         "uses the value of square root of beta2. Ignored if optimizer is adamW",
     )
     parser.add_argument("--prodigy_decouple", type=bool, default=True, help="Use AdamW style decoupled weight decay")
@@ -880,7 +880,7 @@ class DreamBoothDataset(Dataset):
             else:
                 example["instance_prompt"] = self.instance_prompt
-        else:  # costum prompts were provided, but length does not match size of image dataset
+        else:  # custom prompts were provided, but length does not match size of image dataset
             example["instance_prompt"] = self.instance_prompt
         if self.class_data_root:
@@ -1561,7 +1561,7 @@ def main(args):
             if args.with_prior_preservation:
                 prompt_embeds = torch.cat([prompt_embeds, class_prompt_hidden_states], dim=0)
                 unet_add_text_embeds = torch.cat([unet_add_text_embeds, class_pooled_prompt_embeds], dim=0)
-        # if we're optmizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the
+        # if we're optimizing the text encoder (both if instance prompt is used for all images or custom prompts) we need to tokenize and encode the
         # batch prompts on all training steps
         else:
             tokens_one = tokenize_prompt(tokenizer_one, args.instance_prompt)
@@ -716,7 +716,7 @@ class LegacyConfigMixin(ConfigMixin):
     @classmethod
     def from_config(cls, config: Union[FrozenDict, Dict[str, Any]] = None, return_unused_kwargs=False, **kwargs):
-        # To prevent depedency import problem.
+        # To prevent dependency import problem.
         from .models.model_loading_utils import _fetch_remapped_cls_from_config
         # resolve remapping
@@ -54,7 +54,7 @@ class ControlNetOutput(BaseOutput):
             be of shape `(batch_size, channel * resolution, height // resolution, width // resolution)`. Output can be
             used to condition the original UNet's downsampling activations.
         mid_down_block_re_sample (`torch.Tensor`):
-            The activation of the midde block (the lowest sample resolution). Each tensor should be of shape
+            The activation of the middle block (the lowest sample resolution). Each tensor should be of shape
             `(batch_size, channel * lowest_resolution, height // lowest_resolution, width // lowest_resolution)`.
             Output can be used to condition the original UNet's middle block activation.
     """
@@ -980,7 +980,7 @@ class GLIGENTextBoundingboxProjection(nn.Module):
             objs = self.linears(torch.cat([positive_embeddings, xyxy_embedding], dim=-1))
-        # positionet with text and image infomation
+        # positionet with text and image information
         else:
             phrases_masks = phrases_masks.unsqueeze(-1)
             image_masks = image_masks.unsqueeze(-1)
@@ -1252,7 +1252,7 @@ class MultiIPAdapterImageProjection(nn.Module):
         if not isinstance(image_embeds, list):
             deprecation_message = (
                 "You have passed a tensor as `image_embeds`. This is deprecated and will be removed in a future release."
-                " Please make sure to update your script to pass `image_embeds` as a list of tensors to supress this warning."
+                " Please make sure to update your script to pass `image_embeds` as a list of tensors to suppress this warning."
             )
             deprecate("image_embeds not a list", "1.0.0", deprecation_message, standard_warn=False)
             image_embeds = [image_embeds.unsqueeze(1)]
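In other words, callers should now pass a list with one entry per IP-Adapter rather than a bare tensor; a short sketch of the migration (shapes illustrative):

```py
import torch

embeds = torch.randn(2, 4, 768)  # (batch, sequence, dim); illustrative shape
# Deprecated: passing the bare tensor triggers the warning above and gets
# wrapped internally as [embeds.unsqueeze(1)].
# Preferred: pass a list with one tensor per IP-Adapter, including the
# per-image dimension explicitly.
image_embeds = [embeds.unsqueeze(1)]  # (batch, num_images=1, sequence, dim)
```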
@@ -1169,7 +1169,7 @@ class LegacyModelMixin(ModelMixin):
     @classmethod
     @validate_hf_hub_args
     def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.PathLike]], **kwargs):
-        # To prevent depedency import problem.
+        # To prevent dependency import problem.
         from .model_loading_utils import _fetch_remapped_cls_from_config
         # Create a copy of the kwargs so that we don't mess with the keyword arguments in the downstream calls.
@@ -61,7 +61,7 @@ class Kandinsky3UNet(ModelMixin, ConfigMixin):
     ):
         super().__init__()
-        # TOOD(Yiyi): Give better name and put into config for the following 4 parameters
+        # TODO(Yiyi): Give better name and put into config for the following 4 parameters
         expansion_ratio = 4
         compression_ratio = 2
         add_cross_attention = (False, True, True, True)
@@ -164,7 +164,7 @@ class UNetLDMModelTests(ModelTesterMixin, UNetTesterMixin, unittest.TestCase):
     @require_torch_accelerator
     def test_from_pretrained_accelerate_wont_change_results(self):
-        # by defautl model loading will use accelerate as `low_cpu_mem_usage=True`
+        # by default model loading will use accelerate as `low_cpu_mem_usage=True`
         model_accelerate, _ = UNet2DModel.from_pretrained("fusing/unet-ldm-dummy-update", output_loading_info=True)
         model_accelerate.to(torch_device)
         model_accelerate.eval()
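(The test then typically loads the same checkpoint with `low_cpu_mem_usage=False` and compares outputs; a hedged sketch of that comparison path, continuing the snippet above:)

```py
# Assumed continuation: load the same weights without the accelerate fast path
# and check both models produce identical results.
model_normal, _ = UNet2DModel.from_pretrained(
    "fusing/unet-ldm-dummy-update", output_loading_info=True, low_cpu_mem_usage=False
)
model_normal.to(torch_device)
model_normal.eval()
```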