Unverified Commit 8edaf3b7 authored by Bagheera, committed by GitHub

7879 - adjust documentation to use naruto dataset, since pokemon is now gated (#7880)



* 7879 - adjust documentation to use naruto dataset, since pokemon is now gated

* replace references to pokemon in docs

* more references to pokemon replaced

* Japanese translation update

---------
Co-authored-by: bghira <bghira@users.github.com>
parent 23e09156
@@ -205,7 +205,7 @@ model_pred = unet(noisy_latents, timesteps, None, added_cond_kwargs=added_cond_k
 Once you’ve made all your changes or you’re okay with the default configuration, you’re ready to launch the training script! 🚀
-You'll train on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset to generate your own Pokémon, but you can also create and train on your own dataset by following the [Create a dataset for training](create_dataset) guide. Set the environment variable `DATASET_NAME` to the name of the dataset on the Hub or if you're training on your own files, set the environment variable `TRAIN_DIR` to a path to your dataset.
+You'll train on the [Naruto BLIP captions](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) dataset to generate your own Naruto characters, but you can also create and train on your own dataset by following the [Create a dataset for training](create_dataset) guide. Set the environment variable `DATASET_NAME` to the name of the dataset on the Hub or if you're training on your own files, set the environment variable `TRAIN_DIR` to a path to your dataset.
 If you’re training on more than one GPU, add the `--multi_gpu` parameter to the `accelerate launch` command.
@@ -219,7 +219,7 @@ To monitor training progress with Weights & Biases, add the `--report_to=wandb`
 <hfoption id="prior model">
 ```bash
-export DATASET_NAME="lambdalabs/pokemon-blip-captions"
+export DATASET_NAME="lambdalabs/naruto-blip-captions"
 accelerate launch --mixed_precision="fp16" train_text_to_image_prior.py \
 --dataset_name=$DATASET_NAME \
@@ -232,17 +232,17 @@ accelerate launch --mixed_precision="fp16" train_text_to_image_prior.py \
 --checkpoints_total_limit=3 \
 --lr_scheduler="constant" \
 --lr_warmup_steps=0 \
---validation_prompts="A robot pokemon, 4k photo" \
+--validation_prompts="A robot naruto, 4k photo" \
 --report_to="wandb" \
 --push_to_hub \
---output_dir="kandi2-prior-pokemon-model"
+--output_dir="kandi2-prior-naruto-model"
 ```
 </hfoption>
 <hfoption id="decoder model">
 ```bash
-export DATASET_NAME="lambdalabs/pokemon-blip-captions"
+export DATASET_NAME="lambdalabs/naruto-blip-captions"
 accelerate launch --mixed_precision="fp16" train_text_to_image_decoder.py \
 --dataset_name=$DATASET_NAME \
@@ -256,10 +256,10 @@ accelerate launch --mixed_precision="fp16" train_text_to_image_decoder.py \
 --checkpoints_total_limit=3 \
 --lr_scheduler="constant" \
 --lr_warmup_steps=0 \
---validation_prompts="A robot pokemon, 4k photo" \
+--validation_prompts="A robot naruto, 4k photo" \
 --report_to="wandb" \
 --push_to_hub \
---output_dir="kandi2-decoder-pokemon-model"
+--output_dir="kandi2-decoder-naruto-model"
 ```
 </hfoption>
@@ -279,7 +279,7 @@ prior_components = {"prior_" + k: v for k,v in prior_pipeline.components.items()
 pipeline = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", **prior_components, torch_dtype=torch.float16)
 pipeline.enable_model_cpu_offload()
prompt="A robot pokemon, 4k photo" prompt="A robot naruto, 4k photo"
image = pipeline(prompt=prompt, negative_prompt=negative_prompt).images[0] image = pipeline(prompt=prompt, negative_prompt=negative_prompt).images[0]
``` ```
...@@ -299,7 +299,7 @@ import torch ...@@ -299,7 +299,7 @@ import torch
pipeline = AutoPipelineForText2Image.from_pretrained("path/to/saved/model", torch_dtype=torch.float16) pipeline = AutoPipelineForText2Image.from_pretrained("path/to/saved/model", torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload() pipeline.enable_model_cpu_offload()
prompt="A robot pokemon, 4k photo" prompt="A robot naruto, 4k photo"
image = pipeline(prompt=prompt).images[0] image = pipeline(prompt=prompt).images[0]
``` ```
...@@ -313,7 +313,7 @@ unet = UNet2DConditionModel.from_pretrained("path/to/saved/model" + "/checkpoint ...@@ -313,7 +313,7 @@ unet = UNet2DConditionModel.from_pretrained("path/to/saved/model" + "/checkpoint
pipeline = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", unet=unet, torch_dtype=torch.float16) pipeline = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", unet=unet, torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload() pipeline.enable_model_cpu_offload()
image = pipeline(prompt="A robot pokemon, 4k photo").images[0] image = pipeline(prompt="A robot naruto, 4k photo").images[0]
``` ```
</hfoption> </hfoption>
......
@@ -170,7 +170,7 @@ Aside from setting up the LoRA layers, the training script is more or less the s
 Once you've made all your changes or you're okay with the default configuration, you're ready to launch the training script! 🚀
-Let's train on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset to generate our own Pokémon. Set the environment variables `MODEL_NAME` and `DATASET_NAME` to the model and dataset respectively. You should also specify where to save the model in `OUTPUT_DIR`, and the name of the model to save to on the Hub with `HUB_MODEL_ID`. The script creates and saves the following files to your repository:
+Let's train on the [Naruto BLIP captions](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) dataset to generate your own Naruto characters. Set the environment variables `MODEL_NAME` and `DATASET_NAME` to the model and dataset respectively. You should also specify where to save the model in `OUTPUT_DIR`, and the name of the model to save to on the Hub with `HUB_MODEL_ID`. The script creates and saves the following files to your repository:
 - saved model checkpoints
 - `pytorch_lora_weights.safetensors` (the trained LoRA weights)
@@ -185,9 +185,9 @@ A full training run takes ~5 hours on a 2080 Ti GPU with 11GB of VRAM.
 ```bash
 export MODEL_NAME="runwayml/stable-diffusion-v1-5"
-export OUTPUT_DIR="/sddata/finetune/lora/pokemon"
-export HUB_MODEL_ID="pokemon-lora"
-export DATASET_NAME="lambdalabs/pokemon-blip-captions"
+export OUTPUT_DIR="/sddata/finetune/lora/naruto"
+export HUB_MODEL_ID="naruto-lora"
+export DATASET_NAME="lambdalabs/naruto-blip-captions"
 accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
 --pretrained_model_name_or_path=$MODEL_NAME \
@@ -208,7 +208,7 @@ accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
 --hub_model_id=${HUB_MODEL_ID} \
 --report_to=wandb \
 --checkpointing_steps=500 \
---validation_prompt="A pokemon with blue eyes." \
+--validation_prompt="A naruto with blue eyes." \
 --seed=1337
 ```
@@ -220,7 +220,7 @@ import torch
 pipeline = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
 pipeline.load_lora_weights("path/to/lora/model", weight_name="pytorch_lora_weights.safetensors")
-image = pipeline("A pokemon with blue eyes").images[0]
+image = pipeline("A naruto with blue eyes").images[0]
 ```
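Once the LoRA is loaded as in the snippet above, its influence can also be scaled at call time. The following is a minimal sketch, not part of this diff, assuming the `pipeline` object from the block above; the scale value is an arbitrary example.

```py
# Scale the LoRA contribution at inference time: 1.0 applies the trained weights fully,
# 0.0 disables them; 0.7 here is an arbitrary illustrative value.
image = pipeline(
    "A naruto with blue eyes",
    cross_attention_kwargs={"scale": 0.7},
).images[0]
image.save("naruto-lora-scaled.png")
```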
 ## Next steps
...
@@ -176,7 +176,7 @@ If you want to learn more about how the training loop works, check out the [Unde
 Once you’ve made all your changes or you’re okay with the default configuration, you’re ready to launch the training script! 🚀
-Let’s train on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset to generate your own Pokémon. Set the environment variables `MODEL_NAME` and `DATASET_NAME` to the model and the dataset (either from the Hub or a local path). You should also specify a VAE other than the SDXL VAE (either from the Hub or a local path) with `VAE_NAME` to avoid numerical instabilities.
+Let’s train on the [Naruto BLIP captions](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) dataset to generate your own Naruto characters. Set the environment variables `MODEL_NAME` and `DATASET_NAME` to the model and the dataset (either from the Hub or a local path). You should also specify a VAE other than the SDXL VAE (either from the Hub or a local path) with `VAE_NAME` to avoid numerical instabilities.
 <Tip>
@@ -187,7 +187,7 @@ To monitor training progress with Weights & Biases, add the `--report_to=wandb`
 ```bash
 export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
 export VAE_NAME="madebyollin/sdxl-vae-fp16-fix"
-export DATASET_NAME="lambdalabs/pokemon-blip-captions"
+export DATASET_NAME="lambdalabs/naruto-blip-captions"
 accelerate launch train_text_to_image_sdxl.py \
 --pretrained_model_name_or_path=$MODEL_NAME \
@@ -211,7 +211,7 @@ accelerate launch train_text_to_image_sdxl.py \
 --validation_prompt="a cute Sundar Pichai creature" \
 --validation_epochs 5 \
 --checkpointing_steps=5000 \
---output_dir="sdxl-pokemon-model" \
+--output_dir="sdxl-naruto-model" \
 --push_to_hub
 ```
@@ -226,9 +226,9 @@ import torch
 pipeline = DiffusionPipeline.from_pretrained("path/to/your/model", torch_dtype=torch.float16).to("cuda")
-prompt = "A pokemon with green eyes and red legs."
+prompt = "A naruto with green eyes and red legs."
 image = pipeline(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
-image.save("pokemon.png")
+image.save("naruto.png")
 ```
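A side note on the `VAE_NAME` export above: if fp16 inference with the fine-tuned SDXL checkpoint produces NaNs or black images, the same fp16-friendly VAE can be swapped in when loading the pipeline. This is a minimal sketch outside the diff; the model path is a placeholder.

```py
import torch
from diffusers import AutoencoderKL, DiffusionPipeline

# Pair the fine-tuned SDXL model with the numerically stable fp16 VAE.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = DiffusionPipeline.from_pretrained(
    "path/to/your/model", vae=vae, torch_dtype=torch.float16
).to("cuda")
image = pipeline("A naruto with green eyes and red legs.").images[0]
```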
 </hfoption>
@@ -244,11 +244,11 @@ import torch_xla.core.xla_model as xm
 device = xm.xla_device()
 pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0").to(device)
-prompt = "A pokemon with green eyes and red legs."
+prompt = "A naruto with green eyes and red legs."
 start = time()
 image = pipeline(prompt, num_inference_steps=inference_steps).images[0]
 print(f'Compilation time is {time()-start} sec')
-image.save("pokemon.png")
+image.save("naruto.png")
 start = time()
 image = pipeline(prompt, num_inference_steps=inference_steps).images[0]
...
@@ -158,7 +158,7 @@ Once you've made all your changes or you're okay with the default configuration,
 <hfoptions id="training-inference">
 <hfoption id="PyTorch">
-Let's train on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset to generate your own Pokémon. Set the environment variables `MODEL_NAME` and `dataset_name` to the model and the dataset (either from the Hub or a local path). If you're training on more than one GPU, add the `--multi_gpu` parameter to the `accelerate launch` command.
+Let's train on the [Naruto BLIP captions](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) dataset to generate your own Naruto characters. Set the environment variables `MODEL_NAME` and `dataset_name` to the model and the dataset (either from the Hub or a local path). If you're training on more than one GPU, add the `--multi_gpu` parameter to the `accelerate launch` command.
 <Tip>
@@ -168,7 +168,7 @@ To train on a local dataset, set the `TRAIN_DIR` and `OUTPUT_DIR` environment va
 ```bash
 export MODEL_NAME="runwayml/stable-diffusion-v1-5"
-export dataset_name="lambdalabs/pokemon-blip-captions"
+export dataset_name="lambdalabs/naruto-blip-captions"
 accelerate launch --mixed_precision="fp16" train_text_to_image.py \
 --pretrained_model_name_or_path=$MODEL_NAME \
@@ -183,7 +183,7 @@ accelerate launch --mixed_precision="fp16" train_text_to_image.py \
 --max_grad_norm=1 \
 --enable_xformers_memory_efficient_attention \
 --lr_scheduler="constant" --lr_warmup_steps=0 \
---output_dir="sd-pokemon-model" \
+--output_dir="sd-naruto-model" \
 --push_to_hub
 ```
@@ -202,7 +202,7 @@ To train on a local dataset, set the `TRAIN_DIR` and `OUTPUT_DIR` environment va
 ```bash
 export MODEL_NAME="runwayml/stable-diffusion-v1-5"
-export dataset_name="lambdalabs/pokemon-blip-captions"
+export dataset_name="lambdalabs/naruto-blip-captions"
 python train_text_to_image_flax.py \
 --pretrained_model_name_or_path=$MODEL_NAME \
@@ -212,7 +212,7 @@ python train_text_to_image_flax.py \
 --max_train_steps=15000 \
 --learning_rate=1e-05 \
 --max_grad_norm=1 \
---output_dir="sd-pokemon-model" \
+--output_dir="sd-naruto-model" \
 --push_to_hub
 ```
@@ -231,7 +231,7 @@ import torch
 pipeline = StableDiffusionPipeline.from_pretrained("path/to/saved_model", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
 image = pipeline(prompt="yoda").images[0]
-image.save("yoda-pokemon.png")
+image.save("yoda-naruto.png")
 ```
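When comparing outputs across fine-tuned checkpoints, it helps to fix the random seed. A minimal sketch, assuming the `pipeline` loaded in the snippet above:

```py
import torch

# A fixed generator makes runs reproducible, so checkpoints can be compared fairly.
generator = torch.Generator("cuda").manual_seed(0)
image = pipeline(prompt="yoda", generator=generator).images[0]
image.save("yoda-naruto-seed0.png")
```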
 </hfoption>
@@ -246,7 +246,7 @@ from diffusers import FlaxStableDiffusionPipeline
 pipeline, params = FlaxStableDiffusionPipeline.from_pretrained("path/to/saved_model", dtype=jax.numpy.bfloat16)
-prompt = "yoda pokemon"
+prompt = "yoda naruto"
 prng_seed = jax.random.PRNGKey(0)
 num_inference_steps = 50
@@ -261,7 +261,7 @@ prompt_ids = shard(prompt_ids)
 images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
 images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
-image.save("yoda-pokemon.png")
+image.save("yoda-naruto.png")
 ```
 </hfoption>
...
@@ -131,7 +131,7 @@ If you want to learn more about how the training loop works, check out the [Unde
 Once you’ve made all your changes or you’re okay with the default configuration, you’re ready to launch the training script! 🚀
-Set the `DATASET_NAME` environment variable to the dataset name from the Hub. This guide uses the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset, but you can create and train on your own datasets as well (see the [Create a dataset for training](create_dataset) guide).
+Set the `DATASET_NAME` environment variable to the dataset name from the Hub. This guide uses the [Naruto BLIP captions](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) dataset, but you can create and train on your own datasets as well (see the [Create a dataset for training](create_dataset) guide).
 <Tip>
@@ -140,7 +140,7 @@ To monitor training progress with Weights & Biases, add the `--report_to=wandb`
 </Tip>
 ```bash
-export DATASET_NAME="lambdalabs/pokemon-blip-captions"
+export DATASET_NAME="lambdalabs/naruto-blip-captions"
 accelerate launch train_text_to_image_prior.py \
 --mixed_precision="fp16" \
@@ -156,10 +156,10 @@ accelerate launch train_text_to_image_prior.py \
 --checkpoints_total_limit=3 \
 --lr_scheduler="constant" \
 --lr_warmup_steps=0 \
---validation_prompts="A robot pokemon, 4k photo" \
+--validation_prompts="A robot naruto, 4k photo" \
 --report_to="wandb" \
 --push_to_hub \
---output_dir="wuerstchen-prior-pokemon-model"
+--output_dir="wuerstchen-prior-naruto-model"
 ```
 Once training is complete, you can use your newly trained model for inference!
@@ -171,7 +171,7 @@ from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS
 pipeline = AutoPipelineForText2Image.from_pretrained("path/to/saved/model", torch_dtype=torch.float16).to("cuda")
-caption = "A cute bird pokemon holding a shield"
+caption = "A cute bird naruto holding a shield"
 images = pipeline(
 caption,
 width=1024,
...
@@ -49,15 +49,15 @@ huggingface-cli login
 ### Training[[dreambooth-training]]
-We'll fine-tune [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset to generate your own Pokémon.
+We'll fine-tune [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) on the [Naruto BLIP captions](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) dataset to generate your own Naruto characters.
 To get started, make sure the `MODEL_NAME` and `DATASET_NAME` environment variables are set. The `OUTPUT_DIR` and `HUB_MODEL_ID` variables are optional and specify where to save the model on the Hub.
 ```bash
 export MODEL_NAME="runwayml/stable-diffusion-v1-5"
-export OUTPUT_DIR="/sddata/finetune/lora/pokemon"
-export HUB_MODEL_ID="pokemon-lora"
-export DATASET_NAME="lambdalabs/pokemon-blip-captions"
+export OUTPUT_DIR="/sddata/finetune/lora/naruto"
+export HUB_MODEL_ID="naruto-lora"
+export DATASET_NAME="lambdalabs/naruto-blip-captions"
 ```
 There are a few flags to be aware of before you start training.
...
@@ -73,12 +73,12 @@ xFormers is not available for Flax.
 <frameworkcontent>
 <pt>
-Run the [PyTorch training script](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py) to fine-tune on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset as follows:
+Run the [PyTorch training script](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py) to fine-tune on the [Naruto BLIP captions](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) dataset as follows:
 ```bash
 export MODEL_NAME="CompVis/stable-diffusion-v1-4"
-export dataset_name="lambdalabs/pokemon-blip-captions"
+export dataset_name="lambdalabs/naruto-blip-captions"
 accelerate launch train_text_to_image.py \
 --pretrained_model_name_or_path=$MODEL_NAME \
@@ -93,7 +93,7 @@ accelerate launch train_text_to_image.py \
 --learning_rate=1e-05 \
 --max_grad_norm=1 \
 --lr_scheduler="constant" --lr_warmup_steps=0 \
---output_dir="sd-pokemon-model"
+--output_dir="sd-naruto-model"
 ```
 To fine-tune on your own dataset, prepare it in the format required by 🤗 [Datasets](https://huggingface.co/docs/datasets/index). You can [upload your dataset to the Hub](https://huggingface.co/docs/datasets/image_dataset#upload-dataset-to-the-hub) or [prepare a local folder with your files](https://huggingface.co/docs/datasets/image_dataset#imagefolder).
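For the local-folder route mentioned above, the `imagefolder` loader of 🤗 Datasets expects the images plus an optional `metadata.jsonl` that carries the captions. The sketch below is illustrative only; the folder layout and column names are assumptions, not part of this diff.

```py
from datasets import load_dataset

# Assumed layout:
#   train/metadata.jsonl  -> one JSON object per line, e.g. {"file_name": "0001.png", "text": "a caption"}
#   train/0001.png, train/0002.png, ...
dataset = load_dataset("imagefolder", data_dir="train", split="train")
print(dataset.column_names)  # expected to include "image" plus the metadata columns, e.g. "text"
```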
@@ -136,7 +136,7 @@ pip install -U -r requirements_flax.txt
 ```bash
 export MODEL_NAME="runwayml/stable-diffusion-v1-5"
-export dataset_name="lambdalabs/pokemon-blip-captions"
+export dataset_name="lambdalabs/naruto-blip-captions"
 python train_text_to_image_flax.py \
 --pretrained_model_name_or_path=$MODEL_NAME \
@@ -146,7 +146,7 @@ python train_text_to_image_flax.py \
 --max_train_steps=15000 \
 --learning_rate=1e-05 \
 --max_grad_norm=1 \
---output_dir="sd-pokemon-model"
+--output_dir="sd-naruto-model"
 ```
 To fine-tune on your own dataset, prepare it in the format required by 🤗 [Datasets](https://huggingface.co/docs/datasets/index). You can [upload your dataset to the Hub](https://huggingface.co/docs/datasets/image_dataset#upload-dataset-to-the-hub) or [prepare a local folder with your files](https://huggingface.co/docs/datasets/image_dataset#imagefolder).
@@ -166,7 +166,7 @@ python train_text_to_image_flax.py \
 --max_train_steps=15000 \
 --learning_rate=1e-05 \
 --max_grad_norm=1 \
---output_dir="sd-pokemon-model"
+--output_dir="sd-naruto-model"
 ```
 </jax>
 </frameworkcontent>
@@ -189,7 +189,7 @@ pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.flo
 pipe.to("cuda")
 image = pipe(prompt="yoda").images[0]
-image.save("yoda-pokemon.png")
+image.save("yoda-naruto.png")
 ```
 </pt>
 <jax>
@@ -203,7 +203,7 @@ from diffusers import FlaxStableDiffusionPipeline
 model_path = "path_to_saved_model"
 pipe, params = FlaxStableDiffusionPipeline.from_pretrained(model_path, dtype=jax.numpy.bfloat16)
-prompt = "yoda pokemon"
+prompt = "yoda naruto"
 prng_seed = jax.random.PRNGKey(0)
 num_inference_steps = 50
@@ -218,7 +218,7 @@ prompt_ids = shard(prompt_ids)
 images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
 images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
-image.save("yoda-pokemon.png")
+image.save("yoda-naruto.png")
 ```
 </jax>
 </frameworkcontent>
\ No newline at end of file
@@ -103,13 +103,13 @@ accelerate launch train_unconditional.py \
 <div class="flex justify-center">
 <img src="https://user-images.githubusercontent.com/26864830/180248660-a0b143d0-b89a-42c5-8656-2ebf6ece7e52.png"/>
 </div>
-If you are using the [Pokemon](https://huggingface.co/datasets/huggan/pokemon) dataset:
+If you are using the [Naruto](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) dataset:
 ```bash
 accelerate launch train_unconditional.py \
---dataset_name="huggan/pokemon" \
+--dataset_name="lambdalabs/naruto-blip-captions" \
 --resolution=64 \
---output_dir="ddpm-ema-pokemon-64" \
+--output_dir="ddpm-ema-naruto-64" \
 --train_batch_size=16 \
 --num_epochs=100 \
 --gradient_accumulation_steps=1 \
@@ -129,9 +129,9 @@ accelerate launch train_unconditional.py \
 ```bash
 accelerate launch --mixed_precision="fp16" --multi_gpu train_unconditional.py \
---dataset_name="huggan/pokemon" \
+--dataset_name="lambdalabs/naruto-blip-captions" \
 --resolution=64 --center_crop --random_flip \
---output_dir="ddpm-ema-pokemon-64" \
+--output_dir="ddpm-ema-naruto-64" \
 --train_batch_size=16 \
 --num_epochs=100 \
 --gradient_accumulation_steps=1 \
...
@@ -115,11 +115,11 @@ accelerate launch train_lcm_distill_lora_sdxl_wds.py \
 We provide another version for LCM LoRA SDXL that follows best practices of `peft` and leverages the `datasets` library for quick experimentation. The script doesn't load two UNets unlike `train_lcm_distill_lora_sdxl_wds.py` which reduces the memory requirements quite a bit.
-Below is an example training command that trains an LCM LoRA on the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions):
+Below is an example training command that trains an LCM LoRA on the [Naruto dataset](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions):
 ```bash
 export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
-export DATASET_NAME="lambdalabs/pokemon-blip-captions"
+export DATASET_NAME="lambdalabs/naruto-blip-captions"
 export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"
 accelerate launch train_lcm_distill_lora_sdxl.py \
...
@@ -71,7 +71,7 @@ check_min_version("0.28.0.dev0")
 logger = get_logger(__name__)
 DATASET_NAME_MAPPING = {
-    "lambdalabs/pokemon-blip-captions": ("image", "text"),
+    "lambdalabs/naruto-blip-captions": ("image", "text"),
 }
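For context on this fragment: the training scripts use `DATASET_NAME_MAPPING` to resolve which dataset columns hold the image and the caption, and fall back to their column arguments for datasets not listed here. The lookup below is a rough sketch of that behavior; the fallback names are assumptions standing in for the scripts' `--image_column`/`--caption_column` arguments.

```py
# Hypothetical illustration of how the mapping is consumed by the training scripts.
columns = DATASET_NAME_MAPPING.get("lambdalabs/naruto-blip-captions")
image_column, caption_column = columns if columns is not None else ("image", "text")
```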
...
@@ -57,7 +57,7 @@ To disable wandb logging, remove the `--report_to=="wandb"` and `--validation_pr
 <!-- accelerate_snippet_start -->
 ```bash
-export DATASET_NAME="lambdalabs/pokemon-blip-captions"
+export DATASET_NAME="lambdalabs/naruto-blip-captions"
 accelerate launch --mixed_precision="fp16" train_text_to_image_decoder.py \
 --dataset_name=$DATASET_NAME \
@@ -139,7 +139,7 @@ You can fine-tune the Kandinsky prior model with `train_text_to_image_prior.py`
 <!-- accelerate_snippet_start -->
 ```bash
-export DATASET_NAME="lambdalabs/pokemon-blip-captions"
+export DATASET_NAME="lambdalabs/naruto-blip-captions"
 accelerate launch --mixed_precision="fp16" train_text_to_image_prior.py \
 --dataset_name=$DATASET_NAME \
@@ -183,7 +183,7 @@ If you want to use a fine-tuned decoder checkpoint along with your fine-tuned pr
 for running distributed training with `accelerate`. Here is an example command:
 ```bash
-export DATASET_NAME="lambdalabs/pokemon-blip-captions"
+export DATASET_NAME="lambdalabs/naruto-blip-captions"
 accelerate launch --mixed_precision="fp16" --multi_gpu train_text_to_image_decoder.py \
 --dataset_name=$DATASET_NAME \
@@ -227,13 +227,13 @@ on consumer GPUs like Tesla T4, Tesla V100.
 ### Training
-First, you need to set up your development environment as explained in the [installation](#installing-the-dependencies). Make sure to set the `MODEL_NAME` and `DATASET_NAME` environment variables. Here, we will use [Kandinsky 2.2](https://huggingface.co/kandinsky-community/kandinsky-2-2-decoder) and the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions).
+First, you need to set up your development environment as explained in the [installation](#installing-the-dependencies). Make sure to set the `MODEL_NAME` and `DATASET_NAME` environment variables. Here, we will use [Kandinsky 2.2](https://huggingface.co/kandinsky-community/kandinsky-2-2-decoder) and the [Naruto dataset](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions).
 #### Train decoder
 ```bash
-export DATASET_NAME="lambdalabs/pokemon-blip-captions"
+export DATASET_NAME="lambdalabs/naruto-blip-captions"
 accelerate launch --mixed_precision="fp16" train_text_to_image_decoder_lora.py \
 --dataset_name=$DATASET_NAME --caption_column="text" \
@@ -252,7 +252,7 @@ accelerate launch --mixed_precision="fp16" train_text_to_image_decoder_lora.py \
 #### Train prior
 ```bash
-export DATASET_NAME="lambdalabs/pokemon-blip-captions"
+export DATASET_NAME="lambdalabs/naruto-blip-captions"
 accelerate launch --mixed_precision="fp16" train_text_to_image_prior_lora.py \
 --dataset_name=$DATASET_NAME --caption_column="text" \
...
@@ -332,7 +332,7 @@ def parse_args():
 DATASET_NAME_MAPPING = {
-    "lambdalabs/pokemon-blip-captions": ("image", "text"),
+    "lambdalabs/naruto-blip-captions": ("image", "text"),
 }
...
@@ -56,7 +56,7 @@ check_min_version("0.28.0.dev0")
 logger = get_logger(__name__, log_level="INFO")
 DATASET_NAME_MAPPING = {
-    "lambdalabs/pokemon-blip-captions": ("image", "text"),
+    "lambdalabs/naruto-blip-captions": ("image", "text"),
 }
...
@@ -19,7 +19,7 @@ on consumer GPUs like Tesla T4, Tesla V100.
 ### Training
-First, you need to set up your development environment as is explained in the [installation section](#installing-the-dependencies). Make sure to set the `MODEL_NAME` and `DATASET_NAME` environment variables. Here, we will use [Stable Diffusion v1-4](https://hf.co/CompVis/stable-diffusion-v1-4) and the [Pokemons dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions).
+First, you need to set up your development environment as explained in the [installation section](#installing-the-dependencies). Make sure to set the `MODEL_NAME` and `DATASET_NAME` environment variables. Here, we will use [Stable Diffusion v1-4](https://hf.co/CompVis/stable-diffusion-v1-4) and the [Naruto dataset](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions).
 **___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___**
@@ -27,7 +27,7 @@ First, you need to set up your development environment as is explained in the [i
 ```bash
 export MODEL_NAME="CompVis/stable-diffusion-v1-4"
-export DATASET_NAME="lambdalabs/pokemon-blip-captions"
+export DATASET_NAME="lambdalabs/naruto-blip-captions"
 ```
 For this example we want to directly store the trained LoRA embeddings on the Hub, so
...
@@ -387,7 +387,7 @@ def parse_args():
 DATASET_NAME_MAPPING = {
-    "lambdalabs/pokemon-blip-captions": ("image", "text"),
+    "lambdalabs/naruto-blip-captions": ("image", "text"),
 }
...
@@ -55,7 +55,7 @@ The command to train a DDPM UNetCondition model on the Pokemon dataset with onnx
 ```bash
 export MODEL_NAME="CompVis/stable-diffusion-v1-4"
-export dataset_name="lambdalabs/pokemon-blip-captions"
+export dataset_name="lambdalabs/naruto-blip-captions"
 accelerate launch --mixed_precision="fp16" train_text_to_image.py \
 --pretrained_model_name_or_path=$MODEL_NAME \
 --dataset_name=$dataset_name \
...
@@ -59,7 +59,7 @@ check_min_version("0.17.0.dev0")
 logger = get_logger(__name__, log_level="INFO")
 DATASET_NAME_MAPPING = {
-    "lambdalabs/pokemon-blip-captions": ("image", "text"),
+    "lambdalabs/naruto-blip-captions": ("image", "text"),
 }
...
@@ -61,7 +61,7 @@ check_min_version("0.28.0.dev0")
 logger = get_logger(__name__, log_level="INFO")
 DATASET_NAME_MAPPING = {
-    "lambdalabs/pokemon-blip-captions": ("image", "text"),
+    "lambdalabs/naruto-blip-captions": ("image", "text"),
 }
...
@@ -406,7 +406,7 @@ def parse_args():
 DATASET_NAME_MAPPING = {
-    "lambdalabs/pokemon-blip-captions": ("image", "text"),
+    "lambdalabs/naruto-blip-captions": ("image", "text"),
 }
...
@@ -468,7 +468,7 @@ def parse_args(input_args=None):
 DATASET_NAME_MAPPING = {
-    "lambdalabs/pokemon-blip-captions": ("image", "text"),
+    "lambdalabs/naruto-blip-captions": ("image", "text"),
 }
...