@@ -540,10 +540,13 @@ upscaler to remove the new token from the instance prompt. I.e. if your stage I
For finegrained detail like faces that aren't present in the original training set, we find that full finetuning of the stage II upscaler is better than
LoRA finetuning stage II.
For finegrained detail like faces, we find that lower learning rates work best.
For finegrained detail like faces, we find that lower learning rates along with larger batch sizes work best.
For stage II, we find that lower learning rates are also needed.
We found experimentally that the DDPM scheduler with the default larger number of denoising steps to sometimes work better than the DPM Solver scheduler
used in the training scripts.
### Stage II additional validation images
The stage II validation requires images to upscale, we can download a downsized version of the training set:
...
...
@@ -631,7 +634,8 @@ with a T5 loaded from the original model.
`use_8bit_adam`: Due to the size of the optimizer states, we recommend training the full XL IF model with 8bit adam.
`--learning_rate=1e-7`: For full dreambooth, IF requires very low learning rates. With higher learning rates model quality will degrade.
`--learning_rate=1e-7`: For full dreambooth, IF requires very low learning rates. With higher learning rates model quality will degrade. Note that it is
likely the learning rate can be increased with larger batch sizes.
Using 8bit adam and a batch size of 4, the model can be trained in ~48 GB VRAM.
@@ -574,10 +574,13 @@ upscaler to remove the new token from the instance prompt. I.e. if your stage I
For finegrained detail like faces that aren't present in the original training set, we find that full finetuning of the stage II upscaler is better than
LoRA finetuning stage II.
For finegrained detail like faces, we find that lower learning rates work best.
For finegrained detail like faces, we find that lower learning rates along with larger batch sizes work best.
For stage II, we find that lower learning rates are also needed.
We found experimentally that the DDPM scheduler with the default larger number of denoising steps to sometimes work better than the DPM Solver scheduler
used in the training scripts.
### Stage II additional validation images
The stage II validation requires images to upscale, we can download a downsized version of the training set:
...
...
@@ -665,7 +668,8 @@ with a T5 loaded from the original model.
`use_8bit_adam`: Due to the size of the optimizer states, we recommend training the full XL IF model with 8bit adam.
`--learning_rate=1e-7`: For full dreambooth, IF requires very low learning rates. With higher learning rates model quality will degrade.
`--learning_rate=1e-7`: For full dreambooth, IF requires very low learning rates. With higher learning rates model quality will degrade. Note that it is
likely the learning rate can be increased with larger batch sizes.
Using 8bit adam and a batch size of 4, the model can be trained in ~48 GB VRAM.