ModelZoo / stablediffusion_v2.1_pytorch

Commit 4007efdd
Authored May 12, 2024 by lijian6

Initial commit

Pipeline #994 canceled with stages. The commit changes 138 files in total; this page shows the first 20 of them, with 603 additions and 0 deletions (+603, -0).
Changed files (binary image assets contribute +0 -0):

assets/stable-samples/txt2img/768/merged-0006.png (binary)
assets/stable-samples/txt2img/merged-0001.png (binary)
assets/stable-samples/txt2img/merged-0003.png (binary)
assets/stable-samples/txt2img/merged-0005.png (binary)
assets/stable-samples/txt2img/merged-0006.png (binary)
assets/stable-samples/txt2img/merged-0007.png (binary)
assets/stable-samples/upscaling/merged-dog.png (binary)
assets/stable-samples/upscaling/sampled-bear-x4.png (binary)
assets/stable-samples/upscaling/snow-leopard-x4.png (binary)
checkpoints/checkpoints.txt (+2)
configs/karlo/decoder_900M_vit_l.yaml (+37)
configs/karlo/improved_sr_64_256_1.4B.yaml (+27)
configs/karlo/prior_1B_vit_l.yaml (+21)
configs/stable-diffusion/intel/v2-inference-bf16.yaml (+71)
configs/stable-diffusion/intel/v2-inference-fp32.yaml (+70)
configs/stable-diffusion/intel/v2-inference-v-bf16.yaml (+72)
configs/stable-diffusion/intel/v2-inference-v-fp32.yaml (+71)
configs/stable-diffusion/v2-1-stable-unclip-h-inference.yaml (+80)
configs/stable-diffusion/v2-1-stable-unclip-l-inference.yaml (+84)
configs/stable-diffusion/v2-inference-v.yaml (+68)
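As a quick sanity check, the per-file addition counts in the listing above should sum to the 603 additions reported in the diffstat (the binary assets add no lines). A minimal sketch, with the names and counts copied from the listing:

```python
# Per-file added-line counts from the diffstat above.
additions = {
    "checkpoints/checkpoints.txt": 2,
    "configs/karlo/decoder_900M_vit_l.yaml": 37,
    "configs/karlo/improved_sr_64_256_1.4B.yaml": 27,
    "configs/karlo/prior_1B_vit_l.yaml": 21,
    "configs/stable-diffusion/intel/v2-inference-bf16.yaml": 71,
    "configs/stable-diffusion/intel/v2-inference-fp32.yaml": 70,
    "configs/stable-diffusion/intel/v2-inference-v-bf16.yaml": 72,
    "configs/stable-diffusion/intel/v2-inference-v-fp32.yaml": 71,
    "configs/stable-diffusion/v2-1-stable-unclip-h-inference.yaml": 80,
    "configs/stable-diffusion/v2-1-stable-unclip-l-inference.yaml": 84,
    "configs/stable-diffusion/v2-inference-v.yaml": 68,
}
total = sum(additions.values())
print(total)  # 603
```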
New binary files (mode 100644):

assets/stable-samples/txt2img/768/merged-0006.png    4.17 MB
assets/stable-samples/txt2img/merged-0001.png        2.3 MB
assets/stable-samples/txt2img/merged-0003.png        2.17 MB
assets/stable-samples/txt2img/merged-0005.png        2.46 MB
assets/stable-samples/txt2img/merged-0006.png        2.52 MB
assets/stable-samples/txt2img/merged-0007.png        2.3 MB
assets/stable-samples/upscaling/merged-dog.png       1.74 MB
assets/stable-samples/upscaling/sampled-bear-x4.png  3.01 MB
assets/stable-samples/upscaling/snow-leopard-x4.png  3.71 MB
checkpoints/checkpoints.txt (new file, mode 100644):

Put unCLIP checkpoints here.
(no newline at end of file)
configs/karlo/decoder_900M_vit_l.yaml (new file, mode 100644):

model:
  type: t2i-decoder
  diffusion_sampler: uniform
  hparams:
    image_size: 64
    num_channels: 320
    num_res_blocks: 3
    channel_mult: ''
    attention_resolutions: 32,16,8
    num_heads: -1
    num_head_channels: 64
    num_heads_upsample: -1
    use_scale_shift_norm: true
    dropout: 0.1
    clip_dim: 768
    clip_emb_mult: 4
    text_ctx: 77
    xf_width: 1536
    xf_layers: 0
    xf_heads: 0
    xf_final_ln: false
    resblock_updown: true
    learn_sigma: true
    text_drop: 0.3
    clip_emb_type: image
    clip_emb_drop: 0.1
    use_plm: true

diffusion:
  steps: 1000
  learn_sigma: true
  sigma_small: false
  noise_schedule: squaredcos_cap_v2
  use_kl: false
  predict_xstart: false
  rescale_learned_sigmas: true
  timestep_respacing: ''
configs/karlo/improved_sr_64_256_1.4B.yaml (new file, mode 100644):

model:
  type: improved_sr_64_256
  diffusion_sampler: uniform
  hparams:
    channels: 320
    depth: 3
    channels_multiple:
      - 1
      - 2
      - 3
      - 4
    dropout: 0.0

diffusion:
  steps: 1000
  learn_sigma: false
  sigma_small: true
  noise_schedule: squaredcos_cap_v2
  use_kl: false
  predict_xstart: false
  rescale_learned_sigmas: true
  timestep_respacing: '7'

sampling:
  timestep_respacing: '7' # fix
  clip_denoise: true
configs/karlo/prior_1B_vit_l.yaml (new file, mode 100644):

model:
  type: prior
  diffusion_sampler: uniform
  hparams:
    text_ctx: 77
    xf_width: 2048
    xf_layers: 20
    xf_heads: 32
    xf_final_ln: true
    text_drop: 0.2
    clip_dim: 768

diffusion:
  steps: 1000
  learn_sigma: false
  sigma_small: true
  noise_schedule: squaredcos_cap_v2
  use_kl: false
  predict_xstart: true
  rescale_learned_sigmas: false
  timestep_respacing: ''
configs/stable-diffusion/intel/v2-inference-bf16.yaml (new file, mode 100644):

# Copyright (C) 2022 Intel Corporation
# SPDX-License-Identifier: MIT

model:
  base_learning_rate: 1.0e-4
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: "jpg"
    cond_stage_key: "txt"
    image_size: 64
    channels: 4
    cond_stage_trainable: false
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False # we set this to false because this is an inference only config

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        use_checkpoint: False
        use_fp16: False
        use_bf16: True
        image_size: 32 # unused
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_head_channels: 64 # need to fix for flash-attn
        use_spatial_transformer: True
        use_linear_in_transformer: True
        transformer_depth: 1
        context_dim: 1024
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          #attn_type: "vanilla-xformers"
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
      params:
        freeze: True
        layer: "penultimate"
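Each `target:` entry in these configs is a dotted import path that the Stable Diffusion codebase resolves to a class and instantiates with the accompanying `params` dict (the `instantiate_from_config` pattern). A minimal stdlib-only sketch of that resolution, demonstrated on a standard-library class rather than the `ldm` modules these configs assume, which are only importable inside the repo:

```python
import importlib

def get_obj_from_str(path: str):
    """Resolve a dotted 'pkg.module.Class' string to the class object."""
    module, cls = path.rsplit(".", 1)
    return getattr(importlib.import_module(module), cls)

def instantiate_from_config(config: dict):
    """Build the object named by config['target'], passing config['params'] as kwargs."""
    return get_obj_from_str(config["target"])(**config.get("params", {}))

# Demonstrated on a stdlib class; the real configs point at classes such as
# ldm.models.autoencoder.AutoencoderKL with their params blocks as kwargs.
obj = instantiate_from_config({
    "target": "datetime.date",
    "params": {"year": 2024, "month": 5, "day": 12},
})
print(obj)  # 2024-05-12
```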
configs/stable-diffusion/intel/v2-inference-fp32.yaml (new file, mode 100644):

# Copyright (C) 2022 Intel Corporation
# SPDX-License-Identifier: MIT

model:
  base_learning_rate: 1.0e-4
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: "jpg"
    cond_stage_key: "txt"
    image_size: 64
    channels: 4
    cond_stage_trainable: false
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False # we set this to false because this is an inference only config

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        use_checkpoint: False
        use_fp16: False
        image_size: 32 # unused
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_head_channels: 64 # need to fix for flash-attn
        use_spatial_transformer: True
        use_linear_in_transformer: True
        transformer_depth: 1
        context_dim: 1024
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          #attn_type: "vanilla-xformers"
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
      params:
        freeze: True
        layer: "penultimate"
configs/stable-diffusion/intel/v2-inference-v-bf16.yaml (new file, mode 100644):

# Copyright (C) 2022 Intel Corporation
# SPDX-License-Identifier: MIT

model:
  base_learning_rate: 1.0e-4
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    parameterization: "v"
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: "jpg"
    cond_stage_key: "txt"
    image_size: 64
    channels: 4
    cond_stage_trainable: false
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False # we set this to false because this is an inference only config

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        use_checkpoint: False
        use_fp16: False
        use_bf16: True
        image_size: 32 # unused
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_head_channels: 64 # need to fix for flash-attn
        use_spatial_transformer: True
        use_linear_in_transformer: True
        transformer_depth: 1
        context_dim: 1024
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          #attn_type: "vanilla-xformers"
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
      params:
        freeze: True
        layer: "penultimate"
configs/stable-diffusion/intel/v2-inference-v-fp32.yaml (new file, mode 100644):

# Copyright (C) 2022 Intel Corporation
# SPDX-License-Identifier: MIT

model:
  base_learning_rate: 1.0e-4
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    parameterization: "v"
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: "jpg"
    cond_stage_key: "txt"
    image_size: 64
    channels: 4
    cond_stage_trainable: false
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False # we set this to false because this is an inference only config

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        use_checkpoint: False
        use_fp16: False
        image_size: 32 # unused
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_head_channels: 64 # need to fix for flash-attn
        use_spatial_transformer: True
        use_linear_in_transformer: True
        transformer_depth: 1
        context_dim: 1024
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          #attn_type: "vanilla-xformers"
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
      params:
        freeze: True
        layer: "penultimate"
configs/stable-diffusion/v2-1-stable-unclip-h-inference.yaml (new file, mode 100644):

model:
  base_learning_rate: 1.0e-04
  target: ldm.models.diffusion.ddpm.ImageEmbeddingConditionedLatentDiffusion
  params:
    embedding_dropout: 0.25
    parameterization: "v"
    linear_start: 0.00085
    linear_end: 0.0120
    log_every_t: 200
    timesteps: 1000
    first_stage_key: "jpg"
    cond_stage_key: "txt"
    image_size: 96
    channels: 4
    cond_stage_trainable: false
    conditioning_key: crossattn-adm
    scale_factor: 0.18215
    monitor: val/loss_simple_ema
    use_ema: False

    embedder_config:
      target: ldm.modules.encoders.modules.FrozenOpenCLIPImageEmbedder

    noise_aug_config:
      target: ldm.modules.encoders.modules.CLIPEmbeddingNoiseAugmentation
      params:
        timestep_dim: 1024
        noise_schedule_config:
          timesteps: 1000
          beta_schedule: squaredcos_cap_v2

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        num_classes: "sequential"
        adm_in_channels: 2048
        use_checkpoint: True
        image_size: 32 # unused
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_head_channels: 64 # need to fix for flash-attn
        use_spatial_transformer: True
        use_linear_in_transformer: True
        transformer_depth: 1
        context_dim: 1024
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          attn_type: "vanilla-xformers"
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: [ ]
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
      params:
        freeze: True
        layer: "penultimate"
configs/stable-diffusion/v2-1-stable-unclip-l-inference.yaml (new file, mode 100644):

model:
  base_learning_rate: 1.0e-04
  target: ldm.models.diffusion.ddpm.ImageEmbeddingConditionedLatentDiffusion
  params:
    embedding_dropout: 0.25
    parameterization: "v"
    linear_start: 0.00085
    linear_end: 0.0120
    log_every_t: 200
    timesteps: 1000
    first_stage_key: "jpg"
    cond_stage_key: "txt"
    image_size: 96
    channels: 4
    cond_stage_trainable: false
    conditioning_key: crossattn-adm
    scale_factor: 0.18215
    monitor: val/loss_simple_ema
    use_ema: False

    embedder_config:
      target: ldm.modules.encoders.modules.ClipImageEmbedder
      params:
        model: "ViT-L/14"

    noise_aug_config:
      target: ldm.modules.encoders.modules.CLIPEmbeddingNoiseAugmentation
      params:
        clip_stats_path: "checkpoints/karlo_models/ViT-L-14_stats.th"
        timestep_dim: 768
        noise_schedule_config:
          timesteps: 1000
          beta_schedule: squaredcos_cap_v2

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        num_classes: "sequential"
        adm_in_channels: 1536
        use_checkpoint: True
        image_size: 32 # unused
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_head_channels: 64 # need to fix for flash-attn
        use_spatial_transformer: True
        use_linear_in_transformer: True
        transformer_depth: 1
        context_dim: 1024
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          attn_type: "vanilla-xformers"
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: [ ]
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
      params:
        freeze: True
        layer: "penultimate"
(no newline at end of file)
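One difference worth noting between the two unCLIP variants: the `h` config pairs `timestep_dim: 1024` (the OpenCLIP ViT-H/14 embedding width) with `adm_in_channels: 2048`, while the `l` config pairs `timestep_dim: 768` (CLIP ViT-L/14) with `adm_in_channels: 1536`. In both cases `adm_in_channels` is exactly twice the embedding width, consistent with the noise-augmented image embedding being concatenated with a second same-width embedding before entering the UNet's `adm` pathway; that reading is an inference from the numbers, not something the files state. A small check of the ratio:

```python
# (timestep_dim, adm_in_channels) pairs read from the two unCLIP configs above.
variants = {
    "v2-1-stable-unclip-h-inference.yaml": (1024, 2048),  # OpenCLIP ViT-H/14
    "v2-1-stable-unclip-l-inference.yaml": (768, 1536),   # CLIP ViT-L/14
}
for name, (timestep_dim, adm_in_channels) in variants.items():
    # adm_in_channels == 2 * timestep_dim in both configs, suggesting two
    # concatenated embeddings of width timestep_dim each.
    assert adm_in_channels == 2 * timestep_dim, name
print("ok")
```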
configs/stable-diffusion/v2-inference-v.yaml (new file, mode 100644):

model:
  base_learning_rate: 1.0e-4
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    parameterization: "v"
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: "jpg"
    cond_stage_key: "txt"
    image_size: 64
    channels: 4
    cond_stage_trainable: false
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False # we set this to false because this is an inference only config

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        use_checkpoint: True
        use_fp16: True
        image_size: 32 # unused
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_head_channels: 64 # need to fix for flash-attn
        use_spatial_transformer: True
        use_linear_in_transformer: True
        transformer_depth: 1
        context_dim: 1024
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          #attn_type: "vanilla-xformers"
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
      params:
        freeze: True
        layer: "penultimate"
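All of these inference configs share the same AutoencoderKL `ddconfig` with `ch_mult: [1, 2, 4, 4]`. Assuming the standard LDM encoder, which downsamples once per `ch_mult` step after the first, that means three 2x reductions, i.e. an 8x spatial factor between pixel and latent space; the latent `image_size` values in the configs (64 here, 96 in the unCLIP configs) then correspond nominally to 512x512 and 768x768 pixel outputs, though sampling scripts can override the sampled resolution. A quick sketch of that arithmetic:

```python
# AutoencoderKL ddconfig in these configs: ch_mult [1, 2, 4, 4]
# -> len(ch_mult) - 1 = 3 halvings of spatial resolution.
ch_mult = [1, 2, 4, 4]
downsample_factor = 2 ** (len(ch_mult) - 1)
print(downsample_factor)  # 8

# Pixel resolution implied by each latent image_size appearing above:
for latent_size in (64, 96):
    print(latent_size, "->", latent_size * downsample_factor)
# 64 -> 512, 96 -> 768
```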
End of page 1 of 7.