OpenDAS / ColossalAI · Commit 6642cebd (unverified)
Authored Dec 26, 2022 by BlueRum; committed by GitHub on Dec 26, 2022
Parent: 24586599

[example] Change some training settings for diffusion (#2195)

Showing 6 changed files with 30 additions and 24 deletions (+30 -24):
- examples/images/diffusion/README.md (+5 -3)
- examples/images/diffusion/configs/train_colossalai.yaml (+6 -3)
- examples/images/diffusion/configs/train_ddp.yaml (+9 -13)
- examples/images/diffusion/train.sh (+0 -5)
- examples/images/diffusion/train_colossalai.sh (+5 -0)
- examples/images/diffusion/train_ddp.sh (+5 -0)
examples/images/diffusion/README.md

````diff
@@ -87,14 +87,15 @@ you should change the `data.file_path` in the `config/train_colossalai.yaml`
 ## Training
-We provide the script `train.sh` to run the training task, and two strategies in `configs`: `train_colossalai.yaml` and `train_ddp.yaml`.
-For example, you can run the training with colossalai by
+We provide the script `train_colossalai.sh` to run the training task with colossalai,
+and you can also use `train_ddp.sh` to run the training task with ddp for comparison.
+In `train_colossalai.sh` the main command is:
 ```
 python main.py --logdir /tmp/ -t -b configs/train_colossalai.yaml
 ```
-- you can change the `--logdir` the save the log information and the last checkpoint
+- you can change the `--logdir` to decide where to save the log information and the last checkpoint.
 ### Training config
@@ -155,6 +156,7 @@ optional arguments:
   --config CONFIG       path to config which constructs model
   --ckpt CKPT           path to checkpoint of model
   --seed SEED           the seed (for reproducible sampling)
+  --use_int8            whether to use quantization method
   --precision {full,autocast}
                         evaluate at this precision
````
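The diff only adds the `--use_int8` entry to the help text; its declaration in `main.py` is not shown. A hedged sketch of how such a flag is typically declared with `argparse` (the parser setup and the `autocast` default are assumptions for illustration, not taken from this commit):

```python
import argparse

# Hypothetical re-declaration of the sampler flags shown in the README diff;
# only the option names and help strings come from the diff above.
parser = argparse.ArgumentParser()
parser.add_argument("--use_int8", action="store_true",
                    help="whether to use quantization method")
parser.add_argument("--precision", choices=["full", "autocast"],
                    default="autocast", help="evaluate at this precision")

args = parser.parse_args(["--use_int8"])
print(args.use_int8)  # True here, because the flag was passed explicitly
```

A `store_true` flag defaults to `False` when omitted, so existing invocations keep their behavior unchanged.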
examples/images/diffusion/configs/train_colossalai.yaml

```diff
@@ -80,19 +80,22 @@ model:
 data:
   target: main.DataModuleFromConfig
   params:
-    batch_size: 64
+    batch_size: 128
     wrap: False
+    # num_workers should be 2 * batch_size, and the total num less than 1024
+    # e.g. if use 8 devices, no more than 128
+    num_workers: 128
     train:
       target: ldm.data.base.Txt2ImgIterableBaseDataset
       params:
-        file_path: "/data/scratch/diffuser/laion_part0/"
+        file_path: # YOUR DATASET_PATH
         world_size: 1
         rank: 0

 lightning:
   trainer:
     accelerator: 'gpu'
-    devices: 4
+    devices: 8
     log_gpu_memory: all
     max_epochs: 2
     precision: 16
```
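The new comment encodes a sizing rule: aim for `2 * batch_size` workers, but keep the total across all devices under 1024 (so no more than 128 per device when using 8 devices). A small sketch of that arithmetic; the helper name is made up for illustration:

```python
def pick_num_workers(batch_size: int, devices: int, total_cap: int = 1024) -> int:
    """Rule of thumb from the config comment: 2 * batch_size workers,
    capped so that devices * num_workers stays under total_cap."""
    per_device_cap = total_cap // devices
    return min(2 * batch_size, per_device_cap)

# With the values in this commit, batch_size=128 on 8 devices caps at 128,
# matching the num_workers: 128 set in both YAML files.
print(pick_num_workers(128, 8))  # 128
```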
examples/images/diffusion/configs/train_ddp.yaml

```diff
@@ -80,25 +80,21 @@ model:
 data:
   target: main.DataModuleFromConfig
   params:
-    batch_size: 16
-    num_workers: 4
+    batch_size: 128
+    # num_workers should be 2 * batch_size, and the total num less than 1024
+    # e.g. if use 8 devices, no more than 128
+    num_workers: 128
     train:
-      target: ldm.data.teyvat.hf_dataset
+      target: ldm.data.base.Txt2ImgIterableBaseDataset
       params:
-        path: Fazzie/Teyvat
-        image_transforms:
-        - target: torchvision.transforms.Resize
-          params:
-            size: 512
-        - target: torchvision.transforms.RandomCrop
-          params:
-            size: 512
-        - target: torchvision.transforms.RandomHorizontalFlip
+        file_path: # YOUR DATAPATH
+        world_size: 1
+        rank: 0

 lightning:
   trainer:
     accelerator: 'gpu'
-    devices: 2
+    devices: 8
     log_gpu_memory: all
     max_epochs: 2
     precision: 16
```
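Both YAML files use the same `target`/`params` pattern: a dotted class path plus a mapping of constructor arguments. The ldm codebase resolves these with a helper conventionally called `instantiate_from_config`; a minimal stdlib re-sketch of that idea follows, demonstrated with a standard-library class so it runs without ldm installed:

```python
import importlib

def instantiate_from_config(config: dict):
    # Split "pkg.module.Class" into module path and class name,
    # import the module, and call the class with the params mapping.
    module_path, cls_name = config["target"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), cls_name)
    return cls(**config.get("params", {}))

# Demo with a stdlib target standing in for ldm.data.base.Txt2ImgIterableBaseDataset:
obj = instantiate_from_config({
    "target": "datetime.date",
    "params": {"year": 2022, "month": 12, "day": 26},
})
print(obj)  # 2022-12-26
```

This is why swapping `ldm.data.teyvat.hf_dataset` for `ldm.data.base.Txt2ImgIterableBaseDataset` in `train_ddp.yaml` only requires changing `target` and supplying that class's `params` (`file_path`, `world_size`, `rank`).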
examples/images/diffusion/train.sh (deleted, mode 100755 → 0)

```shell
# HF_DATASETS_OFFLINE=1
# TRANSFORMERS_OFFLINE=1
# DIFFUSERS_OFFLINE=1
python main.py --logdir /tmp/ -t -b configs/Teyvat/train_colossalai_teyvat.yaml
```
examples/images/diffusion/train_colossalai.sh (new file, mode 100755)

```shell
HF_DATASETS_OFFLINE=1
TRANSFORMERS_OFFLINE=1
DIFFUSERS_OFFLINE=1
python main.py --logdir /tmp -t -b /configs/train_colossalai.yaml
```
examples/images/diffusion/train_ddp.sh (new file, mode 100644)

```shell
HF_DATASETS_OFFLINE=1
TRANSFORMERS_OFFLINE=1
DIFFUSERS_OFFLINE=1
python main.py --logdir /tmp -t -b /configs/train_ddp.yaml
```
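The `world_size` and `rank` params added to the iterable dataset suggest each process reads only its own shard of the data. A hedged sketch of the usual round-robin sharding idea; this is the generic pattern, not the actual `Txt2ImgIterableBaseDataset` implementation:

```python
def shard_for_rank(paths: list, world_size: int, rank: int) -> list:
    # Round-robin split: rank r takes items r, r + world_size, r + 2*world_size, ...
    # so the shards are disjoint and together cover every item exactly once.
    return paths[rank::world_size]

files = [f"part-{i:02d}" for i in range(6)]
print(shard_for_rank(files, world_size=2, rank=0))  # ['part-00', 'part-02', 'part-04']
print(shard_for_rank(files, world_size=2, rank=1))  # ['part-01', 'part-03', 'part-05']
```

With `world_size: 1` and `rank: 0` as in these configs, the single process sees the full dataset; a distributed launcher would override them per process.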