Commit c748c5f4 authored by chenzk

v1.2.6

parent 6ff004ca
Pipeline #1703 canceled with stage
@@ -65,7 +65,7 @@ https://www.jianshu.com/p/a42b7d863825
 The datasets used by this project can be downloaded from the fast download channel: [imagenet-2012](http://113.200.138.88:18080/aidatasets/project-dependency/imagenet-2012)
-The project already provides a mini dataset for trial training. The training data directory structure is shown below; prepare the full dataset used for regular training with the same structure:
+The project provides a mini dataset for trial training, [tiny-imagenet-200](http://113.200.138.88:18080/aidatasets/project-dependency/tiny-imagenet-200.git). After downloading and extracting it, rename the directory tiny-imagenet-200 to data. The training data directory structure is shown below; prepare the full dataset used for regular training with the same structure:
 ```
 data
 |
@@ -91,7 +91,8 @@ data
 ### Single node, multiple GPUs
 ```
 cd megatron-deepspeed-vit
-sh examples/dspvit_1node.sh
+# sh examples/dspvit_1node_minidata.sh # for a quick trial run on the mini dataset
+sh examples/dspvit_1node.sh # train on the full imagenet2012
 # During training you may see: Message: 'is_pipe_partitioned= False'. This does not affect training; it is a bug in deepspeed itself. To suppress it, modify the source as described in the corresponding issue on the deepspeed GitHub repository.
 ```
 ### Single node, single GPU
......
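The README change above asks the user to rename the extracted tiny-imagenet-200 directory to data. A minimal sketch of that step follows. The function name `prepare_data_dir` and its safety checks are illustrative assumptions, not part of the commit, and the download step is left out because the README does not specify a command for it.

```shell
#!/bin/bash
# Hypothetical helper (not from the repository): rename an extracted dataset
# directory to the name the training scripts expect.
prepare_data_dir() {
    local src="$1"   # extracted dataset directory, e.g. tiny-imagenet-200
    local dst="$2"   # directory name used for training, e.g. data
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "target already exists, refusing to overwrite: $dst" >&2
        return 1
    fi
    mv "$src" "$dst"
}

# Usage, run from the directory that contains the extracted dataset:
#   prepare_data_dir tiny-imagenet-200 data
```

Refusing to overwrite an existing target is a design choice of this sketch, so that a previously prepared full dataset in data is not replaced by accident.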
# tests
# megatron autogenerated indices
tests/data/*/*npy
tests/tools/openwebtext-1000.jsonl
tmp/
# macOS
.DS_Store
# Byte-compiled / optimized / DLL files
*/__pycache__/
*.py[cod]
*.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask:
instance/
.webassets-cache
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
Pipfile
Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Intellij project settings
.idea/
.iml
# VSCode
.vscode/
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# static files generated from Django application
media
staticfiles
/tags
# tmp files
*.swp
#! /bin/bash
# Runs the "345M" parameter model
DATA_PATH="./data"
CHECKPOINT_PATH="./checkpoint"
DS_CONFIG="./examples/ds_config.json"
MICRO_BATCH_SIZE=1
GLOBAL_BATCH_SIZE=8
deepspeed --num_gpus 4 pretrain_vit.py \
--num-layers 24 \
--hidden-size 1024 \
--num-attention-heads 16 \
--micro-batch-size ${MICRO_BATCH_SIZE} \
--global-batch-size ${GLOBAL_BATCH_SIZE} \
--seq-length 1024 \
--max-position-embeddings 1024 \
--train-iters 5 \
--lr-decay-iters 3 \
--save $CHECKPOINT_PATH \
--load $CHECKPOINT_PATH \
--data-path $DATA_PATH \
--data-impl mmap \
--split 949,50,1 \
--distributed-backend nccl \
--lr 0.00015 \
--min-lr 1.0e-5 \
--lr-decay-style cosine \
--weight-decay 1e-2 \
--clip-grad 1.0 \
--lr-warmup-fraction .01 \
--log-interval 1 \
--save-interval 5 \
--eval-interval 5 \
--eval-iters 5 \
--fp16 \
--padded_vocab_size 224 \
--deepspeed \
--deepspeed_config $DS_CONFIG
# Optional flags, disabled here; append them to the command above to enable:
# --eval-only True \
# --do_test True \
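The batch-size settings in this script imply a number of gradient-accumulation steps per iteration. The sketch below works through that arithmetic. It assumes the data-parallel size equals the 4 GPUs passed to `--num_gpus`, which holds only if tensor and pipeline parallelism are left at 1 (the script sets neither). The function name `grad_accum_steps` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: micro-batches accumulated per optimizer step, from
# global_batch = micro_batch * data_parallel_size * accumulation_steps.
grad_accum_steps() {
    local micro="$1"    # MICRO_BATCH_SIZE
    local global="$2"   # GLOBAL_BATCH_SIZE
    local dp="$3"       # data-parallel size (assumed equal to the GPU count)
    local per_step=$((micro * dp))
    if [ $((global % per_step)) -ne 0 ]; then
        echo "global batch size must be divisible by micro batch size * data-parallel size" >&2
        return 1
    fi
    echo $((global / per_step))
}

# Values from the script: MICRO_BATCH_SIZE=1, GLOBAL_BATCH_SIZE=8, 4 GPUs
grad_accum_steps 1 8 4   # prints 2
```

Under these assumptions each GPU processes two micro-batches of one sample before every optimizer step.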