Commit d461441a authored by Rayyyyy

update multi_node shell and README, add hostfile

parent 8f8cf840
@@ -178,7 +178,7 @@ python -m pip install -e detectron2
```
## Training
-Download the pretrained [MAE ViT-Large model](https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_large.pth), then update the path of the finetune argument in `$Painter_ROOT/train.sh`.
+Download the pretrained [MAE ViT-Large model](https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_large.pth), then update the path of the finetune argument in `$Painter_ROOT/train.sh` and `$Painter_ROOT/single_process.sh`.
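### Single node, multiple GPUs
The default configuration uses one node with 4 GPUs (total_bsz = 1x4x32 = 128). To use a different number of GPUs, change the corresponding parameters in train.sh.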
@@ -188,7 +188,7 @@ bash train.sh
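### Multiple nodes, multiple GPUs
Tip: the authors trained on 8 nodes with 8 GPUs each (total_bsz = 8x8x32 = 2048).
When training on multiple nodes, list the nodes to use in the hostfile, one node per line, for example: c1xxxxxx slots=4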
```bash
bash run_train_multi.sh
```
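The relationship between the hostfile, the process count, and the total batch size can be sketched as follows. This is only an illustration, not part of the repository's scripts: the hostnames `node-a` and `node-b` are hypothetical placeholders, and the per-GPU batch size of 32 is taken from the total_bsz formulas above.

```shell
# Hypothetical hostfile: two nodes with 4 GPUs each (hostnames are placeholders)
hostfile=$(mktemp)
printf '%s\n' 'node-a slots=4' 'node-b slots=4' > "$hostfile"

# Sum the slots=N values; this is the total process (GPU) count
np=$(awk '{
  for (i = 2; i <= NF; i++)
    if ($i ~ /^slots=/) { sub(/^slots=/, "", $i); total += $i }
} END { print total + 0 }' "$hostfile")

# One line per node in the hostfile
nodes=$(grep -c 'slots=' "$hostfile")

# total_bsz = number of GPUs x per-GPU batch size (32, per the README formulas)
per_gpu_bsz=32
total_bsz=$(( np * per_gpu_bsz ))

echo "nodes=$nodes np=$np total_bsz=$total_bsz"
rm -f "$hostfile"
```

With the two placeholder nodes above, this prints `nodes=2 np=8 total_bsz=256`.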
@@ -18,5 +18,5 @@ export NPROC_PER_NODE=4
# -np: number of GPUs
# -x: pass a variable through to the single_process.sh script
-mpirun -np $np --allow-run-as-root --hostfile hostfile --bind-to none -x dist_url -x PYTHON `pwd`/single_process.sh
+mpirun -np $np --allow-run-as-root --hostfile hostfile --bind-to none -x dist_url -x PYTHON -x NPROC_PER_NODE `pwd`/single_process.sh
echo "END TIME: $(date)"
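Why the added `-x NPROC_PER_NODE` matters can be imitated locally. This sketch assumes `mpirun` here is Open MPI, where `-x` exports a variable to the launched processes, including those on remote nodes. `env -i` only stands in for a process that starts without the launcher's environment; it is not what `mpirun` does internally, and the `child` command is a hypothetical stand-in for `single_process.sh`.

```shell
export NPROC_PER_NODE=4

# Hypothetical stand-in for single_process.sh: report what the process sees
child='echo "${NPROC_PER_NODE:-unset}"'

# A process started with a fresh environment does not see the variable
remote_without_x=$(env -i PATH="$PATH" bash -c "$child")

# Forwarding the variable explicitly plays the role of -x NPROC_PER_NODE
remote_with_x=$(env -i PATH="$PATH" NPROC_PER_NODE="$NPROC_PER_NODE" bash -c "$child")

echo "without -x: $remote_without_x, with -x: $remote_with_x"
```

The first call reports `unset` and the second reports `4`.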