We support using [DeepSpeed](https://github.com/microsoft/DeepSpeed) to reduce the memory cost of training large-scale models, e.g., InternImage-H with over 1 billion parameters.
To use it, first install the requirements as follows:
...
Then you could launch the training on a Slurm system with 8 GPUs as follows (tin

...
The default ZeRO stage is 1, and it can be configured via the command-line argument `--zero-stage`.
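For reference, the `--zero-stage` value typically ends up in the `zero_optimization.stage` field of the DeepSpeed config. A minimal sketch is below; the exact keys and defaults used by this repo's scripts are assumptions, so treat `build_ds_config` as illustrative only:

```python
# Sketch of a DeepSpeed config with a configurable ZeRO stage.
# Stage 1 shards optimizer states across ranks, stage 2 additionally
# shards gradients, and stage 3 also shards the parameters themselves.
def build_ds_config(zero_stage=1, batch_size=32):
    return {
        "train_batch_size": batch_size,
        "zero_optimization": {"stage": zero_stage},  # 0-3
        "fp16": {"enabled": True},
    }

cfg = build_ds_config(zero_stage=2)
```

Higher stages save more memory per GPU at the cost of extra communication, which is why stage 1 is a reasonable default.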
Then, you could use `best.pth` as usual, e.g., `model.load_state_dict(torch.load

...
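For intuition: ZeRO shards training state across ranks, so a single usable `best.pth` must first be consolidated from the per-rank shards (DeepSpeed ships a `zero_to_fp32` utility for this). The toy sketch below only illustrates the consolidation idea; `merge_shards` is hypothetical and not part of DeepSpeed or this repo:

```python
# Toy illustration: each rank holds a slice of every parameter, and a
# loadable checkpoint is the concatenation of those slices per name.
def merge_shards(shards):
    merged = {}
    for shard in shards:  # each shard maps param name -> slice of values
        for name, values in shard.items():
            merged.setdefault(name, []).extend(values)
    return merged

shards = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
merged = merge_shards(shards)  # {"w": [1.0, 2.0, 3.0, 4.0]}
```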
> Due to the lack of computational resources, the DeepSpeed training scripts are currently only verified for the first few epochs. Please file an issue if you have problems reproducing the full training.
### Extracting Intermediate Features
To extract the features of an intermediate layer, you could use `extract_feature.py`.
...
For example, extract features of `b.png` from layers `patch_embed` and `levels.0

...
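Scripts like `extract_feature.py` typically capture intermediate outputs by registering PyTorch forward hooks on the named layers. The pattern can be illustrated without torch; the `Layer` class below is a stand-in for `nn.Module`, and the hook signature mirrors `register_forward_hook`:

```python
# Stand-in for an nn.Module that runs its forward hooks after computing
# an output, mirroring how PyTorch forward hooks behave.
class Layer:
    def __init__(self, name, fn):
        self.name, self.fn, self.hooks = name, fn, []

    def __call__(self, x):
        out = self.fn(x)
        for hook in self.hooks:  # hook(module, input, output), as in torch
            hook(self, x, out)
        return out

features = {}

def save_feature(module, inp, out):
    features[module.name] = out  # stash the intermediate output by name

patch_embed = Layer("patch_embed", lambda x: x * 2)
patch_embed.hooks.append(save_feature)
patch_embed(3)  # features now holds {"patch_embed": 6}
```

In real code the same `save_feature` hook would be attached with `module.register_forward_hook(save_feature)` for each requested layer name, and `features` collected after a forward pass on the input image.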