We support utilizing [DeepSpeed](https://github.com/microsoft/DeepSpeed) to reduce memory costs for training large-scale models, e.g. InternImage-H with over 1 billion parameters.
To use it, first install the requirements as
```bash
...
```
Then you could launch the training on a Slurm system with 8 GPUs as follows:

...
The default ZeRO stage is 1; it can be configured via the command-line argument `--zero-stage`.
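For reference, the value passed via `--zero-stage` typically ends up in the `zero_optimization` section of the DeepSpeed config. A minimal sketch of how such a flag might be wired into a config dictionary (the argument parsing shown here is hypothetical; the `zero_optimization.stage` field follows DeepSpeed's documented config schema):

```python
import argparse
import json

# Hypothetical CLI mirroring the `--zero-stage` flag described above.
parser = argparse.ArgumentParser()
parser.add_argument("--zero-stage", type=int, default=1, choices=[0, 1, 2, 3],
                    help="ZeRO optimization stage (default: 1)")
args = parser.parse_args(["--zero-stage", "2"])  # simulate passing --zero-stage 2

# Minimal DeepSpeed config embedding the chosen ZeRO stage.
# Stage 1 partitions optimizer states; stage 2 adds gradients; stage 3 adds parameters.
ds_config = {
    "train_batch_size": 64,  # illustrative value, not from the source
    "zero_optimization": {
        "stage": args.zero_stage,
    },
}

print(json.dumps(ds_config, indent=2))
```

Higher stages trade communication overhead for lower per-GPU memory, which is what makes billion-parameter models such as InternImage-H trainable on a single 8-GPU node.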