Unverified Commit fc8fc400 authored by Stas Bekman, committed by GitHub

[docs] post-PR merge fix (#15355)



* [docs] post-PR merge fix

* Update docs/source/main_classes/deepspeed.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
parent 99a27711
docs/source/main_classes/deepspeed.mdx

```diff
@@ -31,7 +31,7 @@ won't be possible on a single GPU.
 🤗 Transformers integrates [DeepSpeed](https://github.com/microsoft/DeepSpeed) via 2 options:
-1. Integration of the core DeepSpeed features via [`Trainer`]. This is everything done for your type
+1. Integration of the core DeepSpeed features via [`Trainer`]. This is an everything-done-for-you type
 of integration - just supply your custom config file or use our template and you have nothing else to do. Most of
 this document is focused on this feature.
 2. If you don't use [`Trainer`] and want to use your own Trainer where you integrated DeepSpeed
@@ -604,7 +604,7 @@ The following is an example of configuration for ZeRO stage 2:
 **Performance tuning:**
 - enabling `offload_optimizer` should reduce GPU RAM usage (it requires `"stage": 2`)
-- `"overlap_comm": true` trade offs increased GPU RAM usage to lower all-reduce latency. `overlap_comm` uses 4.5x
+- `"overlap_comm": true` trades off increased GPU RAM usage to lower all-reduce latency. `overlap_comm` uses 4.5x
 the `allgather_bucket_size` and `reduce_bucket_size` values. So if they are set to 5e8, this requires a 9GB
 footprint (`5e8 x 2Bytes x 2 x 4.5`). Therefore, if you have a GPU with 8GB or less RAM, to avoid getting
 OOM-errors you will need to reduce those parameters to about `2e8`, which would require 3.6GB. You will want to do
```
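For context, the tuning knobs touched by the second hunk all live in the `zero_optimization` section of the DeepSpeed config file that the first hunk says to supply to [`Trainer`]. A minimal sketch of such a file, assuming a GPU with 8GB or less RAM so the bucket sizes are lowered to `2e8` as the docs advise (this is illustrative, not the full template from the docs):

```json
{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu"
    },
    "overlap_comm": true,
    "allgather_bucket_size": 2e8,
    "reduce_bucket_size": 2e8
  }
}
```

With `overlap_comm` enabled, the extra communication buffers scale with the bucket sizes: `2e8 x 2 bytes x 2 x 4.5 ≈ 3.6GB` here versus 9GB at `5e8`, which is why shrinking the buckets is the first lever against OOM errors on small GPUs.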