to the ZeRO technology, but instead uses tensor parallelism to scale models that can't fit onto a single GPU. This is a
work in progress and we will provide the integration once that product is complete.
### Memory Requirements
Since Deepspeed ZeRO can offload memory to CPU (and NVMe), the framework provides utilities that let you estimate how much CPU and GPU memory will be needed for a given number of GPUs.
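Before reaching for the estimator, a rough back-of-the-envelope calculation can help. The sketch below is an illustration, not DeepSpeed's exact accounting (the real estimator also accounts for the largest layer, buffers and temporary memory); it uses the common 16-bytes-per-parameter rule for fp16 training with an Adam-style optimizer:

```python
def zero3_model_states_gb(num_params, num_gpus=1, cpu_offload=False):
    """Rough (gpu_gb_per_gpu, cpu_gb) for model states only under ZeRO-3.

    Assumes fp16 params (2 bytes) + fp16 grads (2 bytes) + fp32 optimizer
    states (master params, momentum, variance = 12 bytes) = 16 bytes per
    parameter, sharded evenly across GPUs. Activations, buffers and
    fragmentation are extra.
    """
    bytes_per_param = 2 + 2 + 12
    total = num_params * bytes_per_param
    gb = 1024 ** 3
    if cpu_offload:
        # With full offload, only fp16 params remain GPU-resident;
        # grads and optimizer states move to CPU memory.
        return num_params * 2 / num_gpus / gb, total / gb
    return total / num_gpus / gb, 0.0

# Example: a 3B-parameter model (roughly the size of bigscience/T0_3B)
gpu_gb, cpu_gb = zero3_model_states_gb(3e9, num_gpus=2)
print(f"per-GPU: {gpu_gb:.1f}GB, CPU: {cpu_gb:.1f}GB")
```

This makes the tradeoff concrete: without offload a 3B model needs roughly 22GB of model-states memory per GPU on 2 GPUs, while CPU offload shrinks the GPU footprint dramatically at the cost of ~45GB of CPU RAM and slower steps.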
Then it's a tradeoff of cost vs. speed. It'll be cheaper to buy or rent a smaller GPU (or fewer GPUs, since you can use multiple GPUs with Deepspeed ZeRO). But then it'll be slower, so even if you don't care about how fast the job completes, the slowdown directly increases the duration of GPU usage and thus the cost. Experiment and compare which works best.
If you have enough GPU memory, make sure to disable the CPU/NVMe offload, as it'll make everything faster.
For example, let's repeat the same for 2 GPUs:
```bash
$ python -c 'from transformers import AutoModel; \
from deepspeed.runtime.zero.stage3 import estimate_zero3_model_states_mem_needs_all_live; \
model = AutoModel.from_pretrained("bigscience/T0_3B"); \