Model Checkpointing
===================

DeepSpeed provides routines for checkpointing model state during training.

Loading Training Checkpoints
----------------------------
.. autofunction:: deepspeed.DeepSpeedEngine.load_checkpoint
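
As a minimal sketch of resuming training, the helper below calls ``load_checkpoint`` and reads a step counter back from the returned client state. The helper name ``resume_from_checkpoint`` and the ``"step"`` key are assumptions made for illustration. They are not part of DeepSpeed, and the key exists only if it was stored through ``client_state`` when the checkpoint was saved.

.. code-block:: python

    def resume_from_checkpoint(engine, load_dir, tag=None):
        """Restore engine state and return the step to resume from.

        ``engine`` is expected to be a DeepSpeedEngine (or any object with a
        compatible ``load_checkpoint`` method). Returns 0 when no checkpoint
        could be loaded.
        """
        # load_checkpoint returns a (load_path, client_state) pair;
        # load_path is None if loading the checkpoint failed.
        load_path, client_state = engine.load_checkpoint(load_dir, tag=tag)
        if load_path is None:
            return 0
        # "step" is a hypothetical key chosen for this sketch.
        return (client_state or {}).get("step", 0)

With ``tag=None``, DeepSpeed tries to load the most recently saved tag recorded in the checkpoint directory.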

Saving Training Checkpoints
---------------------------
.. autofunction:: deepspeed.DeepSpeedEngine.save_checkpoint
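
A minimal sketch of saving a checkpoint with an explicit tag and some user-defined state follows. The helper name ``save_training_checkpoint``, the ``global_step`` tag format, and the ``"step"`` key are illustrative assumptions, not requirements of the API.

.. code-block:: python

    def save_training_checkpoint(engine, save_dir, step):
        """Save engine state under a step-based tag and return the tag.

        ``engine`` is expected to be a DeepSpeedEngine (or any object with a
        compatible ``save_checkpoint`` method).
        """
        # The tag format is a choice made for this sketch.
        tag = f"global_step{step}"
        # Every process must call save_checkpoint, not only rank 0,
        # because each rank writes its own part of the state.
        engine.save_checkpoint(save_dir, tag=tag, client_state={"step": step})
        return tag

Values passed in ``client_state`` are returned by ``load_checkpoint`` when the checkpoint is restored, which makes it a convenient place for counters such as the current step.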


ZeRO Checkpoint fp32 Weights Recovery
-------------------------------------

DeepSpeed provides routines for extracting fp32 weights from the optimizer states of a saved ZeRO checkpoint.

.. autofunction:: deepspeed.utils.zero_to_fp32.get_fp32_state_dict_from_zero_checkpoint

.. autofunction:: deepspeed.utils.zero_to_fp32.load_state_dict_from_zero_checkpoint

.. autofunction:: deepspeed.utils.zero_to_fp32.convert_zero_checkpoint_to_fp32_state_dict
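
As a rough sketch of how these routines might be used, the helper below rebuilds the consolidated fp32 state dict and loads it into a plain (non-DeepSpeed) model. The helper name ``load_fp32_weights`` is hypothetical, and ``model`` is assumed to expose the usual ``load_state_dict`` method of a ``torch.nn.Module``.

.. code-block:: python

    def load_fp32_weights(model, checkpoint_dir, tag=None):
        """Load consolidated fp32 weights from a ZeRO checkpoint into ``model``.

        ``checkpoint_dir`` is the directory that was passed to
        ``save_checkpoint``. Returns the recovered state dict.
        """
        # Imported inside the function so this sketch can be imported
        # on machines where DeepSpeed is not installed.
        from deepspeed.utils.zero_to_fp32 import (
            get_fp32_state_dict_from_zero_checkpoint,
        )

        # Consolidates the partitioned fp32 weights into one state dict.
        state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir, tag=tag)
        model.load_state_dict(state_dict)
        return state_dict

The consolidated state dict is built in CPU memory, so the host needs enough RAM to hold the full fp32 model. ``convert_zero_checkpoint_to_fp32_state_dict`` writes the result to disk instead of returning it. Its output arguments have changed between DeepSpeed releases, so check the signature rendered above for the installed version.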