dist\_checkpointing package
===========================

A library for saving and loading distributed checkpoints.
A "distributed checkpoint" can use various underlying formats (the current default format is based on Zarr),
but its distinctive property is that a checkpoint saved in one parallel configuration (tensor/pipeline/data parallelism)
can be loaded in a different parallel configuration.

Using the library requires defining sharded state_dict dictionaries with functions from the *mapping* and *optimizer* modules.
These state dicts can then be saved or loaded with the *serialization* module, using strategies from the *strategies* module.
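
The idea of saving in one parallel configuration and loading in another can be illustrated with a toy, pure-Python sketch.
It does not use this package's API: ``shard_bounds``, ``save_sharded`` and ``load_sharded`` are hypothetical helpers,
JSON files stand in for the real storage format, and a one-dimensional list stands in for a tensor.
The only point it makes is that each shard is stored together with its offset in the global tensor,
so a reader with a different number of ranks can reassemble exactly the slice it owns.

.. code-block:: python

   import json
   import os
   import tempfile


   def shard_bounds(global_len, num_ranks, rank):
       """Return the [start, stop) range of the global tensor owned by `rank`."""
       base, rem = divmod(global_len, num_ranks)
       start = rank * base + min(rank, rem)
       stop = start + base + (1 if rank < rem else 0)
       return start, stop


   def save_sharded(ckpt_dir, key, global_values, num_ranks):
       """Simulate `num_ranks` writers, each storing its shard and global offset."""
       for rank in range(num_ranks):
           start, stop = shard_bounds(len(global_values), num_ranks, rank)
           record = {
               "global_len": len(global_values),
               "offset": start,
               "data": global_values[start:stop],
           }
           path = os.path.join(ckpt_dir, f"{key}.rank{rank}.json")
           with open(path, "w") as f:
               json.dump(record, f)


   def load_sharded(ckpt_dir, key, num_ranks, rank):
       """Rebuild the slice owned by `rank` under a possibly different layout."""
       out, want_start, want_stop = None, 0, 0
       for name in sorted(os.listdir(ckpt_dir)):
           if not name.startswith(key + ".rank"):
               continue
           with open(os.path.join(ckpt_dir, name)) as f:
               record = json.load(f)
           if out is None:
               want_start, want_stop = shard_bounds(
                   record["global_len"], num_ranks, rank
               )
               out = [None] * (want_stop - want_start)
           # Copy the overlap between the stored shard and the wanted slice.
           lo = max(want_start, record["offset"])
           hi = min(want_stop, record["offset"] + len(record["data"]))
           for i in range(lo, hi):
               out[i - want_start] = record["data"][i - record["offset"]]
       return out


   weights = list(range(10))
   with tempfile.TemporaryDirectory() as ckpt_dir:
       # Save with 4 simulated ranks, then load with 2.
       save_sharded(ckpt_dir, "layer.weight", weights, num_ranks=4)
       reloaded = [
           load_sharded(ckpt_dir, "layer.weight", num_ranks=2, rank=r)
           for r in range(2)
       ]
   assert reloaded[0] + reloaded[1] == weights

In the real library the sharding metadata is carried by the sharded state_dict rather than by file names, so treat the file layout here as illustrative only.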


Subpackages
-----------

.. toctree::
   :maxdepth: 4

   dist_checkpointing.strategies

Submodules
----------

dist\_checkpointing.serialization module
----------------------------------------

.. automodule:: core.dist_checkpointing.serialization
   :members:
   :undoc-members:
   :show-inheritance:

dist\_checkpointing.mapping module
----------------------------------

.. automodule:: core.dist_checkpointing.mapping
   :members:
   :undoc-members:
   :show-inheritance:

dist\_checkpointing.optimizer module
------------------------------------

.. automodule:: core.dist_checkpointing.optimizer
   :members:
   :undoc-members:
   :show-inheritance:

dist\_checkpointing.core module
-------------------------------

.. automodule:: core.dist_checkpointing.core
   :members:
   :undoc-members:
   :show-inheritance:

dist\_checkpointing.dict\_utils module
--------------------------------------

.. automodule:: core.dist_checkpointing.dict_utils
   :members:
   :undoc-members:
   :show-inheritance:


dist\_checkpointing.utils module
--------------------------------

.. automodule:: core.dist_checkpointing.utils
   :members:
   :undoc-members:
   :show-inheritance:

Module contents
---------------

.. automodule:: core.dist_checkpointing
   :members:
   :undoc-members:
   :show-inheritance: