"docker/vscode:/vscode.git/clone" did not exist on "3b37fefee99425286984a9d5fa4f1850064d01eb"
Unverified commit 92210136 authored by Benjamin Lefaudeux, committed by GitHub

[doc] hotfixes, old documentation (#232)

Thanks Jessica for the heads up!
parent 47e57935
@@ -74,14 +74,17 @@ def train(
# Problem statement
model = myAwesomeModel().to(rank)
model = ShardedDDP(model, device_ids=[rank]) # this will handle the gradient reduction automatically
dataloader = mySuperFastDataloader()
loss_fn = myVeryRelevantLoss()
base_optimizer = torch.optim.SGD # pick any pytorch compliant optimizer here
base_optimizer_arguments = {} # pass any optimizer specific arguments here, or directly below when instantiating OSS
# Wrap the optimizer in its state sharding brethren
optimizer = OSS(params=model.parameters(), optim=base_optimizer, **base_optimizer_arguments)
# Wrap the model into ShardedDDP, which will reduce gradients to the proper ranks
model = ShardedDDP(model, optimizer)
# Any relevant training loop, nothing specific to OSS. For example:
model.train()
for e in range(epochs):
...
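The order this patch lands on in both files is: build the OSS optimizer first, then hand it to ShardedDDP. A minimal self-contained sketch of that order, assuming fairscale's `OSS` and `ShardedDataParallel` import paths, and substituting a toy `nn.Linear`, a CPU/gloo process group, and made-up hyperparameters for the doc's `myAwesomeModel` / `mySuperFastDataloader` / `myVeryRelevantLoss` placeholders:

```python
import torch
import torch.distributed as dist

from fairscale.optim.oss import OSS
from fairscale.nn.data_parallel import ShardedDataParallel as ShardedDDP


def train(rank: int, world_size: int, epochs: int = 2):
    # Process-group setup; gloo keeps the sketch CPU-only (an assumption, not from the doc)
    dist.init_process_group(
        "gloo", init_method="tcp://127.0.0.1:29500", rank=rank, world_size=world_size
    )

    # Toy stand-ins for the doc's placeholder model and loss
    model = torch.nn.Linear(16, 4)
    loss_fn = torch.nn.MSELoss()

    # 1) Wrap the optimizer in its state-sharding counterpart first
    base_optimizer = torch.optim.SGD          # pick any pytorch-compliant optimizer
    base_optimizer_arguments = {"lr": 1e-3}   # SGD needs an explicit lr
    optimizer = OSS(params=model.parameters(), optim=base_optimizer, **base_optimizer_arguments)

    # 2) Then wrap the model into ShardedDDP, which will reduce gradients to the proper ranks
    model = ShardedDDP(model, optimizer)
```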
@@ -65,7 +65,6 @@ DDP can be used in place of ShardedDDP in the example below, but the memory savings
# Problem statement
model = myAwesomeModel().to(rank)
model = ShardedDDP(model, device_ids=[rank])
dataloader = mySuperFastDataloader()
loss_fn = myVeryRelevantLoss()
@@ -79,6 +78,9 @@ DDP can be used in place of ShardedDDP in the example below, but the memory savings
optim=base_optimizer,
**base_optimizer_arguments)
# Wrap the model into ShardedDDP, which will reduce gradients to the proper ranks
model = ShardedDDP(model, optimizer)
# Any relevant training loop, nothing specific to OSS. For example:
model.train()
for e in range(epochs):
...
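As the comment in the hunk says, the training loop itself needs nothing OSS-specific. Continuing the sketch from the first hunk, still inside `train()`, with random tensors standing in for a real dataloader batch:

```python
    # Plain PyTorch loop, nothing specific to OSS
    model.train()
    for e in range(epochs):
        batch, target = torch.randn(8, 16), torch.randn(8, 4)  # fake data, no real dataloader
        optimizer.zero_grad()
        loss = loss_fn(model(batch), target)
        loss.backward()    # ShardedDDP reduces each gradient to the rank owning its optimizer state
        optimizer.step()   # each rank steps its own shard, then updated params are broadcast

    dist.destroy_process_group()
```

One way to drive it would be a launcher such as `torch.multiprocessing.spawn(train, args=(world_size,), nprocs=world_size)`, along the lines of the fuller examples in the fairscale docs.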