Unverified Commit 31e36453 authored by Min Xu, committed by GitHub

[fix] minor fixes for master branch (#792)



* add changelog for previous commit

* add changelog for previous commit

* add changelog for previous commit

* fix a merge induced error
Co-authored-by: Min Xu <min.xu.public@gmail.com>
parent 4fa2ab9b
@@ -13,6 +13,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  that don't require grad and hence can fail the previous assert. [#761]
- FSDP: Fixed a bug where, when multiple backward passes are called within an iteration,
  parameters' sharding state might be incorrect. [#775]
- activation checkpoint: Ensure outputs of checkpointed modules only require grad if either
  the input or the parameters require grad. [#787]
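The #787 entry above means the checkpoint wrapper decides whether its outputs should carry gradients by looking at both the inputs and the wrapped module's parameters. A minimal sketch of that rule, using a hypothetical `outputs_should_require_grad` helper rather than fairscale's actual code:

```python
import torch


def outputs_should_require_grad(module: torch.nn.Module, *inputs) -> bool:
    # Hypothetical helper illustrating the rule from the #787 entry:
    # checkpointed outputs should require grad only if some tensor input
    # requires grad, or some parameter of the wrapped module requires grad.
    inputs_need_grad = any(torch.is_tensor(x) and x.requires_grad for x in inputs)
    params_need_grad = any(p.requires_grad for p in module.parameters())
    return inputs_need_grad or params_need_grad
```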
### Added
- FSDP: Added support for returning the original names of parameters when `named_parameters` is called on
@@ -20,7 +22,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  call `named_parameters` under the `summon_full_params` context when using flattened params or original
  params. If you are using original params (i.e. flatten_params=False), calling `named_parameters` outside
  of the `summon_full_params` context will still return the original param names along with the local shards. [#755]
- FSDP: Ensure gradient reduction accumulates into the unsharded gradient tensor
within a backwards pass. This matters when an FSDP module is called
multiple times within a forward pass, and reduction is not deferred
using activation checkpoint forward counters, bucketing or some other
mechanism. [#784]
- activation checkpoint: Added a context manager to disable checkpoint in case the same wrapped module
needs to be checkpointed and not checkpointed in different parts of
the module forward pass. [#772]
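The #784 entry above concerns accumulating, rather than overwriting, the unsharded gradient when an FSDP module's gradient reduction fires more than once inside a single backward pass (for example, because the module was called twice in forward). A minimal sketch of the difference using plain tensors, purely illustrative and not fairscale's internals:

```python
import torch

# Two reduction results arrive for the same parameter within one backward pass.
unsharded_grad = torch.zeros(4)
first_reduction = torch.ones(4)
second_reduction = 2 * torch.ones(4)

# Without accumulation, a later reduction simply replaces an earlier one,
# and the first contribution is lost.
overwritten = first_reduction.clone()
overwritten = second_reduction.clone()
assert torch.equal(overwritten, second_reduction)

# Accumulating into the unsharded gradient tensor keeps both contributions.
unsharded_grad += first_reduction
unsharded_grad += second_reduction
assert torch.equal(unsharded_grad, torch.full((4,), 3.0))
```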
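The #772 entry above adds a way to turn checkpointing off for an already wrapped module in parts of the forward pass. The exact name and import path of fairscale's context manager are not shown in this diff, so the sketch below only illustrates the general pattern with a hypothetical thread-local flag; it is not fairscale's actual API:

```python
import threading
from contextlib import contextmanager

_local = threading.local()


@contextmanager
def disable_checkpointing():
    # Hypothetical context manager: temporarily mark checkpointing as disabled
    # for the current thread, restoring the previous state on exit.
    previous = getattr(_local, "disabled", False)
    _local.disabled = True
    try:
        yield
    finally:
        _local.disabled = previous


def checkpointing_disabled() -> bool:
    return getattr(_local, "disabled", False)
```

A checkpoint wrapper would consult `checkpointing_disabled()` in its forward and fall back to a plain call when it returns True, which is what lets the same wrapped module run both checkpointed and non-checkpointed within one forward pass.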
## [0.4.0] - 2021-07-31
### Fixed
...
@@ -277,6 +277,7 @@ class CheckpointFunction(torch.autograd.Function):
    with torch.no_grad(), enable_checkpointing():
        unpacked_args, unpacked_kwargs = unpack_kwargs(kwarg_keys, args)
        outputs = run_function(*unpacked_args, **unpacked_kwargs)
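        # The wrapped module comes through as the first unpacked argument; per the
        # #787 changelog entry above, it is kept so the parameters' requires_grad
        # state can later be checked when deciding whether outputs require grad.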
        the_module = unpacked_args[0]
    # Because we run with torch.no_grad(), we can't actually access
    # outputs.requires_grad. Instead, we manually compute it by
...