Unverified Commit 5a70a77b authored by Ahmed Elnaggar, committed by GitHub

Add Support to Gradient Checkpointing for LongT5 (#18977)

FlaxLongT5PreTrainedModel is missing the `enable_gradient_checkpointing` method, so attempting to enable gradient checkpointing for LongT5 raises an error.
This pull request adds the method.
parent 4157e3cd
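
For context, a minimal usage sketch of what this patch enables (not part of the commit; the checkpoint name "google/long-t5-local-base" is an illustrative choice, any published LongT5 checkpoint works the same way):

# Minimal usage sketch, assuming the patch below is applied.
# "google/long-t5-local-base" is an illustrative checkpoint name,
# not taken from this commit.
from transformers import FlaxLongT5ForConditionalGeneration

model = FlaxLongT5ForConditionalGeneration.from_pretrained("google/long-t5-local-base")

# Before this commit, this call failed because FlaxLongT5PreTrainedModel
# did not override the method; with the patch it rebuilds the underlying
# module with gradient_checkpointing=True, trading recomputation for
# lower peak memory during training.
model.enable_gradient_checkpointing()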
@@ -1686,6 +1686,13 @@ class FlaxLongT5PreTrainedModel(FlaxPreTrainedModel):
         module = self.module_class(config=config, dtype=dtype, **kwargs)
         super().__init__(config, module, input_shape=input_shape, seed=seed, dtype=dtype, _do_init=_do_init)
 
+    def enable_gradient_checkpointing(self):
+        self._module = self.module_class(
+            config=self.config,
+            dtype=self.dtype,
+            gradient_checkpointing=True,
+        )
+
     def init_weights(self, rng: jax.random.PRNGKey, input_shape: Tuple, params: FrozenDict = None) -> FrozenDict:
         # init input tensors
         input_ids = jnp.zeros(input_shape, dtype="i4")
...
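
Under the hood, passing gradient_checkpointing=True to the module causes the transformer blocks to be wrapped with flax.linen.remat, mirroring how the Flax T5 implementation handles the flag. A self-contained sketch of that mechanism (Block and Stack are hypothetical stand-ins, not code from this repository):

# Illustrative sketch of the mechanism, not code from this repository:
# flax.linen.remat (a.k.a. nn.checkpoint) recomputes a block's activations
# during the backward pass instead of storing them, which is what the
# gradient_checkpointing flag toggles inside the LongT5 module.
import flax.linen as nn

class Block(nn.Module):  # hypothetical stand-in for one LongT5 layer
    @nn.compact
    def __call__(self, x):
        return nn.Dense(128)(nn.relu(nn.Dense(512)(x)))

class Stack(nn.Module):  # hypothetical stand-in for the block stack
    gradient_checkpointing: bool = False

    @nn.compact
    def __call__(self, x):
        # With checkpointing on, wrap the block class in nn.remat so its
        # intermediates are recomputed on the backward pass rather than
        # kept in memory.
        block_cls = nn.remat(Block) if self.gradient_checkpointing else Block
        for _ in range(4):
            x = block_cls()(x)
        return x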