transformers · commit 12c39e56 (unverified)
Authored Apr 23, 2024 by Jiewen Tan; committed via GitHub on Apr 23, 2024
Parent: b8b1e442

Fix use_cache for xla fsdp (#30353)

* Fix use_cache for xla fsdp
* Fix linters
Changes: 1 changed file with 6 additions and 0 deletions

src/transformers/trainer.py (+6, -0)
@@ -1682,6 +1682,12 @@ class Trainer:
                 )
             fsdp_kwargs = self.args.xla_fsdp_config
             if self.args.fsdp_config["xla_fsdp_grad_ckpt"]:
+                if model.config.use_cache:
+                    logger.warning_once(
+                        "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`."
+                    )
+                    model.config.use_cache = False
+
                 # Apply gradient checkpointing to auto-wrapped sub-modules if specified
                 def auto_wrapper_callable(m, *args, **kwargs):
                     target_cls = FSDP if not self.is_fsdp_xla_v2_enabled else FSDPv2
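To see the guard in isolation, the sketch below is a minimal, hypothetical reconstruction of the logic this commit adds, outside the Trainer. prepare_for_xla_fsdp, warning_once, and the SimpleNamespace stand-in model are assumptions made for illustration, not transformers APIs; only the use_cache check and the warning text come from the diff above.

    import logging
    from types import SimpleNamespace

    logger = logging.getLogger("sketch")
    logging.basicConfig(level=logging.WARNING)

    _seen = set()

    def warning_once(msg):
        # Emulates transformers' logger.warning_once: emit each message only once.
        if msg not in _seen:
            _seen.add(msg)
            logger.warning(msg)

    def prepare_for_xla_fsdp(model, fsdp_config):
        # Hypothetical helper, not a transformers API. Gradient checkpointing
        # recomputes activations during the backward pass, so caching key/value
        # states during the forward pass is wasted work; the guard forces
        # use_cache off before the model is wrapped for XLA FSDP.
        if fsdp_config.get("xla_fsdp_grad_ckpt"):
            if model.config.use_cache:
                warning_once(
                    "`use_cache=True` is incompatible with gradient checkpointing. "
                    "Setting `use_cache=False`."
                )
                model.config.use_cache = False

    # Usage with a stand-in model object:
    model = SimpleNamespace(config=SimpleNamespace(use_cache=True))
    prepare_for_xla_fsdp(model, {"xla_fsdp_grad_ckpt": True})
    assert model.config.use_cache is False  # cache disabled, warning logged once

In the actual change, this check runs inside the Trainer's XLA FSDP setup path, immediately before the gradient-checkpointing auto-wrapper is defined, so the model's cache flag is already off by the time sub-modules are wrapped.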