Unverified Commit 9c18f156 authored by Adam Pocock's avatar Adam Pocock Committed by GitHub
Browse files

Prevent BatchEncoding from blindly passing casts down to the tensors it...


Prevent BatchEncoding from blindly passing casts down to the tensors it contains. Fixes #6582. (#8860)

Update src/transformers/tokenization_utils_base.py with review fix
Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
parent c0df963e
@@ -776,7 +776,16 @@ class BatchEncoding(UserDict):
:class:`~transformers.BatchEncoding`: The same instance of :class:`~transformers.BatchEncoding` after
modification.
""" """
self.data = {k: v.to(device) for k, v in self.data.items()}
# This check catches things like APEX blindly calling "to" on all inputs to a module
# Otherwise it passes the casts down and casts the LongTensor containing the token idxs
# into a HalfTensor
if isinstance(device, str) or isinstance(device, torch.device):
self.data = {k: v.to(device=device) for k, v in self.data.items()}
else:
logger.warning(
f"Attempting to cast a BatchEncoding to another type, {str(device)}. This is not supported."
)
return self
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment