Unverified Commit b69a62d5 authored by Thomas Wang, committed by GitHub

[BLOOM] Clean modeling code (#18344)



* Cleanup some code

* Improve signatures

* Try to reduce the number of reshape/copies

* I don't think we actually need the layer_num scaling trick

* No need for duplication

* Try to fix beam_search

* Fix beam search

* Removing layer_num normalization seems to break things

* Not sure self.layer_number normalization actually matters

* Try and be backward compatible

* Try to fix beam_search

* Revert attempt to be backward compatible

* Improve documentation on past_key_values format (see the layout sketch below)

* Optimize the device allocation in case of hidden_states in multiple devices

* No need to manually cast the values to a specific device

* Rename with long version of variables

* Improve type hinting

* Add comment that explains that some methods return views

* Actually, I think the attention casting only makes sense when we use torch.float16

* We don't actually need layer_number to be passed anymore

* Fix FX test

* Bypass torch.baddbmm

* Apply suggestions from code review

* Add comment about support for TorchScript v1.11

* Fix ONNX support for BLOOM (#18456)
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
parent 02b176c4
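
For readers skimming the commits above: several of them (the past_key_values documentation, the baddbmm change, the float16 casting) revolve around BLOOM's fused cache layout, where each layer's cached key is stored as [batch_size * num_heads, head_dim, seq_length] and each value as [batch_size * num_heads, seq_length, head_dim]. The sketch below illustrates why that layout is convenient: the key is already transposed for the attention-score matmul, and the ALiBi bias can be folded in with torch.baddbmm. All sizes and tensor names here are invented for illustration; this is not the PR's modeling code.

```python
import torch

# Hypothetical sizes, for illustration only.
batch_size, num_heads, head_dim, q_length, kv_length = 2, 4, 8, 1, 10
inv_norm_factor = 1.0 / head_dim**0.5  # 1 / sqrt(head_dim)

# Fused cache layout: key stored already transposed, value in "natural" order.
past_key = torch.randn(batch_size * num_heads, head_dim, kv_length)
past_value = torch.randn(batch_size * num_heads, kv_length, head_dim)

query = torch.randn(batch_size * num_heads, q_length, head_dim)
alibi = torch.randn(batch_size * num_heads, 1, kv_length)  # ALiBi bias per head

# baddbmm fuses the bias add with the batched matmul:
#   scores = 1.0 * alibi + inv_norm_factor * (query @ past_key)
scores = torch.baddbmm(alibi, query, past_key, beta=1.0, alpha=inv_norm_factor)

probs = torch.softmax(scores, dim=-1)    # [batch*heads, q_length, kv_length]
context = torch.bmm(probs, past_value)   # [batch*heads, q_length, head_dim]
print(scores.shape, context.shape)
```
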
```diff
@@ -214,14 +214,19 @@ class BloomOnnxConfig(OnnxConfigWithPast):
             batch, seqlen = common_inputs["input_ids"].shape
             # Not using the same length for past_key_values
             past_key_values_length = seqlen + 2
-            past_shape = (
-                batch,
-                past_key_values_length,
-                self.num_attention_heads,
-                self._config.hidden_size // self.num_attention_heads,
+            head_dim = self._config.hidden_size // self.num_attention_heads
+            past_key_shape = (
+                batch * self.num_attention_heads,
+                head_dim,
+                past_key_values_length,
+            )
+            past_value_shape = (
+                batch * self.num_attention_heads,
+                past_key_values_length,
+                head_dim,
             )
             ordered_inputs["past_key_values"] = [
-                (torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(self.num_layers)
+                (torch.zeros(past_key_shape), torch.zeros(past_value_shape)) for _ in range(self.num_layers)
             ]
         ordered_inputs["attention_mask"] = common_inputs["attention_mask"]
```
...
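
As a quick sanity check of the hunk above, here is a standalone, hypothetical rendering of the new dummy-input logic with made-up values (the real batch, seqlen, num_attention_heads, hidden_size, and num_layers come from the ONNX config and self._config):

```python
import torch

# Made-up values standing in for the ONNX config and model config.
batch, seqlen = 2, 3
num_attention_heads, hidden_size, num_layers = 4, 64, 2

# Not using the same length for past_key_values
past_key_values_length = seqlen + 2
head_dim = hidden_size // num_attention_heads

past_key_shape = (batch * num_attention_heads, head_dim, past_key_values_length)
past_value_shape = (batch * num_attention_heads, past_key_values_length, head_dim)

# One zero-filled (key, value) pair per layer, in BLOOM's fused layout.
past_key_values = [
    (torch.zeros(past_key_shape), torch.zeros(past_value_shape)) for _ in range(num_layers)
]
print(past_key_values[0][0].shape, past_key_values[0][1].shape)
# torch.Size([8, 16, 5]) torch.Size([8, 5, 16])
```
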