Such utilities include extending position embeddings, replacing the current self-attention layer with a sparse-attention layer, padding sequences to a multiple of the block size, etc.
"""
@staticmethod
def extend_position_embedding(model, max_position):
"""This function extends the position embedding weights of a model loaded from a checkpoint.
"""Abstract Configuration class to store `sparsity configuration of a self attention layer`.
It contains the properties shared by different block-sparse sparsity patterns; each subclass extends it with the properties and functionality its pattern requires.
In reality, this is not sparse and all blocks are used. We keep it for the sake of comparison and comprehension.
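To make the comparison concrete: a dense "sparsity" layout is simply an all-ones block mask, so every query block attends to every key block. The helper below is a hypothetical sketch of that idea; `make_dense_layout`, its name, and its `[num_heads, num_blocks, num_blocks]` shape convention are assumptions for illustration, not DeepSpeed API:

```python
import torch

def make_dense_layout(num_heads: int, seq_len: int, block: int = 16) -> torch.Tensor:
    """Hypothetical helper: build a [num_heads, num_blocks, num_blocks] block
    mask where 1 means that block pair of attention is computed. A dense
    layout keeps every block, so the mask is all ones."""
    assert seq_len % block == 0, "sequence length must be a multiple of block size"
    num_blocks = seq_len // block
    return torch.ones(num_heads, num_blocks, num_blocks, dtype=torch.int64)
```

For a sequence of 64 tokens with `block=16`, each head's mask is a 4x4 grid of ones: no block is skipped, which is why this config exists only for comparison and comprehension.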
...
...
@@ -96,6 +97,7 @@ class FixedSparsityConfig(SparsityConfig):
For more details about this sparsity config, please see `Generative Modeling with Sparse Transformers`: https://arxiv.org/abs/1904.10509; this has been customized.
This class extends the parent class `SparsityConfig` and customizes it for `Fixed` sparsity.
"""
def __init__(self,
num_heads,
block=16,
...
...
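The constructor parameters beyond `num_heads` and `block` are elided above. As a rough intuition for the `Fixed` pattern described in the Sparse Transformers paper, each query attends to a local stripe of blocks plus a "summary" block from every earlier stripe. The sketch below builds such a layout for a single head; the function name, the `num_local_blocks` parameter, and the causal/summary-block semantics are assumptions drawn from the paper, not DeepSpeed's exact implementation:

```python
import torch

def make_fixed_layout(seq_len: int, block: int = 16, num_local_blocks: int = 4) -> torch.Tensor:
    """Illustrative sketch of a `Fixed` pattern for one head (causal variant):
    each query block attends within its own stripe of `num_local_blocks`
    blocks, plus the last block of every preceding stripe as a summary."""
    num_blocks = seq_len // block
    layout = torch.zeros(num_blocks, num_blocks, dtype=torch.int64)
    for q in range(num_blocks):
        stripe = q // num_local_blocks
        start = stripe * num_local_blocks
        layout[q, start:q + 1] = 1                               # local, causal
        for s in range(stripe):                                  # summary block of each earlier stripe
            layout[q, s * num_local_blocks + num_local_blocks - 1] = 1
    return layout
```

The resulting mask grows roughly as O(n * sqrt(n)) nonzero blocks rather than O(n^2), which is the source of the pattern's compute savings.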
@@ -131,14 +133,11 @@ class FixedSparsityConfig(SparsityConfig):
@@ -250,6 +246,7 @@ class VariableSparsityConfig(SparsityConfig):
For more details about this sparsity config, please see `Generative Modeling with Sparse Transformers`: https://arxiv.org/abs/1904.10509; this has been customized.
This class extends the parent class `SparsityConfig` and customizes it for `Variable` sparsity.
"""
def __init__(self,
num_heads,
block=16,
...
...
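What distinguishes the `Variable` config from `Fixed` is that local windows may have different sizes and the globally-attended blocks are listed explicitly. A single-head sketch of that idea follows; the function name, the `local_window_blocks` and `global_block_indices` parameters, and the bidirectional semantics are assumptions echoing the config's intent, and DeepSpeed's actual parameters and behavior may differ:

```python
import torch

def make_variable_layout(num_blocks: int,
                         local_window_blocks,
                         global_block_indices) -> torch.Tensor:
    """Illustrative sketch of a `Variable` pattern for one head
    (bidirectional): consecutive local windows of possibly different sizes,
    plus explicitly chosen global blocks that attend everywhere."""
    layout = torch.zeros(num_blocks, num_blocks, dtype=torch.int64)
    start = 0
    for w in local_window_blocks:                 # variable-size local windows
        end = min(start + w, num_blocks)
        layout[start:end, start:end] = 1
        start = end
        if start >= num_blocks:
            break
    for g in global_block_indices:                # global blocks attend everywhere
        layout[g, :] = 1
        layout[:, g] = 1
    return layout
```

For example, `local_window_blocks=[2, 3, 1]` partitions a 6-block sequence into windows of 2, 3, and 1 blocks, and `global_block_indices=[0]` then opens block 0's entire row and column.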
@@ -296,14 +293,11 @@ class VariableSparsityConfig(SparsityConfig):