@@ -97,7 +97,7 @@ class ReformerConfig(PretrainedConfig):
 Number of following neighbouring chunks to attend to in LocalSelfAttention layer in addition to itself.
 local_attention_probs_dropout_prob (:obj:`float`, optional, defaults to 0.1):
 The dropout ratio for the attention probabilities in LocalSelfAttention.
-lsh_chunk_length (:obj:`int`, optional, defaults to 64):
+lsh_attn_chunk_length (:obj:`int`, optional, defaults to 64):
 Length of chunk which attends to itself in LSHSelfAttention. Chunking reduces memory complexity from sequence length x sequence length (self attention) to chunk length x chunk length x sequence length / chunk length (chunked self attention).
 lsh_num_chunks_before (:obj:`int`, optional, defaults to 1):
 Number of previous neighbouring chunks to attend to in LSHSelfAttention layer to itself.