Commit 85b8e3d3 authored by Tri Dao's avatar Tri Dao

[Docs] Mention that XPos's scale_base is recommended to be 512

parent 984d5204
@@ -135,12 +135,13 @@ class RotaryEmbedding(torch.nn.Module):
     .. _repo: https://github.com/ZhuiyiTechnology/roformer
     .. _GPT-NeoX: https://github.com/EleutherAI/gpt-neox
+    If scale_base > 0, this implements XPos (Sun et al., https://arxiv.org/abs/2212.10554).
+    A recommended value for scale_base is 512: https://github.com/HazyResearch/flash-attention/issues/96
+    Reference: https://github.com/sunyt32/torchscale/blob/main/torchscale/component/xpos_relative_position.py
     """
     def __init__(self, dim: int, base=10000, scale_base=0, device=None):
         """
-        If scale_base > 0, this implements XPos (Sun et al., https://arxiv.org/abs/2212.10554).
-        Reference: https://github.com/sunyt32/torchscale/blob/main/torchscale/component/xpos_relative_position.py
         """
         super().__init__()
         # Generate and save the inverse frequency buffer (non trainable)
...
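For context, a minimal NumPy sketch of the XPos length-extrapolation scaling that `scale_base` controls, following the torchscale reference linked in the diff. The `0.4` / `1.4` constants come from that reference implementation; the function name `xpos_scale` and the seqlen-centering convention here are illustrative, not the library's API.

```python
import numpy as np

def xpos_scale(dim, positions, scale_base=512):
    # Per-dimension scale factors (one per rotary frequency pair),
    # as in the torchscale XPos reference: lower-index (faster-rotating)
    # dimensions get smaller scales and therefore decay faster.
    scale = (np.arange(0, dim, 2) + 0.4 * dim) / (1.4 * dim)   # shape (dim/2,)
    # Center the exponent around the middle of the sequence; larger
    # scale_base (512 is the recommended value) means slower decay.
    power = (positions - len(positions) // 2) / scale_base      # shape (seqlen,)
    return scale[None, :] ** power[:, None]                     # (seqlen, dim/2)

positions = np.arange(8)
q_scale = xpos_scale(16, positions)        # multiplies the rotated queries
k_scale = xpos_scale(16, positions) ** -1  # keys use the reciprocal scale
```

Because queries and keys get reciprocal scales, the per-position factors cancel at zero relative distance, and attention logits are damped by `scale ** ((i - j) / scale_base)` as the query/key distance grows.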