    Add YaRN and Dynamic-YaRN RoPE Scaling Methods (#30910) · 34b43211
    mig-mfreitas authored
    * Add YaRN and Dynamic-YaRN RoPE Scaling Methods
    
    YaRN (Yet another RoPE extensioN method) combines NTK-by-parts
    interpolation with attention scaling, improving upon existing RoPE
    interpolation methods for longer context windows.
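    
    To make the method concrete, here is a minimal, self-contained sketch
    of the frequency blending and attention scaling (illustrative only; the
    helper name, signature and defaults are assumptions, not the exact code
    added in this PR):
    
        import math
        import torch
    
        def yarn_inv_freq(dim=128, base=10000.0, factor=8.0,
                          original_max_pos=4096, beta_fast=32, beta_slow=1):
            # Vanilla RoPE frequencies: theta_i = base^(-2i/dim).
            pos_freqs = base ** (torch.arange(0, dim, 2).float() / dim)
            extrapolation = 1.0 / pos_freqs             # fast dims: unchanged
            interpolation = 1.0 / (factor * pos_freqs)  # slow dims: stretched by s
    
            # Index of the dimension whose frequency completes n_rot full
            # rotations over the original context window.
            def correction_dim(n_rot):
                return (dim * math.log(original_max_pos / (n_rot * 2 * math.pi))
                        ) / (2 * math.log(base))
    
            low = max(math.floor(correction_dim(beta_fast)), 0)
            high = min(math.ceil(correction_dim(beta_slow)), dim - 1)
    
            # NTK-by-parts: ramp linearly from pure extrapolation (fast,
            # high-frequency dims) to pure interpolation (slow dims).
            ramp = torch.clamp(
                (torch.arange(dim // 2).float() - low) / max(high - low, 1e-3),
                0, 1)
            inv_freq = interpolation * ramp + extrapolation * (1 - ramp)
    
            # Attention scaling: sqrt(1/t) = 0.1 * ln(s) + 1 multiplies the
            # rotary embeddings (a softmax temperature in disguise).
            attention_factor = 0.1 * math.log(factor) + 1.0
            return inv_freq, attention_factor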
    
    Models fine-tuned with YaRN maintain their original performance across
    benchmarks while enabling efficient extrapolation to longer contexts
    and transfer learning for quicker convergence, especially in
    compute-limited environments.
    
    We implement YaRN and Dynamic-YaRN for the following models (a usage
    sketch follows the list):
    
     - LLaMA
     - Falcon
     - GPT-NeoX
     - Olmo
     - Persimmon
     - Phi
     - StableLM
     - OpenLLaMA
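    
    Enabling YaRN on a supported model is expected to look roughly like the
    following (a hedged sketch: the exact rope_scaling key names and the
    checkpoint are assumptions and may differ across transformers versions):
    
        from transformers import AutoConfig, AutoModelForCausalLM
    
        config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
        # Assumed key names, following the existing rope_scaling dict pattern.
        config.rope_scaling = {
            "rope_type": "yarn",                       # select the YaRN method
            "factor": 4.0,                             # context-extension factor s
            "original_max_position_embeddings": 4096,  # pre-training context
        }
        config.max_position_embeddings = 4 * 4096      # extended window
    
        model = AutoModelForCausalLM.from_pretrained(
            "meta-llama/Llama-2-7b-hf", config=config
        )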
    
    New unit tests are added to assert YaRN's correct behavior on both
    short and long sequence inputs.
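    
    One property such tests can assert, sketched here against the
    hypothetical yarn_inv_freq above: with factor == 1, YaRN must reduce
    exactly to vanilla RoPE, so short-context behavior is unchanged.
    
        import math
        import torch
    
        def test_yarn_reduces_to_vanilla_rope_when_unscaled():
            dim, base = 128, 10000.0
            vanilla = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
            yarn, attention_factor = yarn_inv_freq(dim=dim, base=base, factor=1.0)
            # With s == 1, interpolation equals extrapolation, so the
            # NTK-by-parts blend is a no-op...
            torch.testing.assert_close(yarn, vanilla)
            # ...and the attention factor 0.1 * ln(1) + 1 collapses to 1.
            assert math.isclose(attention_factor, 1.0)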
    
    For more details, please refer to https://arxiv.org/abs/2309.00071
    
    Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
    
    * Refactor YaRN implementation for LLaMA
    
    Iterate on the YaRN implementation for LLaMA and remove the diff from
    the remaining models to keep the PR modular.
    
    This commit includes the following changes:
    - Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries
    - Remove unnecessary attributes ('extrapolation_factor' and 'finetuned')
      from YaRN classes
    - Inherit 'forward' method in YaRN classes from superclass
    - Rename 'yarn' method to 'compute_yarn_scaling'
    - Extend YaRN tests with further assertions
    - Fix style inconsistencies
    Co-authored-by: Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt>
    
    * Refactor Tensor Building Logic for YaRN
    
    - Comply with the tensor building logic introduced in #30743
    - Add a reference to the optimized attention factor equation (sketched below)
    - Remove Dynamic-YaRN for a more agile deployment
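    
    The "optimized" attention factor referenced above is the paper's
    empirical fit; a minimal sketch of how it would scale the cached
    cos/sin tensors (function name is illustrative):
    
        import math
        import torch
    
        def scale_rotary_embeddings(cos: torch.Tensor, sin: torch.Tensor, s: float):
            # Empirical fit from the YaRN paper: sqrt(1/t) = 0.1 * ln(s) + 1.
            # Scaling both cos and sin scales the q.k logits by 1/t.
            mscale = 0.1 * math.log(s) + 1.0
            return cos * mscale, sin * mscale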
    Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
    
    * Remove unwanted file
    
    ---------
    Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
    Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
    Co-authored-by: Joao Gante <joao@huggingface.co>