1. 13 Mar, 2024 1 commit
  2. 31 Jan, 2024 1 commit
  3. 22 Dec, 2023 1 commit
  4. 16 Nov, 2023 1 commit
  5. 27 Oct, 2023 1 commit
  6. 25 Oct, 2023 1 commit
    • [`core`] Refactor of `gradient_checkpointing` (#27020) · 06e782da
      Younes Belkada authored
      * v1
      
      * fix
      
      * remove `create_custom_forward`
      
      * fixup
      
      * fixup
      
      * add test and fix all failing GC tests
      
      * remove all remaining `create_custom_forward` methods
      
      * fix idefics bug
      
      * fixup
      
      * replace with `__call__`
      
      * add comment
      
      * quality
  7. 24 Oct, 2023 1 commit
  8. 11 Oct, 2023 1 commit
    • In assisted decoding, pass model_kwargs to model's forward call (fix prepare_inputs_for_generation in all models) (#25242) · dcc49d8a
      Billy Bradley authored
      
      * In assisted decoding, pass model_kwargs to model's forward call
      
      Previously, assisted decoding would ignore any additional kwargs
      that it doesn't explicitly handle. This was inconsistent with other
      generation methods, which pass the model_kwargs through
      prepare_inputs_for_generation and forward the returned dict to the
      model's forward call.
      
      The prepare_inputs_for_generation method needs to be amended in all
      models, as previously it only kept the last input ID when a past_key_values
      was passed.
      
      * Improve variable names in _extend_attention_mask
      
      * Refactor extending token_type_ids into a function
      
      * Replace deepcopy with copy to optimize performance
      
      * Update new persimmon model with llama changes for assisted generation
      
      * Update new mistral model for assisted generation with prepare_inputs_for_generation
      
      * Update position_ids creation in falcon prepare_inputs_for_generation to support assisted generation
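
The input-ID handling described in this commit can be illustrated with a minimal sketch. It is not the library's code: `select_new_input_ids` is a hypothetical helper, plain lists stand in for tensors, and slicing by the cache length is only one plausible way to keep every not-yet-cached token instead of just the last one.

```python
from typing import List, Optional


def select_new_input_ids(input_ids: List[int], past_length: Optional[int]) -> List[int]:
    """Return the tokens the cache has not seen yet (hypothetical helper)."""
    if not past_length:
        # No cache yet: the model has to process the full sequence.
        return input_ids
    # Behavior described as "previous" in the commit message: keep only the
    # last token, i.e. input_ids[-1:]. That drops tokens whenever several new
    # ones are appended at once, as happens with assisted decoding candidates.
    # Assumed alternative: keep everything after the cached prefix.
    return input_ids[past_length:]


# Three tokens are already cached; three candidate tokens were appended.
prompt_and_candidates = [11, 12, 13, 14, 15, 16]
new_tokens = select_new_input_ids(prompt_and_candidates, past_length=3)
```

With a cache of length 3, all three appended tokens are kept, whereas keeping only the last input ID would have forwarded a single token.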
  9. 06 Oct, 2023 1 commit
    • Remove unnecessary `view`s of `position_ids` (#26059) · 8878eb1b
      Ramiro Leal-Cavazos authored
      * Remove unnecessary `view` of `position_ids` in `modeling_llama`
      
      When `position_ids` is `None`, its value is generated using
      `torch.arange`, which creates a tensor of size `(seq_length +
      past_key_values_length) - past_key_values_length = seq_length`. The
      tensor is then unsqueezed, resulting in a tensor of shape `(1,
      seq_length)`. This means that the last `view` to a tensor of shape
      `(-1, seq_length)` is a no-op.
      
      This commit removes the unnecessary view.
      
      * Remove no-op `view` of `position_ids` in rest of transformer models
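
The shape argument in this commit message can be checked with plain arithmetic. The sketch below does not call `torch`; the two helper functions are hypothetical and only model the shapes that `torch.arange(...).unsqueeze(0)` and `.view(-1, seq_length)` would produce under the description above.

```python
from typing import Tuple


def default_position_ids_shape(seq_length: int, past_key_values_length: int) -> Tuple[int, int]:
    """Shape of arange(past, seq_length + past) after unsqueeze(0)."""
    # arange(start, end) has end - start elements.
    numel = (seq_length + past_key_values_length) - past_key_values_length
    return (1, numel)


def view_shape(shape: Tuple[int, int], seq_length: int) -> Tuple[int, int]:
    """Shape after view(-1, seq_length); -1 is inferred from the element count."""
    numel = shape[0] * shape[1]
    return (numel // seq_length, seq_length)


before = default_position_ids_shape(seq_length=5, past_key_values_length=7)
after = view_shape(before, seq_length=5)
```

Here `before` and `after` are both `(1, 5)`, which is the sense in which the trailing `view` is a no-op for generated `position_ids`.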
  10. 08 Aug, 2023 1 commit
    • Add warning for missing attention mask when pad tokens are detected (#25345) · 5ea2595e
      JB (Don) authored
      * Add attention mask and pad token warning to many of the models
      
      * Remove changes under examples/research_projects
      
      These files are not maintained by HF.
      
      * Skip the warning check during torch.fx or JIT tracing
      
      * Switch ordering for the warning and input shape assignment
      
      This ordering is a little cleaner for some of the cases.
      
      * Add missing line break in one of the files
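
A simplified sketch of the kind of check this commit describes is shown below. It is an assumption-laden illustration, not the library's implementation: the function name, the `is_tracing` flag, and the warning text are hypothetical, nested lists stand in for tensors, and the sketch scans every position for the pad token.

```python
import warnings
from typing import List, Optional


def warn_if_padding_and_no_attention_mask(
    input_ids: List[List[int]],
    attention_mask: Optional[List[List[int]]],
    pad_token_id: Optional[int],
    is_tracing: bool = False,  # hypothetical flag standing in for torch.fx / JIT tracing
) -> bool:
    """Warn and return True when pad tokens appear but no mask was given."""
    # Skip the check while tracing, when a mask exists, or when no pad token is defined.
    if is_tracing or attention_mask is not None or pad_token_id is None:
        return False
    if any(pad_token_id in row for row in input_ids):
        warnings.warn(
            "Pad tokens were detected in input_ids but no attention_mask was passed."
        )
        return True
    return False
```

Returning a boolean is a choice made here to keep the sketch easy to test; a real check would likely only emit the warning.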
  11. 07 Aug, 2023 1 commit
  12. 25 Jul, 2023 1 commit
  13. 27 Jun, 2023 1 commit
    • Clean load keys (#24505) · 8e5d1619
      Sylvain Gugger authored
      * Preliminary work on some models
      
      * Fix test load missing and make sure nonpersistent buffers are tested
      
      * Always ignore nonpersistent buffers if in state_dict
      
      * Treat models
      
      * More models
      
      * Treat remaining models
      
      * Fix quality
      
      * Fix tests
      
      * Remove draft
      
      * This test is not needed anymore
      
      * Fix copies
      
      * Fix last test
      
      * Newly added models
      
      * Fix last tests
      
      * Address review comments
  14. 22 Jun, 2023 1 commit
  15. 21 Jun, 2023 1 commit
  16. 13 Jun, 2023 1 commit
    • Tied params cleanup (#24211) · 695928e1
      Sylvain Gugger authored
      * First test
      
      * Add info for all models
      
      * style
      
      * Repo consistency
      
      * Fix last model and cleanup prints
      
      * Repo consistency
      
      * Use consistent function for detecting tied weights
  17. 31 May, 2023 1 commit
  18. 24 May, 2023 1 commit
  19. 04 May, 2023 1 commit
  20. 03 May, 2023 1 commit
  21. 20 Apr, 2023 1 commit
  22. 12 Apr, 2023 1 commit
  23. 27 Mar, 2023 1 commit
  24. 23 Mar, 2023 1 commit
  25. 22 Mar, 2023 1 commit
    • Fix position embeddings for GPT-J and CodeGen (#22069) · 4e94c6c0
      Nick Hill authored
      * Revert "[GPT-J] add deprecation warning (#21869)"
      
      This reverts commit fb76994c.
      
      * Fix position embeddings for GPT-J and CodeGen
      
      * Address review comments from @gante
      
      * Fix "Copied from" comment referencing wrong function
      
      * Fix copy/paste mistake
      
      * Fix training path
      
      * Hopefully make torch.fx happy
      
      * Move position_ids long cast
      
      * Revert "Hopefully make torch.fx happy"
      
      This reverts commit e41a6f4cad3ff441124c7457b19cfb630d4ca025.
      
      * Changes to help with torch.fx tracing
      
      * Linter fix
      
      * Correct position_ids tensor type hint
      
      * Work-around torch.fx tracing issue
      
      * Get the changes to work with torch.fx
      
      * Address review comment from @michaelbenayoun
      
      * Another small adjustment
      
      * Add explanatory comment; small code tidyup
  26. 02 Mar, 2023 1 commit
  27. 28 Feb, 2023 1 commit
  28. 27 Feb, 2023 2 commits
  29. 22 Feb, 2023 1 commit
  30. 13 Feb, 2023 1 commit
  31. 07 Feb, 2023 2 commits
  32. 06 Feb, 2023 1 commit
    • Update quality tooling for formatting (#21480) · 6f79d264
      Sylvain Gugger authored
      * Result of black 23.1
      
      * Update target to Python 3.7
      
      * Switch flake8 to ruff
      
      * Configure isort
      
      * Configure isort
      
      * Apply isort with line limit
      
      * Put the right black version
      
      * adapt black in check copies
      
      * Fix copies
  33. 23 Jan, 2023 1 commit
  34. 20 Jan, 2023 1 commit
  35. 19 Jan, 2023 1 commit
  36. 08 Jan, 2023 1 commit
    • Replace `past` with `past_key_values` (#20944) · f0577df6
      Arthur authored
      * start cleanup
      
      * more updates
      
      * more models are affected
      
      * more updates
      
      * update generation utils
      
      * style
      
      * revert change that removed reorder cache
      
      * update generation utils
      
      * style
      
      * style
      
      * remove reorder cache
  37. 08 Dec, 2022 1 commit
  38. 23 Sep, 2022 1 commit