1. 11 Dec, 2023 1 commit
    • Arthur's avatar
      [`Add Mixtral`] Adds support for the Mixtral MoE (#27942) · accccdd0
      Arthur authored
      
      
      * up
      
      * up
      
      * test
      
      * logits ok
      
      * up
      
      * up
      
      * few fixes
      
      * conversion script
      
      * up
      
      * nits
      
      * nits
      
      * update
      
      * nuke
      
      * more updates
      
      * nites
      
      * fix many issues
      
      * nit
      
      * scatter
      
      * nit
      
      * nuke megablocks
      
      * nits
      
      * fix conversion script
      
      * nit
      
      * remove
      
      * nits
      
      * nit
      
      * update
      
      * oupsssss
      
      * change
      
      * nits device
      
      * nits
      
      * fixup
      
      * update
      
      * merge
      
      * add copied from
      
      * fix the copy mentions
      
      * update tests
      
      * more fixes
      
      * nits
      
      * conversion script
      
      * add parts of the readme
      
      * Update tests/models/mixtral/test_modeling_mixtral.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * new test + conversion script
      
      * Apply suggestions from code review
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Apply suggestions from code review
      
      * fix
      
      * fix copies
      
      * fix copies
      
      * ooops
      
      * fix config
      
      * Apply suggestions from code review
      
      * fix nits
      
      * nit
      
      * add copies
      
      * add batched tests
      
      * docs
      
      * fix flash attention
      
      * let's add more verbose
      
      * add correct outputs
      
      * support router ouptus
      
      * ignore copies where needed
      
      * fix
      
      * cat list if list is given for now
      
      * nits
      
      * Update docs/source/en/model_doc/mixtral.md
      
      * finish router refactoring
      
      * fix forward
      
      * fix expected values
      
      * nits
      
      * fixup
      
      * fix
      
      * fix bug
      
      * fix
      
      * fix dtype mismatch
      
      * fix
      
      * grrr grrr I support item assignment
      
      * fix CI
      
      * docs
      
      * fixup
      
      * remove some copied form
      
      * fix weird diff
      
      * skip doctest fast on the config and modeling
      
      * mark that is supports flash attention in the doc
      
      * update
      
      * Update src/transformers/models/mixtral/modeling_mixtral.py
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      
      * Update docs/source/en/model_doc/mixtral.md
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      
      * revert router logits config issue
      
      * update doc accordingly
      
      * Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
      
      * nits
      
      * use torch testing asssert close
      
      * fixup
      
      * doc nits
      
      ---------
      Co-authored-by: default avataryounesbelkada <younesbelkada@gmail.com>
      Co-authored-by: default avatarYounes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      accccdd0
  2. 11 Dec, 2020 1 commit
  3. 20 Jun, 2020 1 commit
    • Kevin Canwen Xu's avatar
      Add BERT Loses Patience (Patience-based Early Exit) (#5078) · 2fd28d43
      Kevin Canwen Xu authored
      * Add BERT Loses Patience (Patience-based Early Exit)
      
      * update model archive
      
      * update format
      
      * sort import
      
      * flake8
      
      * Add results
      
      * full results
      
      * align the table
      
      * refactor to inherit
      
      * default per gpu eval = 1
      
      * Formatting
      
      * Formatting
      
      * isort
      
      * modify readme
      
      * Add check
      
      * Fix format
      
      * Fix format
      
      * Doc strings
      
      * ALBERT & BERT for sequence classification don't inherit from the original anymore
      
      * Remove incorrect comments
      
      * Remove incorrect comments
      
      * Remove incorrect comments
      
      * Sync up with new code
      
      * Sync up with new code
      
      * Add a test
      
      * Add a test
      
      * Add a test
      
      * Add a test
      
      * Add a test
      
      * Add a test
      
      * Finishing up!
      2fd28d43
  4. 03 Mar, 2020 1 commit
    • Sam Shleifer's avatar
      Summarization Examples: add Bart CNN Evaluation (#3082) · 5b396457
      Sam Shleifer authored
      * Rename and improve example
      
      * Add test
      
      * slightly faster test
      
      * style
      
      * This breaks remy prolly
      
      * shorter test string
      
      * no slow
      
      * newdir structure
      
      * New tree
      
      * Style
      
      * shorter
      
      * docs
      
      * clean
      
      * Attempt future import
      
      * more import hax
      5b396457
  5. 06 Jan, 2020 2 commits
  6. 22 Dec, 2019 1 commit
  7. 26 Sep, 2019 1 commit
  8. 05 Jul, 2019 1 commit
  9. 02 Jul, 2019 1 commit