1. 15 Mar, 2022 1 commit
    • Add the XTREME-S fine-tuning example (#15985) · 99fd3eb4
      Anton Lozhkov authored
      * CTC+classification draft
      
      * CTC+classification draft
      
      * style
      
      * multilingual runs
      
      * Fix race condition during processor.from_pretrained
      
      * Merge covost experiments
      
      * Add README
      
      * Quality
      
      * Switch to .all configs
      
      * Fix typos
  2. 12 Mar, 2022 1 commit
    • [Deepspeed] add support for bf16 mode (#14569) · 580dd87c
      Stas Bekman authored
      
      
      * [WIP] add support for bf16 mode
      
      * prep for bf16
      
      * prep for bf16
      
      * fix; zero2/bf16 is ok
      
      * check bf16 is available
      
      * test fixes
      
      * enable zero3_bf16
      
      * config files
      
      * docs
      
      * split stage_dtype; merge back to non-dtype-specific config file
      
      * fix doc
      
      * cleanup
      
      * cleanup
      
      * bfloat16 => bf16 to match the PR changes
      
      * s/zero_gather_fp16_weights_on_model_save/zero_gather_16bit_weights_on_model_save/; s/save_fp16_model/save_16bit_model/
      
      * test fixes/skipping
      
      * move
      
      * fix
      
      * Update docs/source/main_classes/deepspeed.mdx
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * backticks
      
      * cleanup
      
      * cleanup
      
      * cleanup
      
      * new version
      
      * add note about grad accum in bf16
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  3. 10 Mar, 2022 1 commit
  4. 04 Mar, 2022 1 commit
  5. 02 Mar, 2022 1 commit
  6. 21 Feb, 2022 1 commit
  7. 15 Feb, 2022 1 commit
  8. 11 Feb, 2022 1 commit
  9. 09 Feb, 2022 1 commit
  10. 07 Feb, 2022 1 commit
  11. 31 Jan, 2022 2 commits
    • Add (M)Luke model training for Token Classification in the examples (#14880) · aa19f478
      Julien Plu authored
      * Add Luke training
      
      * Fix true label tags
      
      * Fix true label tags
      
      * Fix true label tags
      
      * Update the data collator for Luke
      
      * Some training refactor for Luke
      
      * Improve data collator for Luke
      
      * Fix import
      
      * Fix datasets concatenation
      
      * Add the --max_entity_length argument for Luke models
      
      * Remove unused code
      
      * Fix style issues
      
      * Fix style issues
      
      * Move the Luke training into a separate folder
      
      * Fix style
      
      * Fix naming
      
      * Fix filtering
      
      * Fix filtering
      
      * Fix filter
      
      * Update some preprocessing
      
      * Move luke to research_projects
      
      * Checkstyle
      
      * Address comments
      
      * Fix style
  12. 27 Jan, 2022 4 commits
  13. 24 Jan, 2022 1 commit
  14. 21 Jan, 2022 2 commits
  15. 20 Jan, 2022 2 commits
  16. 19 Jan, 2022 5 commits
  17. 18 Jan, 2022 1 commit
  18. 12 Jan, 2022 1 commit
  19. 10 Jan, 2022 1 commit
  20. 23 Dec, 2021 1 commit
  21. 13 Dec, 2021 1 commit
  22. 06 Dec, 2021 1 commit
  23. 02 Dec, 2021 1 commit
    • Add CodeParrot 🦜 codebase (#14536) · 43f953cc
      Leandro von Werra authored
      
      
      * add readme skeleton
      
      * update readme
      
      * add initialization script
      
      * add deduplication script
      
      * add codeparrot training script
      
      * add code generation evaluation
      
      * add validation loss script
      
      * add requirements
      
      * update readme
      
      * tweak readme
      
      * make style
      
      * add highlights to readme
      
      * add CLIs to scripts
      
      * add tokenizer training script
      
      * add docstring to constant length dataset
      
      * fix defaults in arguments
      
      * update readme with cli
      
      * move image to hub
      
      * tweaks of readme
      
      * fix cli commands
      
      * add author
      
      * explain env variables
      
      * fix formatting
      
      * Update examples/research_projects/codeparrot/README.md
      Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
      
      * Apply suggestions from code review
      Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
      
      * replace generic with gpt2 tokenizer
      Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
  24. 30 Nov, 2021 1 commit
  25. 22 Nov, 2021 1 commit
  26. 19 Nov, 2021 1 commit
  27. 17 Nov, 2021 1 commit
  28. 15 Nov, 2021 1 commit
  29. 11 Nov, 2021 2 commits
    • fix --gradient_checkpointing (#13964) · 77262ef7
      Stas Bekman authored
    • Fix Flax params dtype (#13098) · e92190c0
      Suraj Patil authored
      
      
      * fix inits
      
      * fix embed dtype
      
      * fix embed dtype
      
      * add test to check default dtype
      
      * quality
      
      * add type conversion methods for flax models
      
      * more robust casting
      
      * cast sinusoidal positions
      
      * update pegasus
      
      * update albert
      
      * update test
      
      * make sure dtype is passed to every module
      
      * style
      
      * fix electra dense
      
      * fix t5
      
      * quality
      
      * add more tests
      
      * better name
      
      * use the dtype for lm head computation
      
      * fix albert
      
      * style
      
      * fix albert embed dtype
      
      * more tests
      
      * fix vision enc-dec
      
      * cleanup
      
      * fix embed dtype pegasus
      
      * fix default param test
      
      * doc
      
      * update template
      
      * fix final_logits_bias dtype
      
      * Apply suggestions from code review
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * fix doc
      
      * fix doc
      
      * add detailed docstring for dtype parameter
      
      * remove unnecessary import
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>