1. 11 Jul, 2022 4 commits
  2. 10 Jul, 2022 1 commit
  3. 08 Jul, 2022 4 commits
  4. 07 Jul, 2022 5 commits
  5. 06 Jul, 2022 6 commits
  6. 05 Jul, 2022 4 commits
  7. 04 Jul, 2022 10 commits
  8. 01 Jul, 2022 6 commits
    • David Heryanto's avatar
      Exclude Databricks from notebook env only if the runtime is below 11.0 (#17988) · 49c8c67f
      David Heryanto authored
      * Exclude Databricks from notebook env only if the runtime is below 11.0
      
      * Dummy commit to trigger CI
      
      * Empty commit to trigger CI
      
      * Empty commit to trigger CI
      
      * Empty commit to trigger CI
      
      * Empty commit to trigger CI
      
      * Empty commit to trigger CI
      
      * Empty commit to trigger CI
      
      * Empty commit to trigger CI
      49c8c67f
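
The commit above gates the Databricks exclusion on the runtime version: Databricks notebooks are only treated as a non-notebook environment when the runtime is older than 11.0. Below is a minimal sketch of such a version gate, assuming the DATABRICKS_RUNTIME_VERSION environment variable that Databricks clusters set; the helper names are hypothetical and this is not the actual transformers implementation.

```python
import os


def is_databricks_runtime_below(threshold=(11, 0)):
    """Return True when running on a Databricks runtime older than `threshold`.

    Hypothetical helper, not the actual transformers code. Assumes the
    DATABRICKS_RUNTIME_VERSION environment variable (e.g. "10.4" or "11.0")
    that Databricks clusters expose.
    """
    raw = os.environ.get("DATABRICKS_RUNTIME_VERSION")
    if raw is None:
        return False  # not running on Databricks at all
    try:
        major, minor, *_ = raw.split(".") + ["0"]
        version = (int(major), int(minor))
    except ValueError:
        return False  # unparseable version string; do not exclude
    return version < threshold


# Sketch of how notebook detection could use the gate: only an old Databricks
# runtime is excluded from being treated as a notebook environment.
def in_notebook():
    if is_databricks_runtime_below((11, 0)):
        return False
    try:
        from IPython import get_ipython
        shell = get_ipython()
        return shell is not None and shell.__class__.__name__ == "ZMQInteractiveShell"
    except ImportError:
        return False
```
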
    • seungeunrho's avatar
      Shifting labels for causal LM when using label smoother (#17987) · 6890d196
      seungeunrho authored
      * Shifting labels for causal LM when using label smoother
      
      When training a causal LM, the loss is normally computed inside the model's
      forward() function, where the labels are shifted internally. However, if label
      smoothing is applied, the loss is computed in the trainer's compute_loss
      function instead, and the labels are not shifted there. This leaves the labels
      misaligned with their corresponding inputs; this commit resolves that
      misalignment (see the sketch after this entry).
      
      Resolves #17960
      
      On branch shift_labels_for_causalLM
      Changes to be committed:
      	modified:   src/transformers/trainer.py
      	modified:   src/transformers/trainer_pt_utils.py
      
      * Update trainer.py
      
      * Update src/transformers/trainer.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      6890d196
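
The description above notes that with label smoothing enabled, the loss is computed in the trainer rather than in the model's forward(), so the labels must be shifted explicitly to stay aligned with a causal LM's next-token predictions. Below is a minimal, self-contained sketch of that shift combined with a label-smoothed loss; it is illustrative only, not the exact code added to src/transformers/trainer.py and src/transformers/trainer_pt_utils.py.

```python
import torch
import torch.nn.functional as F


def shifted_label_smoothed_loss(logits, labels, epsilon=0.1, ignore_index=-100):
    """Label-smoothed causal LM loss with explicit label shifting (sketch).

    Illustrative only: the actual PR adds a shift_labels option to
    LabelSmoother and uses it from Trainer.compute_loss.
    """
    # Causal LM alignment: position t predicts token t+1, so drop the last
    # time step of the logits and the first token of the labels.
    logits = logits[..., :-1, :].contiguous()
    labels = labels[..., 1:].contiguous()

    log_probs = F.log_softmax(logits, dim=-1)       # (batch, seq - 1, vocab)
    padding_mask = labels.eq(ignore_index)
    safe_labels = labels.clamp(min=0)               # keep gather indices valid

    nll = -log_probs.gather(dim=-1, index=safe_labels.unsqueeze(-1)).squeeze(-1)
    smooth = -log_probs.mean(dim=-1)                # uniform-target term

    nll = nll.masked_fill(padding_mask, 0.0)
    smooth = smooth.masked_fill(padding_mask, 0.0)

    num_valid = (~padding_mask).sum().clamp(min=1)
    return ((1.0 - epsilon) * nll.sum() + epsilon * smooth.sum()) / num_valid
```
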
    • Yih-Dar's avatar
      6f0723a9
    • amyeroberts's avatar
    • Matt's avatar
      XLA train step fixes (#17973) · d6cec458
      Matt authored
      * Copy inputs to train and test step before modifying them, as this breaks things
      
      * Add XLA tests, fix our loss functions to be XLA-compatible
      
      * make fixup
      
      * Update loss computation test to expect vector of per-sample losses
      
      * Patch loss for TFLED
      
      * Patch loss for TFAlbert
      
      * Add a tf_legacy_loss config flag that enables old loss functions
      
      * Stop using config.get() because it's not a dict
      
      * Skip loss computation test for RAG because its loss is very strange and I'm afraid to rewrite it
      
      * make fixup
      
      * Add XLA-compatible RAG loss
      
      * Fix dtype of loss mask for TFAlbert
      
      * Fix test for XLNet too because it overrides the default one
      
      * make fixup
      
      * Fix config test
      
      * No more depending on GPU NaN behaviour
      
      * Add test, avoid potential zero division
      
      * Fix test item assignment
      
      * Fix loss computation masking test
      
      * make fixup
      
      * Fix dtype bugs
      d6cec458
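
Several bullets above revolve around making the TF loss functions XLA-compatible: returning per-sample losses, masking by multiplication rather than boolean indexing (which produces dynamically shaped tensors XLA cannot compile), and guarding against zero division. Below is a minimal sketch of that pattern, assuming TensorFlow; it is not the exact hf_compute_loss code changed in this PR.

```python
import tensorflow as tf


@tf.function(jit_compile=True)  # XLA compilation needs static shapes throughout
def masked_causal_lm_loss(labels, logits, ignore_index=-100):
    """XLA-friendly masked LM loss (sketch).

    Instead of tf.boolean_mask, ignored positions are zeroed out by
    multiplying with a 0/1 mask, and the mask sum used as denominator is
    floored to avoid a potential zero division. Returns one loss per sample.
    """
    per_token_loss = tf.keras.losses.sparse_categorical_crossentropy(
        tf.maximum(labels, 0), logits, from_logits=True
    )
    mask = tf.cast(labels != ignore_index, per_token_loss.dtype)
    per_token_loss = per_token_loss * mask          # zero out ignored positions

    denom = tf.maximum(tf.reduce_sum(mask, axis=-1), 1.0)
    return tf.reduce_sum(per_token_loss, axis=-1) / denom
```
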
    • Sanchit Gandhi's avatar
      [Flax] Add remat (gradient checkpointing) (#17843) · 485bbe79
      Sanchit Gandhi authored
      * [Flax] Add remat (gradient checkpointing)
      
      * fix variable naming in test
      
      * flip: checkpoint using a method
      
      * fix naming
      
      * fix class naming
      
      * apply PVP's suggestions from code review
      
      * make fix-copies
      
      * fix big-bird, electra, roberta
      
      * cookie-cutter
      
      * fix flax big-bird
      
      * move test to common
      485bbe79
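
The commit above adds remat (gradient checkpointing) to the Flax models: wrapped layers recompute their activations during the backward pass instead of storing them, trading compute for memory. Below is a minimal sketch of the pattern with flax.linen.remat; the module names are made up for illustration and do not come from this PR.

```python
import flax.linen as nn


class MLPBlock(nn.Module):
    """A toy layer used to illustrate checkpointing (not from the PR)."""
    hidden_size: int

    @nn.compact
    def __call__(self, x):
        x = nn.Dense(self.hidden_size)(x)
        return nn.gelu(x)


class Encoder(nn.Module):
    """Stack of blocks with optional remat-based gradient checkpointing."""
    hidden_size: int = 256
    num_layers: int = 4
    gradient_checkpointing: bool = False

    @nn.compact
    def __call__(self, x):
        # nn.remat wraps the module class so activations are recomputed in the
        # backward pass rather than stored, trading compute for memory.
        block_cls = nn.remat(MLPBlock) if self.gradient_checkpointing else MLPBlock
        for _ in range(self.num_layers):
            x = block_cls(hidden_size=self.hidden_size)(x)
        return x
```

Setting gradient_checkpointing=True leaves the forward results unchanged; only the memory/compute trade-off of the backward pass differs.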