1. 18 Mar, 2021 4 commits
    • [examples/seq2seq/README.md] fix t5 examples (#10734) · 9352b515
      Stas Bekman authored
      * [examples/seq2seq] fix t5 examples
      
      This PR:
      * fixes T5 examples to include `--source_prefix` - it's **not** optional. If you give it a try you will see that you get 10x worse BLEU scores w/o it: w/ `27.6849`, w/o `2.374`
      * adds a normal translation example w/o the peculiarities of MBart and T5
      * reduces the default max samples to 50 so it's much faster to test quickly
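
To illustrate why `--source_prefix` matters for T5, here is a minimal sketch of what such a prefix does to the inputs before tokenization. It is not the example script's actual code: the helper name `add_source_prefix`, the sample sentences, and the prefix value `"translate English to German: "` are assumptions chosen for illustration.

```python
from typing import List


def add_source_prefix(texts: List[str], source_prefix: str = "") -> List[str]:
    """Prepend the task prefix to every source text (hypothetical helper)."""
    return [source_prefix + text for text in texts]


# Hypothetical inputs; a real run would read them from the dataset.
sources = ["The house is wonderful.", "I like apples."]

# T5 checkpoints are trained with task prefixes, so the prefix tells the
# model which task to perform. With an empty prefix the text is unchanged.
prefixed = add_source_prefix(sources, "translate English to German: ")
unprefixed = add_source_prefix(sources)

print(prefixed[0])
print(unprefixed[0])
```

Without the prefix the model receives only the raw sentence and no indication of the task, which is consistent with the large BLEU drop reported above.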
      
      summarization seems to be broken for t5 score-wise: https://github.com/huggingface/transformers/issues/10733
      
      @sgugger
      
      * specify explicitly the t5 models requiring the special handling
      
      * one more
      
      * update the t5 summarization example to use cnn_dailymail
      
      * move max*samples into the top level README.md
      
      * better wording
      
      * better wording
    • [file_utils] do not gobble certain kinds of requests.ConnectionError (#10235) · 4f3e93cf
      Julien Chaumond authored
      
      
      * do not gobble certain kinds of requests.ConnectionError
      
      * Apply review comments
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
    • add run_common_voice script (#10767) · 5f19c07a
      Suraj Patil authored
      * add initial script
      
      * finish script
      
      * add shell script example
      
      * accept chars_to_ignore as cl arg
      
      * align the script with other example scripts
      
      * add torchaudio dep
    • wav2vec2: support datasets other than LibriSpeech (#10581) · af8afdc8
      Mohamed El-Geish authored
      * wav2vec2: support datasets other than LibriSpeech
      
      * Formatting run_asr.py to pass code quality test
      
      * bundled orthography options and added verbose logs
      
      * fixing a typo in timit fine-tuning script
      
      * update comment for clarity
      
      * resize_lm_head and load custom vocab from file
      
      * adding a max_duration_in_seconds filter
      
      * do not assign `duration_filter` lambda, use a def
      
      * log untransliterated text as well
      
      * fix base model for arabic
      
      * fix duration filter when target_sr is not set
      
      * drop duration_in_seconds when unneeded
      
      * script for wav2vec2-large-lv60-timit-asr
      
      * fix for "tha" in arabic corpus (huggingface#10581)
      
      * adding more options to work with common_voice
      
      * PR feedback (huggingface#10581)
      
      * small README change
  2. 17 Mar, 2021 2 commits
  3. 16 Mar, 2021 3 commits
  4. 15 Mar, 2021 4 commits
  5. 12 Mar, 2021 1 commit
  6. 11 Mar, 2021 2 commits
  7. 10 Mar, 2021 2 commits
  8. 09 Mar, 2021 1 commit
  9. 08 Mar, 2021 4 commits
  10. 06 Mar, 2021 1 commit
  11. 05 Mar, 2021 1 commit
  12. 04 Mar, 2021 3 commits
  13. 01 Mar, 2021 1 commit
    • Add Fine-Tuning for Wav2Vec2 (#10145) · 0234de84
      Patrick von Platen authored
      
      
      * add encode labels function to tokenizer
      
      * start adding finetuning
      
      * init dropout
      
      * upload
      
      * correct convert script
      
      * apply changes
      
      * fix second typo
      
      * make first dummy training run
      
      * adapt convert script
      
      * push config for comparison
      
      * remove conf
      
      * finish training
      
      * adapt data collator
      
      * add research folder
      
      * update according to fairseq feedback
      
      * some minor corrections
      
      * refactor masking indices a bit
      
      * some minor changes
      
      * clean tokenizer
      
      * finish clean-up
      
      * remove previous logic
      
      * update run script
      
      * correct training
      
      * finish changes
      
      * finish model
      
      * correct bug
      
      * fix training a bit more
      
      * add some tests
      
      * finish gradient checkpointing
      
      * finish example
      
      * correct gradient checkpointing
      
      * improve tokenization method
      
      * revert changes in tokenizer
      
      * revert general change
      
      * adapt fine-tuning
      
      * update
      
      * save intermediate test
      
      * Update README.md
      
      * finish finetuning
      
      * delete conversion script
      
      * Update src/transformers/models/wav2vec2/configuration_wav2vec2.py
      
      * Update src/transformers/models/wav2vec2/processing_wav2vec2.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * finish wav2vec2 script
      
      * finish wav2vec2 fine-tuning
      
      * finalize test
      
      * correct test
      
      * adapt tests
      
      * finish
      
      * remove test file
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
  14. 27 Feb, 2021 3 commits
  15. 25 Feb, 2021 3 commits
  16. 24 Feb, 2021 1 commit
  17. 23 Feb, 2021 1 commit
  18. 22 Feb, 2021 3 commits