Commits · 5c673efad71026ec820c5101349aa0ae8a95b360 · chenpangpang / transformers

30 Jul, 2021 1 commit

fix typo in gradient_checkpointing arg (#12855) · 5c673efa

21jun authored Jul 30, 2021

help for `ModelArguments.gradient_checkpointing` should be
"If True, use gradient checkpointing to save memory
at the expense of slower backward pass."
not "Whether to freeze the feature extractor layers of the model."
(which is duplicated from `freeze_feature_extractor` arg)
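
For illustration, the corrected arguments could look roughly like the sketch below. It uses only the standard-library `dataclasses` module. The two help strings come from the commit message; the class layout, types, and default values are assumptions and not a copy of the script.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class ModelArguments:
    # Help text quoted in the commit message; the default value is an assumption.
    freeze_feature_extractor: Optional[bool] = field(
        default=True,
        metadata={"help": "Whether to freeze the feature extractor layers of the model."},
    )
    # Corrected help text from this commit; the default value is an assumption.
    gradient_checkpointing: Optional[bool] = field(
        default=False,
        metadata={
            "help": "If True, use gradient checkpointing to save memory "
            "at the expense of slower backward pass."
        },
    )


# Collect the help string attached to each argument.
help_texts = {f.name: f.metadata["help"] for f in fields(ModelArguments)}
```

With the fix applied, the two arguments no longer share the same help string.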


25 Jun, 2021 1 commit
- remove extra white space from log format (#12360) · 4a872cae
  Stas Bekman authored Jun 25, 2021
  
14 Jun, 2021 1 commit
- [style] consistent nn. and nn.functional: part 4 `examples` (#12156) · 88e84186
  Stas Bekman authored Jun 14, 2021
  * consistent nn. and nn.functional: p4 examples

  * restore
12 May, 2021 1 commit
- remove defaults to None if optional (#11703) · 77f4c46b
  Philip May authored May 12, 2021
  
18 Mar, 2021 1 commit

wav2vec2: support datasets other than LibriSpeech (#10581) · af8afdc8

Mohamed El-Geish authored Mar 18, 2021

* wav2vec2: support datasets other than LibriSpeech

* Formatting run_asr.py to pass code quality test

* bundled orthography options and added verbose logs

* fixing a typo in timit fine-tuning script

* update comment for clarity

* resize_lm_head and load custom vocab from file

* adding a max_duration_in_seconds filter

* do not assign `duration_filter` lambda, use a def

* log untransliterated text as well

* fix base model for arabic

* fix duration filter when target_sr is not set

* drop duration_in_seconds when unneeded

* script for wav2vec2-large-lv60-timit-asr

* fix for "tha" in arabic corpus (huggingface#10581)

* adding more options to work with common_voice

* PR feedback (huggingface#10581)

* small README change
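
Two of the bullets above (the `max_duration_in_seconds` filter and replacing the assigned lambda with a `def`) can be sketched as follows. This is a hedged illustration, not the code from `run_asr.py`: the helper name `make_duration_filter`, the record layout, and the sample data are hypothetical.

```python
from typing import Any, Callable, Dict, Optional


def make_duration_filter(
    max_duration_in_seconds: Optional[float],
) -> Callable[[Dict[str, Any]], bool]:
    """Return a predicate that keeps examples no longer than the limit."""

    # A named function instead of `duration_filter = lambda ...`,
    # which PEP 8 discourages (pycodestyle E731).
    def duration_filter(example: Dict[str, Any]) -> bool:
        if max_duration_in_seconds is None:
            return True  # no limit configured, keep everything
        return example["duration_in_seconds"] <= max_duration_in_seconds

    return duration_filter


# Hypothetical records; in practice these would come from a dataset.
examples = [
    {"id": "a", "duration_in_seconds": 3.2},
    {"id": "b", "duration_in_seconds": 25.0},
]
keep = make_duration_filter(20.0)
kept_ids = [ex["id"] for ex in examples if keep(ex)]
```

A named function also shows up with a readable name in tracebacks, which is the usual argument for preferring `def` here.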


05 Mar, 2021 1 commit
- fix run seq2seq (#10547) · 395ffcd7
  Patrick von Platen authored Mar 05, 2021
  
01 Mar, 2021 1 commit

Add Fine-Tuning for Wav2Vec2 (#10145) · 0234de84

Patrick von Platen authored Mar 01, 2021



* add encode labels function to tokenizer

* start adding finetuning

* init dropout

* upload

* correct convert script

* apply changes

* fix second typo

* make first dummy training run

* adapt convert script

* push config for comparison

* remove conf

* finish training

* adapt data collator

* add research folder

* update according to fairseq feedback

* some minor corrections

* refactor masking indices a bit

* some minor changes

* clean tokenizer

* finish clean-up

* remove previous logic

* update run script

* correct training

* finish changes

* finish model

* correct bug

* fix training a bit more

* add some tests

* finish gradient checkpointing

* finish example

* correct gradient checkpointing

* improve tokenization method

* revert changes in tokenizer

* revert general change

* adapt fine-tuning

* update

* save intermediate test

* Update README.md

* finish finetuning

* delete conversion script

* Update src/transformers/models/wav2vec2/configuration_wav2vec2.py

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

* finish wav2vec2 script

* finish wav2vec2 fine-tuning

* finalize test

* correct test

* adapt tests

* finish

* remove test file
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
