1. 03 Feb, 2023 1 commit
    • [WIP] add SpeechT5 model (#18922) · e4bacf66
      Matthijs Hollemans authored
      * make SpeechT5 model by copying Wav2Vec2
      
      * add paper to docs
      
      * whoops added docs in wrong file
      
      * remove SpeechT5Tokenizer + put CTC back in the name
      
      * remove deprecated class
      
      * remove unused docstring
      
      * delete SpeechT5FeatureExtractor, use Wav2Vec2FeatureExtractor instead
      
      * remove classes we don't need right now
      
      * initial stab at speech encoder prenet
      
      * add more speech encoder prenet stuff
      
      * improve SpeechEncoderPrenet
      
      * add encoder (not finished yet)
      
      * add relative position bias to self-attention
      
      * add encoder CTC layers
      
      * fix formatting
      
      * add decoder from BART, doesn't work yet
      
      * make it work with generate loop
      
      * wrap the encoder into a speech encoder class
      
      * wrap the decoder in a text decoder class
      
      * changed my mind
      
      * changed my mind again ;-)
      
      * load decoder weights, make it work
      
      * add weights for text decoder postnet
      
      * add SpeechT5ForCTC model that uses only the encoder
      
      * clean up EncoderLayer and DecoderLayer
      
      * implement _init_weights in SpeechT5PreTrainedModel
      
      * cleanup config + Encoder and Decoder
      
      * add head + cross attention masks
      
      * improve doc comments
      
      * fixup
      
      * more cleanup
      
      * more fixup
      
      * TextDecoderPrenet works now, thanks Kendall
      
      * add CTC loss
      
      * add placeholders for other pre/postnets
      
      * add type annotation
      
      * fix freeze_feature_encoder
      
      * set padding tokens to 0 in decoder attention mask
      
      * encoder attention mask downsampling
      
      * remove features_pen calculation
      
      * disable the padding tokens thing again
      
      * fixup
      
      * more fixup
      
      * code review fixes
      
      * rename encoder/decoder wrapper classes
      
      * allow checkpoints to be loaded into SpeechT5Model
      
      * put encoder into wrapper for CTC model
      
      * clean up conversion script
      
      * add encoder for TTS model
      
      * add speech decoder prenet
      
      * add speech decoder post-net
      
      * attempt to reconstruct the generation loop
      
      * add speech generation loop
      
      * clean up generate_speech
      
      * small tweaks
      
      * fix forward pass
      
      * enable always dropout on speech decoder prenet
      
      * sort declaration
      
      * rename models
      
      * fixup
      
      * fix copies
      
      * more fixup
      
      * make consistency checker happy
      
      * add Seq2SeqSpectrogramOutput class
      
      * doc comments
      
      * quick note about loss and labels
      
      * add HiFi-GAN implementation (from Speech2Speech PR)
      
      * rename file
      
      * add vocoder to TTS model
      
      * improve vocoder
      
      * working on tokenizer
      
      * more better tokenizer
      
      * add CTC tokenizer
      
      * fix decode and batch_decode in CTC tokenizer
      
      * fix processor
      
      * two processors and feature extractors
      
      * use SpeechT5WaveformFeatureExtractor instead of Wav2Vec2
      
      * cleanup
      
      * more cleanup
      
      * even more fixup
      
      * notebooks
      
      * fix log-mel spectrograms
      
      * support reduction factor
      
      * fixup
      
      * shift spectrograms to right to create decoder inputs
      
      * return correct labels
      
      * add labels for stop token prediction
      
      * fix doc comments
      
      * fixup
      
      * remove SpeechT5ForPreTraining
      
      * more fixup
      
      * update copyright headers
      
      * add usage examples
      
      * add SpeechT5ProcessorForCTC
      
      * fixup
      
      * push unofficial checkpoints to hub
      
      * initial version of tokenizer unit tests
      
      * add slow test
      
      * fix failing tests
      
      * tests for CTC tokenizer
      
      * finish CTC tokenizer tests
      
      * processor tests
      
      * initial test for feature extractors
      
      * tests for spectrogram feature extractor
      
      * fixup
      
      * more fixup
      
      * add decorators
      
      * require speech for tests
      
      * modeling tests
      
      * more tests for ASR model
      
      * fix imports
      
      * add fake tests for the other models
      
      * fixup
      
      * remove jupyter notebooks
      
      * add missing SpeechT5Model tests
      
      * add missing tests for SpeechT5ForCTC
      
      * add missing tests for SpeechT5ForTextToSpeech
      
      * sort tests by name
      
      * fix Hi-Fi GAN tests
      
      * fixup
      
      * add speech-to-speech model
      
      * refactor duplicate speech generation code
      
      * add processor for SpeechToSpeech model
      
      * add usage example
      
      * add tests for speech-to-speech model
      
      * fixup
      
      * enable gradient checkpointing for SpeechT5FeatureEncoder
      
      * code review
      
      * push_to_hub now takes repo_id
      
      * improve doc comments for HiFi-GAN config
      
      * add missing test
      
      * add integration tests
      
      * make number of layers in speech decoder prenet configurable
      
      * rename variable
      
      * rename variables
      
      * add auto classes for TTS and S2S
      
      * REMOVE CTC!!!
      
      * S2S processor does not support save/load_pretrained
      
      * fixup
      
      * these models are now in an auto mapping
      
      * fix doc links
      
      * rename HiFiGAN to HifiGan, remove separate config file
      
      * REMOVE auto classes
      
      * there can be only one
      
      * fixup
      
      * replace assert
      
      * reformat
      
      * feature extractor can process input and target at same time
      
      * update checkpoint names
      
      * fix commit hash
  2. 11 Apr, 2022 1 commit
    • add a warning in `SpmConverter` for sentencepiece's model using the byte fallback feature (#16629) · 1025a9b7
      SaulLu authored
      * update proto sentencepiece model
      
      * Revert "update proto sentencepiece model"
      
      This reverts commit b07f671747fec35773d0b3d4788b8b15aefa0229.
      
      * add check
      
      * add test
      
      * Revert "Revert "update proto sentencepiece model""
      
      This reverts commit 46108257b8927b73627ec8f4f3eed53a95fc700d.
      
      * test for log level
      
      * test for log level 2
      
      * warning at the warning level
      
      * clean
      
      * format
      
      * add explanation in docstring
  3. 24 Jan, 2022 1 commit
    • Add model like (#14992) · 81156d20
      Sylvain Gugger authored
      
      
      * Add new model like command
      
      * Bad doc-styler
      
      * black and doc-styler, stop fighting!
      
      * black and doc-styler, stop fighting!
      
      * At last
      
      * Clean up
      
      * Typo
      
      * Bad doc-styler
      
      * Bad doc-styler
      
      * All good maybe?
      
      * Use constants
      
      * Add doc and type hints
      
      * More cleaning
      
      * Add doc
      
      * Fix Copied from
      
      * Doc template
      
      * Use typing.Pattern instead
      
      * Framework-specific files
      
      * Fixes
      
      * Select frameworks clean model init
      
      * Deal with frameworks in main init
      
      * fixes
      
      * Last fix
      
      * Prompt user for info
      
      * Delete example config
      
      * Last fixes
      
      * Add test config
      
      * Fix bug with model_type included in each other
      
      * Fixes
      
      * More fixes
      
      * More fixes
      
      * Adapt config
      
      * Remove print statements
      
      * Will fix tokenization later, leave it broken for now
      
      * Add test
      
      * Quality
      
      * Try this way
      
      * Debug
      
      * Maybe by setting the path?
      
      * Let's try another way
      
      * It should go better when actually passing the arg...
      
      * Remove debug statements and style
      
      * Fix config
      
      * Add tests
      
      * Tests require the three backends
      
      * intermediate commit
      
      * Revamp pattern replacements and start work on feature extractors
      
      * Adapt model info
      
      * Finalize code for processors
      
      * Fix in main init additions
      
      * Finish questionnaire for processing classes
      
      * Fix file name
      
      * Fix for real
      
      * Fix patterns
      
      * Style
      
      * Remove needless warnings
      
      * Copied from should work now.
      
      * Include Copied from in blocks
      
      * Add test
      
      * More fixes and tests
      
      * Apply suggestions from code review
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Address review comment
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
  4. 22 Dec, 2021 1 commit
  5. 08 Dec, 2021 1 commit
  6. 22 Nov, 2021 1 commit
  7. 26 Oct, 2021 1 commit
  8. 21 Sep, 2021 1 commit
  9. 07 Sep, 2021 1 commit
  10. 02 Sep, 2021 1 commit
    • Add PyTorch image classification example (#13134) · 76c4d8bf
      Nathan Raw authored
      * add pytorch image classification example
      
      * 🔥 remove utils.py
      
      * 💄 fix flake8 style issues
      
      * 🔥 remove unnecessary line
      
      * limit dataset sizes
      
      * 📌 update reqs
      
      * 🎨 restructure - use datasets lib
      
      * 🎨 import transforms directly
      
      * 📝 add comments
      
      * 💄 style
      
      * 🔥 remove flag
      
      * 📌 update requirement warning
      
      * 📝 add vision README.md
      
      * 📝 update README.md
      
      * 📝 update README.md
      
      * 🎨 add image-classification tag to model card
      
      * 🚚 rename vision to image-classification
      
      * 📝 update image-classification README.md
  11. 09 Jun, 2021 1 commit
    • Add DETR (#11653) · d3eacbb8
      NielsRogge authored
      
      
      * Squash all commits of modeling_detr_v7 branch into one
      
      * Improve docs
      
      * Fix tests
      
      * Style
      
      * Improve docs some more and fix most tests
      
      * Fix slow tests of ViT, DeiT and DETR
      
      * Improve replacement of batch norm
      
      * Restructure timm backbone forward
      
      * Make DetrForSegmentation support any timm backbone
      
      * Fix name of output
      
      * Address most comments by @LysandreJik
      
      * Give better names for variables
      
      * Conditional imports + timm in setup.py
      
      * Address additional comments by @sgugger
      
      * Make style, add require_timm and require_vision to tests
      
      * Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone
      
      * Add png files to fixtures
      
      * Fix type hint
      
      * Add timm to workflows
      
      * Add `BatchNorm2d` to the weight initialization
      
      * Fix retain_grad test
      
      * Replace model checkpoints by Facebook namespace
      
      * Fix name of checkpoint in test
      
      * Add user-friendly message when scipy is not available
      
      * Address most comments by @patrickvonplaten
      
      * Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner
      
      * Better initialization
      
      * Scipy is necessary to get sklearn metrics
      
      * Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel
      
      * Make style
      
      * Improve docs and add 2 community notebooks
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
  12. 12 May, 2021 1 commit
  13. 07 May, 2021 1 commit
  14. 21 Apr, 2021 1 commit
  15. 06 Apr, 2021 1 commit
    • Auto feature extractor (#11097) · 403d530e
      Sylvain Gugger authored
      * AutoFeatureExtractor
      
      * Init and first tests
      
      * Tests
      
      * Damn you gitignore
      
      * Quality
      
      * Defensive test for when not all backends are here
      
      * Use pattern for Speech2Text models
  16. 13 Feb, 2021 1 commit
    • Conversion from slow to fast for BPE spm vocabs contained an error. (#10120) · c9837a0d
      Nicolas Patry authored
      * Conversion from slow to fast for BPE spm vocabs contained an error.
      
      - There is only 1 test currently (tokenizers + slow) that used the modified path
      and it's reformer, which does not contain any ids modification so the
      bug was silent for now.
      - The real issue is that vocab variable was overloaded by
      SentencePieceExtractor, leading to Slow specific vocab oddities to be
      completely ignored
      - The bug was reported here https://github.com/huggingface/transformers/issues/9518
      - Ran the complete tokenization test suite with slow without error
      (`RUN_SLOW=1 pytest -sv tests/test_tokenization_*`)
      
      * Remove rebase error.
      
      * Adding the fixture.
  17. 19 Jan, 2021 1 commit
  18. 18 Dec, 2020 1 commit
  19. 08 Dec, 2020 1 commit
  20. 09 Nov, 2020 1 commit
  21. 22 Oct, 2020 1 commit
  22. 18 Oct, 2020 1 commit
    • [Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659) · ba8c4d0a
      Thomas Wolf authored
      * splitting fast and slow tokenizers [WIP]
      
      * [WIP] splitting sentencepiece and tokenizers dependencies
      
      * update dummy objects
      
      * add name_or_path to models and tokenizers
      
      * prefix added to file names
      
      * prefix
      
      * styling + quality
      
      * splitting all the tokenizer files - sorting sentencepiece based ones
      
      * update tokenizer version up to 0.9.0
      
      * remove hard dependency on sentencepiece 🎉
      
      * and removed hard dependency on tokenizers 🎉
      
      
      
      * update conversion script
      
      * update missing models
      
      * fixing tests
      
      * move test_tokenization_fast to main tokenization tests - fix bugs
      
      * bump up tokenizers
      
      * fix bert_generation
      
      * update and fix several tokenizers
      
      * keep sentencepiece in deps for now
      
      * fix funnel and deberta tests
      
      * fix fsmt
      
      * fix marian tests
      
      * fix layoutlm
      
      * fix squeezebert and gpt2
      
      * fix T5 tokenization
      
      * fix xlnet tests
      
      * style
      
      * fix mbart
      
      * bump up tokenizers to 0.9.2
      
      * fix model tests
      
      * fix tf models
      
      * fix seq2seq examples
      
      * fix tests without sentencepiece
      
      * fix slow => fast  conversion without sentencepiece
      
      * update auto and bert generation tests
      
      * fix mbart tests
      
      * fix auto and common test without tokenizers
      
      * fix tests without tokenizers
      
      * clean up tests lighten up when tokenizers + sentencepiece are both off
      
      * style quality and tests fixing
      
      * add sentencepiece to doc/examples reqs
      
      * leave sentencepiece on for now
      
      * style quality split herbert and fix pegasus
      
      * WIP Herbert fast
      
      * add sample_text_no_unicode and fix herbert tokenization
      
      * skip FSMT example test for now
      
      * fix style
      
      * fix fsmt in example tests
      
      * update following Lysandre and Sylvain's comments
      
      * Update src/transformers/testing_utils.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/testing_utils.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  23. 09 Oct, 2020 1 commit
  24. 10 Sep, 2020 1 commit
    • Albert pretrain datasets/ datacollator (#6168) · 762cba3b
      Yu Liu authored
      
      
      * add dataset for albert pretrain
      
      * datacollator for albert pretrain
      
      * naming, comprehension, file reading change
      
      * data cleaning is not needed after this modification
      
      * delete prints
      
      * fix a bug
      
      * file structure change
      
      * add tests for albert datacollator
      
      * remove random seed
      
      * add back len and get item function
      
      * sample file for testing and test code added
      
      * format change for black
      
      * more format change
      
      * Style
      
      * var assignment issue resolve
      
      * add back wrongly deleted DataCollatorWithPadding in init file
      
      * Style
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
  25. 07 May, 2020 1 commit
    • BIG Reorganize examples (#4213) · 0ae96ff8
      Julien Chaumond authored
      * Created using Colaboratory
      
      * [examples] reorganize files
      
      * remove run_tpu_glue.py as superseded by TPU support in Trainer
      
      * Bugfix: int, not tuple
      
      * move files around
  26. 11 Jan, 2020 1 commit
  27. 06 Jan, 2020 2 commits
  28. 22 Dec, 2019 1 commit