1. 18 Sep, 2023 1 commit
    • 🚨🚨 🚨🚨 [`Tokenizer`] attemp to fix add_token issues🚨🚨 🚨🚨 (#23909) · 2da88537
      Arthur authored
      * fix test for bart. Order is correct now let's skip BPEs
      
      * ouf
      
      * styling
      
      * fix bert....
      
      * slow refactoring
      
      * current updates
      
      * massive refactoring
      
      * update
      
      * NICE!
      
      * update to see where I am at
      
      * updates
      
      * update
      
      * update
      
      * revert
      
      * updates
      
      * updates
      
      * start supporting legacy_save
      
      * styling
      
      * big update
      
      * revert some changes
      
      * nits
      
      * nniiiiiice
      
      * small fixes
      
      * kinda fix t5 with new behaviour
      
      * major update
      
      * fixup
      
      * fix copies
      
      * today's updates
      
      * fix byt5
      
      * upfate
      
      * update
      
      * update
      
      * updates
      
      * update vocab size test
      
      * Barthez does not use not need the fairseq offset ids
      
      * super calll must be after
      
      * calll super
      
      * move all super init
      
      * move other super init
      
      * fixup
      
      * nits
      
      * more fixes
      
      * nits
      
      * more fixes
      
      * nits
      
      * more fix
      
      * remove useless files
      
      * ouch all of them are affected
      ...
  2. 07 Feb, 2023 1 commit
    • Cleanup quality (#21493) · 67d07487
      Sylvain Gugger authored
      * Remove mentions of flake8/isort
      
      * Clean up inits
      
      * Deall with all other inits
      
      * Last special rule for dummy files
  3. 17 Feb, 2022 1 commit
    • Add SimMIM (#15586) · 57882177
      NielsRogge authored
      
      
      * Add first draft
      
      * Make model importable
      
      * Make SwinForMaskedImageModeling importable
      
      * Fix imports
      
      * Add missing inits
      
      * Add support for Swin
      
      * Fix bug
      
      * Fix bug
      
      * Fix another bug
      
      * Fix Swin MIM implementation
      
      * Fix default encoder stride
      
      * Fix Swin
      
      * Add print statements for debugging
      
      * Add image_size data argument
      
      * Fix Swin
      
      * Fix image_size
      
      * Add print statements for debugging
      
      * Fix print statement
      
      * Remove print statements
      
      * Improve reshaping of bool_masked_pos
      
      * Add support for DeiT, fix tests
      
      * Improve docstrings
      
      * Apply new black version
      
      * Improve script
      
      * Fix bug
      
      * Improve README
      
      * Apply suggestions from code review
      
      * Remove DS_Store and add to gitignore
      
      * Apply suggestions from code review + fix BEiT Flax
      
      * Revert BEiT changes
      
      * Improve README
      
      * Fix code quality
      
      * Improve README
      Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
      Co-authored-by: Niels Rogge <nielsr...
  4. 06 Apr, 2021 1 commit
    • Auto feature extractor (#11097) · 403d530e
      Sylvain Gugger authored
      * AutoFeatureExtractor
      
      * Init and first tests
      
      * Tests
      
      * Damn you gitignore
      
      * Quality
      
      * Defensive test for when not all backends are here
      
      * Use pattern for Speech2Text models
  5. 08 Dec, 2020 1 commit
  6. 17 Nov, 2020 1 commit
    • Reorganize repo (#8580) · c89bdfbe
      Sylvain Gugger authored
      * Put models in subfolders
      
      * Styling
      
      * Fix imports in tests
      
      * More fixes in test imports
      
      * Sneaky hidden imports
      
      * Fix imports in doc files
      
      * More sneaky imports
      
      * Finish fixing tests
      
      * Fix examples
      
      * Fix path for copies
      
      * More fixes for examples
      
      * Fix dummy files
      
      * More fixes for example
      
      * More model import fixes
      
      * Is this why you're unhappy GitHub?
      
      * Fix imports in conver command
  7. 19 Oct, 2020 1 commit
  8. 18 Oct, 2020 1 commit
    • [Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659) · ba8c4d0a
      Thomas Wolf authored
      * splitting fast and slow tokenizers [WIP]
      
      * [WIP] splitting sentencepiece and tokenizers dependencies
      
      * update dummy objects
      
      * add name_or_path to models and tokenizers
      
      * prefix added to file names
      
      * prefix
      
      * styling + quality
      
      * spliting all the tokenizer files - sorting sentencepiece based ones
      
      * update tokenizer version up to 0.9.0
      
      * remove hard dependency on sentencepiece 🎉

      * and removed hard dependency on tokenizers 🎉
      
      * update conversion script
      
      * update missing models
      
      * fixing tests
      
      * move test_tokenization_fast to main tokenization tests - fix bugs
      
      * bump up tokenizers
      
      * fix bert_generation
      
      * update ad fix several tokenizers
      
      * keep sentencepiece in deps for now
      
      * fix funnel and deberta tests
      
      * fix fsmt
      
      * fix marian tests
      
      * fix layoutlm
      
      * fix squeezebert and gpt2
      
      * fix T5 tokenization
      
      * fix xlnet tests
      
      * style
      
      * fix mbart...
  9. 12 Oct, 2020 1 commit
  10. 22 Sep, 2020 1 commit
    • RAG (#6813) · c754c41c
      Ola Piktus authored
      * added rag WIP
      
      * path fix
      
      * Formatting / renaming prior to actual work
      
      * added rag WIP
      
      * path fix
      
      * Formatting / renaming prior to actual work
      
      * added rag WIP
      
      * path fix
      
      * Formatting / renaming prior to actual work
      
      * added rag WIP
      
      * Formatting / renaming prior to actual work
      
      * First commit
      
      * improve comments
      
      * Retrieval evaluation scripts
      
      * refactor to include modeling outputs + MPI retriever
      
      * Fix rag-token model + refactor
      
      * Various fixes + finetuning logic
      
      * use_bos fix
      
      * Retrieval refactor
      
      * Finetuning refactoring and cleanup
      
      * Add documentation and cleanup
      
      * Remove set_up_rag_env.sh file
      
      * Fix retrieval wit HF index
      
      * Fix import errors
      
      * Fix quality errors
      
      * Refactor as per suggestions in https://github.com/huggingface/transformers/pull/6813#issuecomment-687208867
      
      * fix quality
      
      * Fix RAG Sequence generation
      
      * minor cleanup plus initial tests
      
      * fix test
      
      * fix tests 2
      
      * Comments fix
      
      * post-merge...
  11. 05 Jun, 2020 1 commit
  12. 04 Jun, 2020 1 commit
    • Tensorflow improvements (#4530) · f9414f75
      Julien Plu authored
      
      
      * Better None gradients handling
      
      * Apply Style
      
      * Apply Style
      
      * Create a loss class per task to compute its respective loss
      
      * Add loss classes to the ALBERT TF models
      
      * Add loss classes to the BERT TF models
      
      * Add question answering and multiple choice to TF Camembert
      
      * Remove prints
      
      * Add multiple choice model to TF DistilBERT + loss computation
      
      * Add question answering model to TF Electra + loss computation
      
      * Add token classification, question answering and multiple choice models to TF Flaubert
      
      * Add multiple choice model to TF Roberta + loss computation
      
      * Add multiple choice model to TF XLM + loss computation
      
      * Add multiple choice and question answering models to TF XLM-Roberta
      
      * Add multiple choice model to TF XLNet + loss computation
      
      * Remove unused parameters
      
      * Add task loss classes
      
      * Reorder TF imports + add new model classes
      
      * Add new model classes
      
      * Bugfix in TF T5 model
      
      * Bugfix for TF T5 tests
      
      * Bugfix in TF T5 model
      
      * Fix TF T5 model tests
      
      * Fix T5 tests + some renaming
      
      * Fix inheritance issue in the AutoX tests
      
      * Add tests for TF Flaubert and TF XLM Roberta
      
      * Add tests for TF Flaubert and TF XLM Roberta
      
      * Remove unused piece of code in the TF trainer
      
      * bugfix and remove unused code
      
      * Bugfix for TF 2.2
      
      * Apply Style
      
      * Divide TFSequenceClassificationAndMultipleChoiceLoss into their two respective name
      
      * Apply style
      
      * Mirror the PT Trainer in the TF one: fp16, optimizers and tb_writer as class parameter and better dataset handling
      
      * Fix TF optimizations tests and apply style
      
      * Remove useless parameter
      
      * Bugfix and apply style
      
      * Fix TF Trainer prediction
      
      * Now the TF models return the loss such as their PyTorch couterparts
      
      * Apply Style
      
      * Ignore some tests output
      
      * Take into account the SQuAD cls_index, p_mask and is_impossible parameters for the QuestionAnswering task models.
      
      * Fix names for SQuAD data
      
      * Apply Style
      
      * Fix conflicts with 2.11 release
      
      * Fix conflicts with 2.11
      
      * Fix wrongname
      
      * Add better documentation on the new create_optimizer function
      
      * Fix isort
      
      * logging_dir: use same default as PyTorch
      Co-authored-by: Julien Chaumond <chaumond@gmail.com>
  13. 07 May, 2020 1 commit
    • BIG Reorganize examples (#4213) · 0ae96ff8
      Julien Chaumond authored
      * Created using Colaboratory
      
      * [examples] reorganize files
      
      * remove run_tpu_glue.py as superseded by TPU support in Trainer
      
      * Bugfix: int, not tuple
      
      * move files around
  14. 05 May, 2020 1 commit
  15. 22 Apr, 2020 1 commit
    • Trainer (#3800) · dd9d483d
      Julien Chaumond authored
      * doc
      
      * [tests] Add sample files for a regression task
      
      * [HUGE] Trainer
      
      * Feedback from @sshleifer
      
      * Feedback from @thomwolf + logging tweak
      
      * [file_utils] when downloading concurrently, get_from_cache will use the cached file for subsequent processes
      
      * [glue] Use default max_seq_length of 128 like before
      
      * [glue] move DataTrainingArguments around
      
      * [ner] Change interface of InputExample, and align run_{tf,pl}
      
      * Re-align the pl scripts a little bit
      
      * ner
      
      * [ner] Add integration test
      
      * Fix language_modeling with API tweak
      
      * [ci] Tweak loss target
      
      * Don't break console output
      
      * amp.initialize: model must be on right device before
      
      * [multiple-choice] update for Trainer
      
      * Re-align to 827d6d6e
  16. 23 Feb, 2020 1 commit
  17. 17 Feb, 2020 1 commit
  18. 06 Jan, 2020 2 commits
  19. 12 Nov, 2019 1 commit
  20. 09 Oct, 2019 1 commit
  21. 04 Oct, 2019 1 commit
    • Adding CTRL (squashed commit) · dbed1c5d
      keskarnitish authored
      adding conversion script
      
      adding first draft of modeling & tokenization
      
      adding placeholder for test files
      
      bunch of changes
      
      registering the tokenizer/model/etc
      
      tests
      
      change link; something is very VERY wrong here
      
      weird end-of-word thingy going on
      
      i think the tokenization works now ; wrote the unit tests
      
      overall structure works;load w next
      
      the monster is alive!
      
      works after some cleanup as well
      
      adding emacs autosave to gitignore
      
      currently only supporting the 48 layer one; seems to infer fine on my macbook
      
      cleanup
      
      fixing some documentation
      
      fixing some documentation
      
      tests passing?
      
      now works on CUDA also
      
      adding greedy?
      
      adding greedy sampling
      
      works well
  22. 24 Sep, 2019 1 commit
  23. 05 Sep, 2019 1 commit
  24. 20 Aug, 2019 2 commits
  25. 09 Jul, 2019 1 commit
  26. 24 Jun, 2019 1 commit
  27. 20 Jun, 2019 1 commit
  28. 05 Feb, 2019 2 commits
  29. 15 Jan, 2019 1 commit
  30. 05 Nov, 2018 1 commit
  31. 31 Oct, 2018 1 commit
  32. 30 Oct, 2018 1 commit
  33. 29 Oct, 2018 1 commit