1. 05 Oct, 2020 2 commits
    • Check and update model list in index.rst automatically (#7527) · b2b7fc78
      Sylvain Gugger authored
      * Check and update model list in index.rst automatically
      
      * Check and update model list in index.rst automatically
      
      * Adapt template
    • SqueezeBERT architecture (#7083) · 02ef825b
      Forrest Iandola authored
      * configuration_squeezebert.py
      
      thin wrapper around bert tokenizer
      
      fix typos
      
      wip sb model code
      
      wip modeling_squeezebert.py. Next step is to get the multi-layer-output interface working
      
      set up squeezebert to use BertModelOutput when returning results.
      
      squeezebert documentation
      
      formatting
      
      allow head mask that is an array of [None, ..., None]
      
      docs
      
      docs cont'd
      
      path to vocab
      
      docs and pointers to cloud files (WIP)
      
      line length and indentation
      
      squeezebert model cards
      
      formatting of model cards
      
      untrack modeling_squeezebert_scratchpad.py
      
      update aws paths to vocab and config files
      
      get rid of stub of NSP code, and advise users to pretrain with mlm only
      
      fix rebase issues
      
      redo rebase of modeling_auto.py
      
      fix issues with code formatting
      
      more code format auto-fixes
      
      move squeezebert before bert in tokenization_auto.py and modeling_auto.py because squeezebert inherits from bert
      
      tests for squeezebert modeling and tokenization
      
      fix typo
      
      move squeezebert before bert in modeling_auto.py to fix inheritance problem
      
      disable test_head_masking, since squeezebert doesn't yet implement head masking
      
      fix issues exposed by the test_modeling_squeezebert.py
      
      fix an issue exposed by test_tokenization_squeezebert.py
      
      fix issue exposed by test_modeling_squeezebert.py
      
      auto generated code style improvement
      
      issue that we inherited from modeling_xxx.py: SqueezeBertForMaskedLM.forward() calls self.cls(), but there is no self.cls, and I think the goal was actually to call self.lm_head()
      
      update copyright
      
      resolve failing 'test_hidden_states_output' and remove unused encoder_hidden_states and encoder_attention_mask
      
      docs
      
      add integration test. rename squeezebert-mnli --> squeezebert/squeezebert-mnli
      
      autogenerated formatting tweaks
      
      integrate feedback from patrickvonplaten and sgugger to programming style and documentation strings
      
      * tiny change to order of imports
  2. 24 Sep, 2020 1 commit
  3. 10 Sep, 2020 2 commits
  4. 04 Sep, 2020 1 commit
  5. 03 Sep, 2020 1 commit
  6. 02 Sep, 2020 1 commit
  7. 26 Aug, 2020 1 commit
  8. 24 Aug, 2020 1 commit
  9. 13 Aug, 2020 1 commit
    • cleanup tf unittests: part 2 (#6260) · e983da0e
      Stas Bekman authored
      * cleanup torch unittests: part 2
      
      * remove trailing comma added by isort, and which breaks flake
      
      * one more comma
      
      * revert odd balls
      
      * part 3: odd cases
      
      * more ["key"] -> .key refactoring
      
      * .numpy() is not needed
      
      * more unnecessary .numpy() removed
      
      * more simplification
  10. 05 Aug, 2020 1 commit
    • Tf model outputs (#6247) · c67d1a02
      Sylvain Gugger authored
      * TF outputs and test on BERT
      
      * Albert to DistilBert
      
      * All remaining TF models except T5
      
      * Documentation
      
      * One file forgotten
      
      * TF outputs and test on BERT
      
      * Albert to DistilBert
      
      * All remaining TF models except T5
      
      * Documentation
      
      * One file forgotten
      
      * Add new models and fix issues
      
      * Quality improvements
      
      * Add T5
      
      * A bit of cleanup
      
      * Fix for slow tests
      
      * Style
  11. 04 Aug, 2020 1 commit
  12. 03 Aug, 2020 1 commit
  13. 31 Jul, 2020 1 commit
  14. 30 Jul, 2020 1 commit
    • Switch from return_tuple to return_dict (#6138) · 91cb9546
      Sylvain Gugger authored
      
      
      * Switch from return_tuple to return_dict
      
      * Fix test
      
      * [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614)
      
      * Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests
      
      * AutoModels
      
      
      Tiny tweaks
      
      * Style
      
      * Final changes before merge
      
      * Re-order for simpler review
      
      * Final fixes
      
      * Addressing @sgugger's comments
      
      * Test MultipleChoice
      
      * Rework TF trainer (#6038)
      
      * Fully rework training/prediction loops
      
      * fix method name
      
      * Fix variable name
      
      * Fix property name
      
      * Fix scope
      
      * Fix method name
      
      * Fix tuple index
      
      * Fix tuple index
      
      * Fix indentation
      
      * Fix variable name
      
      * fix eval before log
      
      * Add drop remainder for test dataset
      
      * Fix step number + fix logging datetime
      
      * fix eval loss value
      
      * use global step instead of step + fix logging at step 0
      
      * Fix logging datetime
      
      * Fix global_step usage
      
      * Fix breaking loop + logging datetime
      
      * Fix step in prediction loop
      
      * Fix step breaking
      
      * Fix train/test loops
      
      * Force TF at least 2.2 for the trainer
      
      * Use assert_cardinality to facilitate the dataset size computation
      
      * Log steps per epoch
      
      * Make tfds compliant with TPU
      
      * Make tfds compliant with TPU
      
      * Use TF dataset enumerate instead of the Python one
      
      * revert previous commit
      
      * Fix data_dir
      
      * Apply style
      
      * rebase on master
      
      * Address Sylvain's comments
      
      * Address Sylvain's and Lysandre comments
      
      * Trigger CI
      
      * Remove unused import
      
      * Switch from return_tuple to return_dict
      
      * Fix test
      
      * Add recent model
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Julien Plu <plu.julien@gmail.com>
  15. 24 Jul, 2020 1 commit
  16. 22 Jul, 2020 1 commit
  17. 26 Jun, 2020 1 commit
  18. 15 Jun, 2020 1 commit
    • [HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized... · 36434220
      Anthony MOI authored
      
      [HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)
      
      * Use tokenizers pre-tokenized pipeline
      
      * failing pretokenized test
      
      * Fix is_pretokenized in python
      
      * add pretokenized tests
      
      * style and quality
      
      * better tests for batched pretokenized inputs
      
      * tokenizers clean up - new padding_strategy - split the files
      
      * [HUGE] refactoring tokenizers - padding - truncation - tests
      
      * style and quality
      
      * bump up required tokenizers version to 0.8.0-rc1
      
      * switched padding/truncation API - simpler better backward compat
      
      * updating tests for custom tokenizers
      
      * style and quality - tests on pad
      
      * fix QA pipeline
      
      * fix backward compatibility for max_length only
      
      * style and quality
      
      * Various clean-ups - add verbose
      
      * fix tests
      
      * update docstrings
      
      * Fix tests
      
      * Docs reformatted
      
      * __call__ method documented
      Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
  19. 09 Jun, 2020 1 commit
    • [All models] Extend config.output_attentions with output_attentions function arguments (#4538) · 6e603cb7
      Bharat Raghunathan authored
      
      
      * DOC: Replace instances of ``config.output_attentions`` with function argument ``output_attentions``
      
      * DOC: Apply Black Formatting
      
      * Fix errors where output_attentions was undefined
      
      * Remove output_attentions in classes per review
      
      * Fix regressions on tests having `output_attention`
      
      * Fix further regressions in tests relating to `output_attentions`
      
      Ensure proper propagation of `output_attentions` as a function parameter
      to all model subclasses
      
      * Fix more regressions in `test_output_attentions`
      
      * Fix issues with BertEncoder
      
      * Rename related variables to `output_attentions`
      
      * fix pytorch tests
      
      * fix bert and gpt2 tf
      
      * Fix most TF tests for `test_output_attentions`
      
      * Fix linter errors and more TF tests
      
      * fix conflicts
      
      * DOC: Apply Black Formatting
      
      * Fix errors where output_attentions was undefined
      
      * Remove output_attentions in classes per review
      
      * Fix regressions on tests having `output_attention`
      
      * fix conflicts
      
      * fix conflicts
      
      * fix conflicts
      
      * fix conflicts
      
      * fix pytorch tests
      
      * fix conflicts
      
      * fix conflicts
      
      * Fix linter errors and more TF tests
      
      * fix tf tests
      
      * make style
      
      * fix isort
      
      * improve output_attentions
      
      * improve tensorflow
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
  20. 02 Jun, 2020 1 commit
    • Kill model archive maps (#4636) · d4c2cb40
      Julien Chaumond authored
      * Kill model archive maps
      
      * Fixup
      
      * Also kill model_archive_map for MaskedBertPreTrainedModel
      
      * Unhook config_archive_map
      
      * Tokenizers: align with model id changes
      
      * make style && make quality
      
      * Fix CI
  21. 29 Apr, 2020 1 commit
    • CDN urls (#4030) · 455c6390
      Julien Chaumond authored
      * [file_utils] use_cdn + documentation
      
      * Move to cdn. urls for weights
      
      * [urls] Hotfix for bert-base-japanese
  22. 18 Apr, 2020 1 commit
    • Cleanup fast tokenizers integration (#3706) · 827d6d6e
      Thomas Wolf authored
      
      
      * First pass on utility classes and python tokenizers
      
      * finishing cleanup pass
      
      * style and quality
      
      * Fix tests
      
      * Updating following @mfuntowicz comment
      
      * style and quality
      
      * Fix Roberta
      
      * fix batch_size/seq_length in BatchEncoding
      
      * add alignment methods + tests
      
      * Fix OpenAI and Transfo-XL tokenizers
      
      * adding trim_offsets=True default for GPT2 and RoBERTa
      
      * style and quality
      
      * fix tests
      
      * add_prefix_space in roberta
      
      * bump up tokenizers to rc7
      
      * style
      
      * unfortunately TensorFlow doesn't like these - removing shape/seq_len for now
      
      * Update src/transformers/tokenization_utils.py
      Co-Authored-By: Stefan Schweter <stefan@schweter.it>
      
      * Adding doc and docstrings
      
      * making flake8 happy
      Co-authored-by: Stefan Schweter <stefan@schweter.it>
  23. 16 Apr, 2020 1 commit
  24. 08 Apr, 2020 1 commit
  25. 04 Apr, 2020 1 commit
  26. 07 Feb, 2020 1 commit
  27. 15 Jan, 2020 1 commit
  28. 13 Jan, 2020 1 commit
  29. 07 Jan, 2020 1 commit
  30. 06 Jan, 2020 2 commits
  31. 05 Jan, 2020 1 commit
  32. 28 Dec, 2019 1 commit
  33. 22 Dec, 2019 5 commits