1. 19 Nov, 2020 5 commits
    • [tokenizers] convert_to_tensors: don't reconvert when the type is already right (#8283) · 42111f1d
      Stas Bekman authored
      * don't reconvert when the type is already right
      
      * better name
      
      * adjust logic as suggested
      
      * merge
    • `disable_ngram_loss` fix for prophetnet (#8554) · ca0109bd
      Zhylko Dima authored
      
      
      * `disable_ngram_loss` fix for prophetnet
      
      * add changes documentation
      
      * fix _compute_loss to use mean reduction and -100 to masked tokens & remove unnecessary arguments
      
      * mean label smoothing loss
      
      * small refactor
      
      * fix test
      Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
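The `_compute_loss` bullet in the commit above mentions a mean reduction with -100 marking masked tokens. A minimal pure-Python sketch of that idea follows. It is not ProphetNet's actual implementation: the function name `masked_mean_nll` and the toy inputs are hypothetical, and the only assumption taken from the commit message is that positions labelled -100 are skipped and the remaining losses are averaged.

```python
import math

IGNORE_INDEX = -100  # label value marking masked positions (from the commit message)


def masked_mean_nll(log_probs, labels, ignore_index=IGNORE_INDEX):
    """Mean negative log-likelihood over positions whose label is not ignore_index.

    log_probs: one list of per-class log-probabilities for each position.
    labels: one integer class index (or ignore_index) for each position.
    """
    total = 0.0
    count = 0
    for row, label in zip(log_probs, labels):
        if label == ignore_index:
            continue  # masked token: contributes nothing to the loss
        total -= row[label]
        count += 1
    # Mean reduction over the unmasked positions only.
    return total / count if count else 0.0


# Hypothetical toy data: three positions, two classes, last position masked.
example_log_probs = [
    [math.log(0.5), math.log(0.5)],
    [math.log(0.25), math.log(0.75)],
    [math.log(0.9), math.log(0.1)],
]
example_labels = [0, 1, IGNORE_INDEX]
example_loss = masked_mean_nll(example_log_probs, example_labels)
```

Dividing by the number of unmasked positions, rather than by the sequence length, keeps the loss scale independent of how much padding a batch contains.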
    • Better filtering of the model outputs in Trainer (#8633) · 4208f496
      Sylvain Gugger authored
      * Better filtering of the model outputs in Trainer
      
      * Fix examples tests
      
      * Add test for Lysandre
    • Fix a bunch of slow tests (#8634) · f2e07e72
      Lysandre Debut authored
      
      
      * CI should install `sentencepiece`
      
      * Requiring TF
      
      * Fixing some TFDPR bugs
      
      * remove return_dict=False/True hack
      Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
    • Tf longformer for sequence classification (#8231) · 5362bb8a
      elk-cloner authored
      
      
      * working on LongformerForSequenceClassification
      
      * add TFLongformerForMultipleChoice
      
      * add TFLongformerForTokenClassification
      
      * use add_start_docstrings_to_model_forward
      
      * test TFLongformerForSequenceClassification
      
      * test TFLongformerForMultipleChoice
      
      * test TFLongformerForTokenClassification
      
      * remove test from repo
      
      * add test and doc for TFLongformerForSequenceClassification, TFLongformerForTokenClassification, TFLongformerForMultipleChoice
      
      * add requested classes to modeling_tf_auto.py
      update dummy_tf_objects
      fix tests
      fix bugs in requested classes
      
      * pass all tests except test_inputs_embeds
      
      * sync with master
      
      * pass all tests except test_inputs_embeds
      
      * pass all tests
      
      * pass all tests
      
      * work on test_inputs_embeds
      
      * fix style and quality
      
      * make multi choice work
      
      * fix TFLongformerForTokenClassification signature
      
      * fix TFLongformerForMultipleChoice, TFLongformerForSequenceClassification signature
      
      * fix mult choice
      
      * fix mc hint
      
      * fix input embeds
      
      * fix input embeds
      
      * refactor input embeds
      
      * fix copy issue
      
      * apply sylvains changes and clean more
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
  2. 18 Nov, 2020 2 commits
  3. 17 Nov, 2020 4 commits
  4. 16 Nov, 2020 3 commits
    • Switch `return_dict` to `True` by default. (#8530) · 1073a2bd
      Sylvain Gugger authored
      * Use the CI to identify failing tests
      
      * Remove from all examples and tests
      
      * More default switch
      
      * Fixes
      
      * More test fixes
      
      * More fixes
      
      * Last fixes hopefully
      
      * Run on the real suite
      
      * Fix slow tests
    • Fix GPT2DoubleHeadsModel to work with model.generate() (#6601) · afb50c66
      LSinev authored
      * Fix passing token_type_ids during GPT2DoubleHeadsModel.generate() if used, and for GPT2LMHeadModel too
      
      * Update tests to check token_type_ids usage in GPT2 models
    • Adding the prepare_seq2seq_batch function to ProphetNet (#8515) · 04d8136b
      Yusuke Mori authored
      * Simply insert T5Tokenizer's prepare_seq2seq_batch
      
      * Update/Add some 'import'
      
      * fix RuntimeError caused by '.view'
      
      * Moves .view related error avoidance from seq2seq_trainer to inside prophetnet
      
      * Update test_tokenization_prophetnet.py
      
      * Format the test code with black
      
      * Re-format the test code
      
      * Update test_tokenization_prophetnet.py
      
      * Add importing require_torch in the test code
      
      * Add importing BatchEncoding in the test code
      
      * Re-format the test code on Colab
  5. 15 Nov, 2020 1 commit
    • [breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests pipelines - Removing sentencepiece as a required dependency (#8073) · f4e04cd2
      Thomas Wolf authored
      
      * Fixing roberta for slow-fast tests
      
      * WIP getting equivalence on pipelines
      
      * slow-to-fast equivalence - working on question-answering pipeline
      
      * optional FAISS tests
      
      * Pipeline Q&A
      
      * Move pipeline tests to their own test job again
      
      * update tokenizer to add sequence id methods
      
      * update to tokenizers 0.9.4
      
      * set sentencepiece as optional
      
      * clean up squad
      
      * clean up pipelines to use sequence_ids
      
      * style/quality
      
      * wording
      
      * Switch to use_fast = True by default
      
      * update tests for use_fast at True by default
      
      * fix rag tokenizer test
      
      * removing protobuf from required dependencies
      
      * fix NER test for use_fast = True by default
      
      * fixing example tests (Q&A examples use slow tokenizers for now)
      
      * protobuf in main deps extras["sentencepiece"] and example deps
      
      * fix protobuf install test
      
      * try to fix seq2seq by switching to slow tokenizers for now
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
  6. 13 Nov, 2020 2 commits
  7. 12 Nov, 2020 1 commit
  8. 11 Nov, 2020 3 commits
    • Skip test until investigation · c7b6bbec
      Lysandre authored
    • Add TFDPR (#8203) · 026a2ff2
      Ratthachat (Jung) authored
      * Create modeling_tf_dpr.py
      
      * Add TFDPR
      
      * Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot
      
      The last commit accidentally deleted these 4 lines, so I restored them
      
      * Add TFDPR
      
      * Add TFDPR
      
      * clean up some comments, add TF input-style doc string
      
      * Add TFDPR
      
      * Make return_dict=False as default
      
      * Fix return_dict bug (in .from_pretrained)
      
      * Add get_input_embeddings()
      
      * Create test_modeling_tf_dpr.py
      
      The current version already passes all 27 tests.
      See the test run at:
      https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing
      
      
      
      * fix quality
      
      * delete init weights
      
      * run fix copies
      
      * fix repo consis
      
      * del config_class, load_tf_weights
      
      They should be 'pytorch only'
      
      * add config_class back
      
      After removing it, the test failed ... so only "use_tf_weights = None" is removed, on Lysandre's suggestion
      
      * newline after .. note::
      
      * import tf, np (Necessary for ModelIntegrationTest)
      
      * slow_test from_pretrained with from_pt=True
      
      At the moment we don't have TF weights (since we don't have an official TF model)
      Previously, I did not run the slow tests, so I missed this bug
      
      * Add simple TFDPRModelIntegrationTest
      
      Note that this is just a test that TF and PyTorch give approximately the same output.
      However, I could not test against the official DPR repo's output yet
      
      * upload correct tf model
      
      * remove position_ids as missing keys
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: patrickvonplaten <patrick@huggingface.co>
    • Add next sentence prediction loss computation (#8462) · da842e4e
      Julien Plu authored
      * Add next sentence prediction loss computation
      
      * Apply style
      
      * Fix tests
      
      * Add forgotten import
      
      * Add forgotten import
      
      * Use a new parameter
      
      * Remove kwargs and use positional arguments
  9. 10 Nov, 2020 7 commits
  10. 09 Nov, 2020 3 commits
  11. 06 Nov, 2020 1 commit
  12. 05 Nov, 2020 2 commits
  13. 04 Nov, 2020 3 commits
  14. 03 Nov, 2020 3 commits
    • [WIP] Ner pipeline grouped_entities fixes (#5970) · 29b536a7
      Ceyda Cinarel authored
      
      
      * Bug fix: NER pipeline shouldn't group separate entities of same type
      
      * style fix
      
      * [Bug Fix] Shouldn't group entities that are both 'B' even if they are the same type
      	(B-type1 B-type1) != (B-type1 I-type1)
      [Bug Fix] Add an option `ignore_subwords` to ignore subsequent ##wordpieces in predictions, because some models train only on the first token of a word and not on the subsequent wordpieces (BERT NER default), so it makes sense to do the same thing at inference time.
      	The simplest fix is to just group the subwords with the first wordpiece.
      	[TODO] How to handle ignored scores? Just set them to 0 and calculate zero invariant mean?
      	[TODO] Handle a different wordpiece_prefix than ##? Possible approaches:
      		get it from the tokenizer? But currently most tokenizers don't have a wordpiece_prefix property.
      		have an _is_subword(token)
      [Feature add] Added option `skip_special_tokens`, because it was harder to remove them after grouping.
      [Additional Changes] Remove B/I prefix on returned grouped_entities
      [Feature Request/TODO] Return indexes?
      [Bug TODO] Can't use fast tokenizer with grouped_entities ('BertTokenizerFast' object has no attribute 'convert_tokens_to_string')
      
      * use offset_mapping to fix [UNK] token problem
      
      * ignore score for subwords
      
      * modify ner_pipeline test
      
      * modify ner_pipeline test
      
      * modify ner_pipeline test
      
      * ner_pipeline change ignore_subwords default to true
      
      * add ner_pipeline ignore_subword=False test case
      
      * fix offset_mapping index
      
      * fix style again duh
      
      * change is_subword and convert_tokens_to_string logic
      
      * merge tests with new test structure
      
      * change test names
      
      * remove old tests
      
      * ner tests for fast tokenizer
      
      * fast tokenizers have convert_tokens_to_string
      
      * Fix the incorrect merge
      Co-authored-by: Ceyda Cinarel <snu-ceyda@users.noreply.github.com>
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
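The grouping rule described in the commit above, (B-type1 B-type1) != (B-type1 I-type1), can be sketched in a few lines of plain Python. This is an illustration only, not the pipeline's actual code: `group_entities`, the `(word, tag)` input format, and the output keys are hypothetical choices, and subword handling (`ignore_subwords`) is left out entirely.

```python
def group_entities(tagged_tokens):
    """Group (word, tag) pairs carrying BIO tags into entity spans.

    Every B- tag opens a new group, even when the previous token has the
    same entity type. An I- tag extends the open group only when its type
    matches; otherwise it opens a new group as well.
    """
    groups = []
    current = None
    for word, tag in tagged_tokens:
        if tag == "O":
            current = None  # a token outside any entity closes the open group
            continue
        prefix, _, entity_type = tag.partition("-")
        if prefix == "I" and current is not None and current["entity_group"] == entity_type:
            current["words"].append(word)
        else:
            current = {"entity_group": entity_type, "words": [word]}
            groups.append(current)
    # The returned groups carry the bare entity type, without the B/I prefix.
    return [
        {"entity_group": group["entity_group"], "word": " ".join(group["words"])}
        for group in groups
    ]


# Hypothetical input: "Paris" directly follows "New York" with the same type.
example_groups = group_entities(
    [("New", "B-LOC"), ("York", "I-LOC"), ("Paris", "B-LOC"), ("is", "O"), ("Anna", "B-PER")]
)
```

With this input, "New York" and "Paris" come out as two separate LOC groups because "Paris" is tagged B-LOC rather than I-LOC.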
    • [CIs] Better reports everywhere (#8275) · 1bb4bba5
      Stas Bekman authored
      * make it possible to invoke testconf.py in both test suites without crashing on having the same option added
      
      * perl -pi -e 's|--make_reports|--make-reports|' to be consistent with other opts
      
      * add `pytest --make-reports` to all CIs (and artifacts)
      
      * fix
    • Data collator for token classification (#8274) · 7f556d2e
      Sylvain Gugger authored
      * Add DataCollatorForTokenClassification and clean tests
      
      * Make quality