1. 17 Nov, 2020 2 commits
    • T5 & mT5 (#8552) · 86822a35
      Patrick von Platen authored
      * add mt5 and t5v1_1 model
      
      * fix tests
      
      * correct some imports
      
      * add tf model
      
      * finish tf t5
      
      * improve examples
      
      * fix copies
      
      * clean doc
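
      As a usage note for the new checkpoints, here is a minimal sketch; the "google/mt5-small" checkpoint name and the prompt are illustrative, and sentencepiece must be installed:

      ```python
      # Minimal sketch of loading the newly added mT5 model (checkpoint name illustrative).
      from transformers import MT5ForConditionalGeneration, T5Tokenizer

      tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
      model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

      inputs = tokenizer("summarize: The library now ships mT5 and T5v1.1.", return_tensors="pt")
      output_ids = model.generate(**inputs, max_length=32)
      print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
      ```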
    • Reorganize repo (#8580) · c89bdfbe
      Sylvain Gugger authored
      * Put models in subfolders
      
      * Styling
      
      * Fix imports in tests
      
      * More fixes in test imports
      
      * Sneaky hidden imports
      
      * Fix imports in doc files
      
      * More sneaky imports
      
      * Finish fixing tests
      
      * Fix examples
      
      * Fix path for copies
      
      * More fixes for examples
      
      * Fix dummy files
      
      * More fixes for example
      
      * More model import fixes
      
      * Is this why you're unhappy, GitHub?
      
      * Fix imports in convert command
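
      For orientation, a sketch of what the reorganization means for imports; BERT is used as an arbitrary example and the old path is shown only for comparison:

      ```python
      # Public imports are unchanged; only the internal layout moved into
      # per-model subfolders (transformers/models/<model_name>/...).
      from transformers import BertModel                    # unchanged public API
      from transformers.models.bert import modeling_bert    # new internal module path

      # Before this commit the modeling code lived at transformers.modeling_bert.
      print(modeling_bert.__name__)  # transformers.models.bert.modeling_bert
      ```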
  2. 16 Nov, 2020 3 commits
    • Switch `return_dict` to `True` by default. (#8530) · 1073a2bd
      Sylvain Gugger authored
      * Use the CI to identify failing tests
      
      * Remove from all examples and tests
      
      * More default switch
      
      * Fixes
      
      * More test fixes
      
      * More fixes
      
      * Last fixes hopefully
      
      * Run on the real suite
      
      * Fix slow tests
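
      A sketch of the behaviour change; the BERT checkpoint and label value are illustrative:

      ```python
      # Model calls now return ModelOutput objects by default; pass
      # return_dict=False to get the old tuple behaviour back.
      import torch
      from transformers import BertForSequenceClassification, BertTokenizer

      tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
      model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
      inputs = tokenizer("Hello, world!", return_tensors="pt")

      outputs = model(**inputs, labels=torch.tensor([1]))
      loss, logits = outputs.loss, outputs.logits                    # new default: attribute access

      legacy = model(**inputs, labels=torch.tensor([1]), return_dict=False)
      loss, logits = legacy[0], legacy[1]                            # old behaviour: plain tuple
      ```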
    • Fix GPT2DoubleHeadsModel to work with model.generate() (#6601) · afb50c66
      LSinev authored
      * Fix passing token_type_ids during GPT2DoubleHeadsModel.generate() if used
      
      and for GPT2LMHeadModel too
      
      * Update tests to check token_type_ids usage in GPT2 models
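
      A sketch of what the fix enables; the zero-valued token_type_ids are purely illustrative:

      ```python
      # token_type_ids passed to generate() are now forwarded (and extended step
      # by step) for GPT2LMHeadModel and GPT2DoubleHeadsModel.
      import torch
      from transformers import GPT2LMHeadModel, GPT2Tokenizer

      tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
      model = GPT2LMHeadModel.from_pretrained("gpt2")

      input_ids = tokenizer("Hello, my dog", return_tensors="pt").input_ids
      token_type_ids = torch.zeros_like(input_ids)   # illustrative segment ids
      output_ids = model.generate(input_ids, token_type_ids=token_type_ids, max_length=20)
      print(tokenizer.decode(output_ids[0]))
      ```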
    • Adding the prepare_seq2seq_batch function to ProphetNet (#8515) · 04d8136b
      Yusuke Mori authored
      * Simply insert T5Tokenizer's prepare_seq2seq_batch
      
      * Update/Add some 'import'
      
      * fix RuntimeError caused by '.view'
      
      * Moves .view related error avoidance from seq2seq_trainer to inside prophetnet
      
      * Update test_tokenization_prophetnet.py
      
      * Format the test code with black
      
      * Re-format the test code
      
      * Update test_tokenization_prophetnet.py
      
      * Add importing require_torch in the test code
      
      * Add importing BatchEncoding in the test code
      
      * Re-format the test code on Colab
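
      A sketch of the added helper; the checkpoint name and texts are illustrative:

      ```python
      # prepare_seq2seq_batch tokenizes source and target texts in one call,
      # which the ProphetNet tokenizer now supports like the T5 tokenizer it
      # was copied from.
      from transformers import ProphetNetTokenizer

      tokenizer = ProphetNetTokenizer.from_pretrained("microsoft/prophetnet-large-uncased")
      batch = tokenizer.prepare_seq2seq_batch(
          src_texts=["transformers provides thousands of pretrained models."],
          tgt_texts=["a short target summary."],
          max_length=32,
          return_tensors="pt",
      )
      print(batch.keys())  # typically input_ids, attention_mask and labels
      ```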
  3. 15 Nov, 2020 1 commit
    • [breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests... · f4e04cd2
      Thomas Wolf authored
      
      [breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests pipelines - Removing sentencepiece as a required dependency (#8073)
      
      * Fixing roberta for slow-fast tests
      
      * WIP getting equivalence on pipelines
      
      * slow-to-fast equivalence - working on question-answering pipeline
      
      * optional FAISS tests
      
      * Pipeline Q&A
      
      * Move pipeline tests to their own test job again
      
      * update tokenizer to add sequence id methods
      
      * update to tokenizers 0.9.4
      
      * set sentencepiece as optional
      
      * clean up squad
      
      * clean up pipelines to use sequence_ids
      
      * style/quality
      
      * wording
      
      * Switch to use_fast = True by default
      
      * update tests for use_fast at True by default
      
      * fix rag tokenizer test
      
      * removing protobuf from required dependencies
      
      * fix NER test for use_fast = True by default
      
      * fixing example tests (Q&A examples use slow tokenizers for now)
      
      * protobuf in main deps extras["sentencepiece"] and example deps
      
      * fix protobuf install test
      
      * try to fix seq2seq by switching to slow tokenizers for now
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
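
      A sketch of the use_fast flip described above; the BERT checkpoint is illustrative:

      ```python
      # AutoTokenizer now returns the Rust-backed "fast" tokenizer when one
      # exists; pass use_fast=False to keep the slow, pure-Python implementation.
      from transformers import AutoTokenizer

      fast_tok = AutoTokenizer.from_pretrained("bert-base-uncased")                 # fast by default
      slow_tok = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False)
      print(type(fast_tok).__name__, type(slow_tok).__name__)   # BertTokenizerFast, BertTokenizer

      # Fast tokenizers expose alignment info such as offset mappings, which the
      # reworked question-answering pipeline relies on.
      enc = fast_tok("New York City", return_offsets_mapping=True)
      print(enc["offset_mapping"])
      ```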
  4. 13 Nov, 2020 2 commits
  5. 12 Nov, 2020 1 commit
  6. 11 Nov, 2020 3 commits
    • Skip test until investigation · c7b6bbec
      Lysandre authored
    • Add TFDPR (#8203) · 026a2ff2
      Ratthachat (Jung) authored
      * Create modeling_tf_dpr.py
      
      * Add TFDPR
      
      * Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot
      
      the last commit accidentally deleted these 4 lines, so I recovered them
      
      * Add TFDPR
      
      * Add TFDPR
      
      * clean up some comments, add TF input-style doc string
      
      * Add TFDPR
      
      * Make return_dict=False the default
      
      * Fix return_dict bug (in .from_pretrained)
      
      * Add get_input_embeddings()
      
      * Create test_modeling_tf_dpr.py
      
      The current version has already passed all 27 tests!
      Please see the test run at:
      https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing
      
      * fix quality
      
      * delete init weights
      
      * run fix copies
      
      * fix repo consistency
      
      * del config_class, load_tf_weights
      
      They should be 'pytorch only'
      
      * add config_class back
      
      after removing it, tests failed ... so only removing "use_tf_weights = None", per Lysandre's suggestion
      
      * newline after .. note::
      
      * import tf, np (Necessary for ModelIntegrationTest)
      
      * slow_test from_pretrained with from_pt=True
      
      At the moment we don't have TF weights (since we don't have an official TF model yet)
      Previously, I did not run the slow tests, so I missed this bug
      
      * Add simple TFDPRModelIntegrationTest
      
      Note that this just tests that TF and PyTorch give approximately the same output.
      However, I could not test against the official DPR repo's output yet.
      
      * upload correct tf model
      
      * remove position_ids as missing keys
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: patrickvonplaten <patrick@huggingface.co>
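
      A sketch of loading the new TF question encoder; the checkpoint name is the standard DPR one, and from_pt=True mirrors the slow test mentioned above (it requires PyTorch to be installed):

      ```python
      # TFDPRQuestionEncoder embeds a question into a dense vector, like its
      # PyTorch counterpart.
      from transformers import DPRQuestionEncoderTokenizer, TFDPRQuestionEncoder

      tokenizer = DPRQuestionEncoderTokenizer.from_pretrained(
          "facebook/dpr-question_encoder-single-nq-base"
      )
      model = TFDPRQuestionEncoder.from_pretrained(
          "facebook/dpr-question_encoder-single-nq-base", from_pt=True
      )

      inputs = tokenizer("What is the capital of France?", return_tensors="tf")
      embedding = model(inputs["input_ids"]).pooler_output   # shape (1, hidden_size)
      print(embedding.shape)
      ```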
    • Add next sentence prediction loss computation (#8462) · da842e4e
      Julien Plu authored
      * Add next sentence prediction loss computation
      
      * Apply style
      
      * Fix tests
      
      * Add forgotten import
      
      * Add forgotten import
      
      * Use a new parameter
      
      * Remove kwargs and use positional arguments
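
      A hedged sketch of what the added loss computation looks like in use; the next_sentence_label argument name and the label convention (1 = random follow-up sentence) are assumptions about this version of the TF BERT head:

      ```python
      import tensorflow as tf
      from transformers import BertTokenizer, TFBertForNextSentencePrediction

      tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
      model = TFBertForNextSentencePrediction.from_pretrained("bert-base-uncased")

      encoding = tokenizer("The sky is blue.", "Pigs can fly.", return_tensors="tf")
      # next_sentence_label is an assumed argument name; 1 means the second
      # sentence is a random (not the next) sentence.
      outputs = model(encoding, next_sentence_label=tf.constant([1]))
      print(outputs.loss, outputs.logits.shape)
      ```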
  7. 10 Nov, 2020 7 commits
  8. 09 Nov, 2020 3 commits
  9. 06 Nov, 2020 1 commit
  10. 05 Nov, 2020 2 commits
  11. 04 Nov, 2020 3 commits
  12. 03 Nov, 2020 7 commits
    • [WIP] Ner pipeline grouped_entities fixes (#5970) · 29b536a7
      Ceyda Cinarel authored
      
      
      * Bug fix: NER pipeline shouldn't group separate entities of same type
      
      * style fix
      
      * [Bug Fix] Shouldn't group entities that are both 'B' even if they are the same type
      	(B-type1 B-type1) != (B-type1 I-type1)
      [Bug Fix] add an option `ignore_subwords` to ignore subsequent ##wordpieces in predictions, because some models train on only the first token of a word and not on the subsequent wordpieces (BERT NER default), so it makes sense to do the same thing at inference time.
      	The simplest fix is to just group the subwords with the first wordpiece.
      	[TODO] how to handle ignored scores? just set them to 0 and calculate zero invariant mean ?
      	[TODO] handle different wordpiece_prefix ## ? possible approaches:
              get it from the tokenizer? but currently most tokenizers don't have a wordpiece_prefix property
      		have an _is_subword(token)
      [Feature add] added an option to `skip_special_tokens`, because it was harder to remove them after grouping.
      [Additional Changes] remove B/I prefix on returned grouped_entities
      [Feature Request/TODO] Return indexes?
      [Bug TODO]  can't use fast tokenizer with grouped_entities ('BertTokenizerFast' object has no attribute 'convert_tokens_to_string')
      
      * use offset_mapping to fix [UNK] token problem
      
      * ignore score for subwords
      
      * modify ner_pipeline test
      
      * modify ner_pipeline test
      
      * modify ner_pipeline test
      
      * ner_pipeline change ignore_subwords default to true
      
      * add ner_pipeline ignore_subword=False test case
      
      * fix offset_mapping index
      
      * fix style again duh
      
      * change is_subword and convert_tokens_to_string logic
      
      * merge tests with new test structure
      
      * change test names
      
      * remove old tests
      
      * ner tests for fast tokenizer
      
      * fast tokenizers have convert_tokens_to_string
      
      * Fix the incorrect merge
      Co-authored-by: Ceyda Cinarel <snu-ceyda@users.noreply.github.com>
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
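
      A sketch of the options discussed in the commit; the model name is the stock CoNLL-03 NER checkpoint and the example sentence is illustrative:

      ```python
      # grouped_entities merges B-/I- spans into one entity, and ignore_subwords
      # keeps only the first wordpiece's prediction for each word.
      from transformers import pipeline

      ner = pipeline(
          "ner",
          model="dbmdz/bert-large-cased-finetuned-conll03-english",
          grouped_entities=True,
          ignore_subwords=True,
      )
      print(ner("Hugging Face is based in New York City."))
      ```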
    • [CIs] Better reports everywhere (#8275) · 1bb4bba5
      Stas Bekman authored
      * make it possible to invoke testconf.py in both test suites without crashing when the same option is added twice
      
      * perl -pi -e 's|--make_reports|--make-reports|' to be consistent with other opts
      
      * add `pytest --make-reports` to all CIs (and artifacts)
      
      * fix
    • Data collator for token classification (#8274) · 7f556d2e
      Sylvain Gugger authored
      * Add DataCollatorForTokenClassification and clean tests
      
      * Make quality
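
      A minimal sketch of the new collator; the toy features below are illustrative token ids:

      ```python
      # DataCollatorForTokenClassification pads input_ids and labels together so
      # ragged examples can be batched; labels are padded with -100 by default so
      # the padded positions are ignored by the loss.
      from transformers import AutoTokenizer, DataCollatorForTokenClassification

      tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
      collator = DataCollatorForTokenClassification(tokenizer)

      features = [
          {"input_ids": [101, 7592, 102], "labels": [0, 1, 0]},
          {"input_ids": [101, 7592, 2088, 999, 102], "labels": [0, 1, 2, 0, 0]},
      ]
      batch = collator(features)
      print(batch["input_ids"].shape, batch["labels"].shape)  # padded to the same length
      ```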
    • Sylvain Gugger authored · 4c19f3ba
    • Updated ConversationalPipeline to work with encoder-decoder models (#8207) · 74f6f91a
      guillaume-be authored
      
      
      * Updated ConversationalPipeline to work with encoder-decoder models (e.g. BlenderBot)
      
      * Addition of integration test for EncoderDecoder conversation model
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
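
      A sketch of the pipeline with an encoder-decoder model; the BlenderBot checkpoint name is illustrative:

      ```python
      # ConversationalPipeline can now drive encoder-decoder models such as BlenderBot.
      from transformers import Conversation, pipeline

      chatbot = pipeline("conversational", model="facebook/blenderbot-400M-distill")
      conversation = Conversation("What is the best way to learn Python?")
      conversation = chatbot(conversation)
      print(conversation.generated_responses[-1])
      ```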
    • [FIX] TextGenerationPipeline is currently broken. (#8256) · c66ffa3a
      Nicolas Patry authored
      * [FIX] TextGenerationPipeline is currently broken.
      
      It's most likely due to #8180.
      What's missing is a multi vs single string handler at the beginning of
      the pipe.
      And also there was no testing of this pipeline.
      
      * Fixing Conversational tests too.
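
      A sketch of the single-string vs list-of-strings handling the fix adds; the prompts are illustrative:

      ```python
      from transformers import pipeline

      generator = pipeline("text-generation", model="gpt2")
      single = generator("Once upon a time", max_length=20)              # list of dicts
      batch = generator(["Hello there,", "In a galaxy"], max_length=20)  # list of lists of dicts
      print(single[0]["generated_text"])
      print(batch[1][0]["generated_text"])
      ```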
    • Refactoring the generate() function (#6949) · a1bbcf3f
      Patrick von Platen authored
      * first draft
      
      * show design proposition for new generate method
      
      * up
      
      * make better readable
      
      * make first version
      
      * gpt2 tests pass
      
      * make beam search for gpt2 work
      
      * add first encoder-decoder code
      
      * delete typo
      
      * make t5 work
      
      * save intermediate
      
      * make bart work with beam search
      
      * finish beam search bart / t5
      
      * add default kwargs
      
      * make more tests pass
      
      * fix no bad words sampler
      
      * some fixes and tests for all distribution processors
      
      * fix test
      
      * fix rag slow tests
      
      * merge to master
      
      * add nograd to generate
      
      * make all slow tests pass
      
      * speed up generate
      
      * fix edge case bug
      
      * small fix
      
      * correct typo
      
      * add type hints and docstrings
      
      * fix typos in tests
      
      * add beam search tests
      
      * add tests for beam scorer
      
      * fix test rag
      
      * finish beam search tests
      
      * move generation tests into a separate file
      
      * fix generation tests
      
      * more tests
      
      * add aggressive generation tests
      
      * fix tests
      
      * add gpt2 sample test
      
      * add more docstring
      
      * add more docs
      
      * finish doc strings
      
      * apply some more of Sylvain's and Sam's comments
      
      * fix some typos
      
      * make fix copies
      
      * apply Lysandre's and Sylvain's comments
      
      * final corrections on examples
      
      * small fix for reformer
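
      The public entry point is unchanged by the refactor; a sketch of the same generate() call dispatching to beam search and sampling (checkpoint and prompt illustrative):

      ```python
      from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("t5-small")
      model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
      input_ids = tokenizer(
          "summarize: Transformers provides thousands of pretrained models.",
          return_tensors="pt",
      ).input_ids

      # Beam search and sampling are both dispatched from generate().
      beam_ids = model.generate(input_ids, num_beams=4, no_repeat_ngram_size=2, max_length=32)
      sample_ids = model.generate(input_ids, do_sample=True, top_k=50, top_p=0.95, max_length=32)
      print(tokenizer.decode(beam_ids[0], skip_special_tokens=True))
      ```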
  13. 02 Nov, 2020 3 commits
  14. 30 Oct, 2020 2 commits
    • Replace swish with silu (#8166) · 00112c35
      TFUsers authored
      
      
      * Replace swish with silu
      
      * revert nn.silu to nn.swish due to older version
      
      * simplify optimized silu conditional and fix format
      
      * Update activations.py
      
      * Update activations_tf.py
      
      * Update modeling_flax_utils.py
      
      * Update modeling_openai.py
      
      * add swish testcase
      
      * add pytorch swish testcase
      
      * Add more robust python version check
      
      * more formatting fixes
      Co-authored-by: TFUsers <TFUsers@gmail.com>
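
      A sketch of the rename; it assumes both the "silu" and the legacy "swish" keys stay registered in the activation mapping:

      ```python
      # SiLU(x) = x * sigmoid(x); "swish" is the older name for the same function.
      import torch
      from transformers.activations import get_activation

      silu = get_activation("silu")
      swish = get_activation("swish")   # assumed to remain as a backwards-compatible alias
      x = torch.tensor([-1.0, 0.0, 1.0])
      assert torch.allclose(silu(x), x * torch.sigmoid(x))
      assert torch.allclose(silu(x), swish(x))
      ```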
    • TFMarian, TFMbart, TFPegasus, TFBlenderbot (#7987) · 566b083e
      Sam Shleifer authored
      
      
      * Start plumbing
      
      * Marian close
      
      * Small stubs for all children
      
      * Fixed bart
      
      * marian working
      
      * pegasus test is good, but failing
      
      * Checkin tests
      
      * More model files
      
      * Subtle marian, pegasus integration test failures
      
      * Works well
      
      * rm print
      
      * boom boom
      
      * Still failing model2doc
      
      * merge master
      
      * Equivalence test failing, all others fixed
      
      * cleanup
      
      * Fix embed_scale
      
      * Cleanup marian pipeline test
      
      * Undo extra changes
      
      * Smaller delta
      
      * Cleanup model testers
      
      * undo delta
      
      * fix tests import structure
      
      * cross test decorator
      
      * Cleaner set_weights
      
      * Respect authorized_unexpected_keys
      
      * No warnings
      
      * No warnings
      
      * style
      
      * Nest tf import
      
      * black
      
      * Apply suggestions from code review
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * functional dropout
      
      * fixup
      
      * Fixup
      
      * style_doc
      
      * embs
      
      * shape list
      
      * delete slow force_token_id_to_be_generated func
      
      * fixup
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
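
      A sketch of using one of the newly added TF seq2seq classes; the opus-mt checkpoint name is illustrative, and from_pt=True (which requires PyTorch) may be needed if the hub only hosts PyTorch weights:

      ```python
      from transformers import MarianTokenizer, TFMarianMTModel

      tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
      model = TFMarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-de", from_pt=True)

      batch = tokenizer(["I love reading books."], return_tensors="tf", padding=True)
      generated = model.generate(**batch)
      print(tokenizer.batch_decode(generated, skip_special_tokens=True))
      ```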