1. 10 Feb, 2021 2 commits
  2. 09 Feb, 2021 3 commits
  3. 08 Feb, 2021 9 commits
    • Integration test for electra model (#10073) · 263fac71
      sandip authored
    • Implementing the integration test for BertGeneration (#9990) · 3b7e612a
      demSd authored
      * claiming this issue
      
      * Integration test for BertGeneration (Encoder and Decoder)
      
      * fix code quality
    • fix bert2bert test (#10063) · 9e795eac
      Patrick von Platen authored
    • Restore TF embeddings and attention layers to their previous version (#9890) · 31563e05
      Julien Plu authored
      * Refactor BERT
      
      * Restore all the concerned models
      
      * Remove print
      
      * Update template
      
      * Apply Sylvain's and Morgan's comments
      
      * Fix cast
      
      * Put the cast inside call
      
      * Remove cond in embeddings
      
      * Fix funnel
      
      * Restore previous dot product (attention_scores) computation
      
      * Add ConvBERT and BART
      
      * Make all the S2S models ONNX compliant
      
      * Fix test
      
      * Fix check copies
    • Disable temporarily too slow tests (Longformer/LED) (#10062) · 8bb52bd2
      Julien Plu authored
      * Disable temporarily too slow tests
      
      * Fix style
      
      * Fix template
    • Cleaning up `ConversationalPipeline` to support more than DialoGPT. (#10002) · b1aa4982
      Nicolas Patry authored
      * Cleaning up `ConversationalPipeline` to support more than DialoGPT.
      
      Currently, ConversationalPipeline is heavily biased towards DialoGPT,
      which is the default model for this pipeline.
      
      This PR proposes changes to move the DialoGPT-specific modifications
      back into tokenizer-specific behavior wherever possible, by
      creating a `_build_conversation_input_ids` method that takes a
      conversation as input and returns a list of ints corresponding
      to the tokens. It feels natural to put it here because each model
      probably has a different strategy for building input_ids from the
      full conversation, and it is the tokenizer's job to transform strings
      into tokens (and vice versa).
      
      If `_build_conversation_input_ids` is missing, the previous behavior is
      used, so nothing breaks so far (except for blenderbot, where this is a fix).
      
      This PR also contains a fix for too-long inputs. There used
      to be dead code that tried to limit the size of the incoming input.
      The fix introduced here limits inputs
      within `_build_conversation_input_ids` to `tokenizer.model_max_length`.
      This matches the intent of the removed dead code and is actually
      better, because `model_max_length` is different
      from `max_length` (which is a default parameter for `generate`).
      
      - Removed the `history` logic from Conversation, as it is no longer
      relevant now that the tokenization logic has moved to the tokenizer.
      The tokenizer cannot save any cache, and the conversation cannot know
      what is relevant or not.
      It is also not usable for `blenderbot`, because its input_ids are
      not append-only (the EOS token is always at the end).
      
      - Added an `iter_texts` method on `Conversation`, because the
      code was littered with some form of this iteration over
      past/generated_responses.
      
      * Removing torch mention in types.
      
      * Adding type checking to `_build_conversation_input_ids`.
      
      * Fixing import in strings.
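The tokenizer-level hook described above can be sketched with a toy tokenizer. Everything here is illustrative (the fake encoding, the `eos_token_id` value, and the truncation policy are assumptions, not the library's actual implementation); the point is the shape of the contract: conversation in, truncated list of ints out.

```python
class ToyConversationalTokenizer:
    """Toy stand-in illustrating the `_build_conversation_input_ids` hook.

    A real tokenizer maps strings to subword ids; here each word becomes
    a fake id (its character length) so the example stays self-contained.
    """
    eos_token_id = 0
    model_max_length = 10  # truncation bound, mirrors tokenizer.model_max_length

    def encode(self, text):
        # Fake subword encoding: one id per whitespace-separated word.
        return [len(word) for word in text.split()]

    def _build_conversation_input_ids(self, conversation):
        # conversation: list of (is_user, text) pairs, oldest turn first.
        input_ids = []
        for is_user, text in conversation:
            input_ids.extend(self.encode(text))
            input_ids.append(self.eos_token_id)  # DialoGPT-style turn separator
        # Limit to model_max_length, keeping the most recent tokens.
        return input_ids[-self.model_max_length:]

tok = ToyConversationalTokenizer()
ids = tok._build_conversation_input_ids([(True, "hi there"), (False, "hello")])
# → [2, 5, 0, 5, 0]
```

A tokenizer with a different turn-joining strategy (e.g. blenderbot) would override only this method; the pipeline itself never needs to know how turns are joined.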
    • fix bart tests (#10060) · 9a0399e1
      Patrick von Platen authored
    • Fix slow dpr test (#10059) · d51302cc
      Lysandre Debut authored
      * Correct cast to device
      
      * Comment back the slow test
    • Integration test for FlauBert (#10022) · 12e44af5
      sandip authored
  4. 04 Feb, 2021 4 commits
  5. 03 Feb, 2021 6 commits
  6. 02 Feb, 2021 4 commits
    • Add head_mask and decoder_head_mask to PyTorch LED (#9856) · 71bdc076
      Daniel Stancl authored
      * Add {decoder_,}head_mask to LED
      
      * Fix create_custom_forward signature in encoder
      
      * Add head_mask to longformer
      
      * Add head_mask to longformer to fix dependencies
      of LED on Longformer.
      
      * Not working yet
      
      * Add one missing input in longformer_modeling.py
      
      * make fix-copies
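As a rough sketch of what a per-layer head mask does: `head_mask` carries one entry per (layer, head), and the models multiply it into the attention probabilities so a 0.0 entry silences that head. The shapes and toy values below are illustrative only, not the LED implementation.

```python
# Toy head-masking: head_mask has one entry per (layer, head);
# 1.0 keeps a head's attention probabilities, 0.0 zeroes them out.
num_layers, num_heads = 2, 4
head_mask = [[1.0] * num_heads for _ in range(num_layers)]
head_mask[1][2] = 0.0  # prune head 2 in layer 1

def apply_head_mask(attn_probs, layer, head_mask):
    # attn_probs: per-head attention probabilities for one layer,
    # shape [num_heads][seq_len] in this simplified example.
    return [
        [p * head_mask[layer][h] for p in probs]
        for h, probs in enumerate(attn_probs)
    ]

probs = [[0.5, 0.5] for _ in range(num_heads)]  # uniform attention over 2 positions
masked = apply_head_mask(probs, layer=1, head_mask=head_mask)
# → head 2 becomes [0.0, 0.0]; the other heads are unchanged
```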
    • Wav2Vec2 (#9659) · d6217fb3
      Patrick von Platen authored
      * add raw scaffold
      
      * implement feat extract layers
      
      * make style
      
      * remove +
      
      * correctly convert weights
      
      * make feat extractor work
      
      * make feature extraction proj work
      
      * run forward pass
      
      * finish forward pass
      
      * Successful decoding example
      
      * remove unused files
      
      * more changes
      
      * add wav2vec tokenizer
      
      * add new structure
      
      * fix run forward
      
      * add other layer norm architecture
      
      * finish 2nd structure
      
      * add model tests
      
      * finish tests for tok and model
      
      * clean-up
      
      * make style
      
      * finish docstring for model and config
      
      * make style
      
      * correct docstring
      
      * correct tests
      
      * change checkpoints to fairseq
      
      * fix examples
      
      * finish wav2vec2
      
      * make style
      
      * apply Sylvain's suggestions
      
      * apply Lysandre's suggestions
      
      * change print to log.info
      
      * re-add assert statement
      
      * add input_values as required input name
      
      * finish wav2vec2 tokenizer
      
      * Update tests/test_tokenization_wav2vec2.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * apply Sylvain's suggestions
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
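The feature-extractor commits above turn raw `input_values` (a float waveform) into frames with a stack of strided 1-D convolutions. A back-of-the-envelope sketch of the resulting frame count, assuming the kernel sizes and strides of the released wav2vec2 base configuration (these geometry values are an assumption — verify them against the config of the checkpoint you use):

```python
# Assumed conv feature-extractor geometry, (kernel, stride) per layer,
# taken from the wav2vec2 base configuration -- verify against your checkpoint.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]

def num_feature_frames(num_samples):
    """Output length after stacking the strided convolutions (no padding)."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        length = (length - kernel) // stride + 1
    return length

frames = num_feature_frames(16_000)  # one second of 16 kHz audio
# → 49 frames, i.e. roughly one feature vector every 20 ms
```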
    • ALBERT Tokenizer integration test (#9943) · 1809de51
      Lysandre Debut authored
      * ALBERT Tokenizer integration test
      
      * Batching
      
      * Style
    • [Tokenizer Utils Base] Make pad function more flexible (#9928) · 538b3b46
      Patrick von Platen authored
      * change tokenizer requirement
      
      * split line
      
      * Correct typo from list to str
      
      * improve style
      
      * make other function pretty as well
      
      * add comment
      
      * correct typo
      
      * add new test
      
      * pass tests for tok without padding token
      
      * Apply suggestions from code review
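A minimal sketch of what a more flexible `pad` must handle: batches of unequal-length id lists, padding to the longest sequence, and returning an attention mask. The function name and return layout here are illustrative, not the library's exact signature.

```python
def pad_batch(batch_ids, pad_token_id=0):
    """Pad variable-length id lists to the batch's longest sequence."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        pad_len = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * pad_len)
        attention_mask.append([1] * len(ids) + [0] * pad_len)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[5, 6, 7], [8]])
# → input_ids [[5, 6, 7], [8, 0, 0]], attention_mask [[1, 1, 1], [1, 0, 0]]
```

The "tokenizer without padding token" case from the commit list is the situation where no sensible `pad_token_id` exists; a real implementation has to raise or pick a fallback rather than silently padding with an arbitrary id.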
  7. 01 Feb, 2021 1 commit
    • Add head_mask and decoder_head_mask to FSMT (#9819) · 0c6c0afc
      Daniel Stancl authored
      * Add {decoder_,}head_mask to fsmt_modeling.py
      
      * Enable test_headmasking and some changes to docs
      
      * Remove test_head_masking flag from fsmt test file
      
      Remove test_head_masking flag from test_modeling_fsmt.py
      since test_head_masking is set to be True by default (thus it is redundant to store).
      
      * Merge master and remove test_head_masking = True
      
      * Rebase necessary due to an update of jaxlib
      
      * Remove test_head_masking=True in tests/test_modeling_fsmt.py
      as it is redundant.
  8. 29 Jan, 2021 2 commits
    • Add XLA test (#9848) · fdcde144
      Julien Plu authored
    • Adding a new `return_full_text` parameter to TextGenerationPipeline. (#9852) · c2d0ffec
      Nicolas Patry authored
      * Adding a new `return_full_text` parameter to TextGenerationPipeline.
      
      For text-generation, the input text is sometimes used as a prompt.
      In that context, prefixing `generated_text` with the actual input
      forces the caller to take an extra step to remove it.

      The proposed change adds a new parameter, `return_full_text`
      (defaulting to the previous behavior for backward compatibility),
      that enables the caller to prevent adding the prefix.
      
      * Doc quality.
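The post-processing this parameter controls can be sketched as follows. This is simplified: the real pipeline works on token ids and decodes them, so the plain string concatenation here is only illustrative.

```python
def postprocess(prompt, generated, return_full_text=True):
    """Mimic TextGenerationPipeline's output assembly for one sequence.

    With return_full_text=True (the historical behavior) the prompt is
    prepended to the generated continuation; with False, only the newly
    generated text is returned.
    """
    text = prompt + generated if return_full_text else generated
    return {"generated_text": text}

full = postprocess("Once upon a time", ", a model", return_full_text=True)
new_only = postprocess("Once upon a time", ", a model", return_full_text=False)
# full["generated_text"]     → "Once upon a time, a model"
# new_only["generated_text"] → ", a model"
```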
  9. 28 Jan, 2021 3 commits
  10. 27 Jan, 2021 6 commits