1. 08 Feb, 2021 5 commits
    • Update tokenizers requirement (#10077) · f285e4c3
      Anthony MOI authored
    • Fix mlflow param overflow clean (#10071) · ddaafd78
      noise-field authored
      * Unify logging with f-strings
      
      * Get limits from MLflow rather than hardcode
      
      * Add a check for parameter length overflow (see the sketch after this entry)
      
      Also, the limit constants are marked as internal
      
      * Don't stop run in on_train_end
      
      This causes bad behaviour when there is a separate validation step:
      the validation gets recorded as a separate run.
      
      * Fix style
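      A minimal sketch of the overflow check described above, not the exact code
      merged in the PR: the limit constants are assumed to live in
      mlflow.utils.validation, and log_params_safely is an illustrative helper name.

      import logging

      import mlflow
      from mlflow.utils.validation import MAX_PARAM_VAL_LENGTH, MAX_PARAMS_TAGS_PER_BATCH

      logger = logging.getLogger(__name__)

      def log_params_safely(params: dict) -> None:
          safe_params = {}
          for name, value in params.items():
              if len(str(value)) > MAX_PARAM_VAL_LENGTH:
                  # MLflow rejects parameter values longer than its limit, which would
                  # abort logging, so warn and drop the offending value instead.
                  logger.warning(
                      f"Dropping parameter '{name}': value exceeds the "
                      f"{MAX_PARAM_VAL_LENGTH}-character limit MLflow accepts."
                  )
                  continue
              safe_params[name] = value

          # MLflow also caps how many params fit in one batch, so log them in chunks.
          items = list(safe_params.items())
          for i in range(0, len(items), MAX_PARAMS_TAGS_PER_BATCH):
              mlflow.log_params(dict(items[i : i + MAX_PARAMS_TAGS_PER_BATCH]))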
    • Restore TF embeddings and attention layers to their previous version (#9890) · 31563e05
      Julien Plu authored
      * Refactor BERT
      
      * Restore all the concerned models
      
      * Remove print
      
      * Update template
      
      * Apply Sylvain's and Morgan's comments
      
      * Fix cast
      
      * Put the cast inside call
      
      * Remove cond in embeddings
      
      * Fix funnel
      
      * Restore the previous dot product (attention_scores) computation (sketched after this entry)
      
      * Add ConvBERT and BART
      
      * Make all the S2S models ONNX compliant
      
      * Fix test
      
      * Fix check copies
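      As a rough illustration of the restored attention_scores computation (a
      standard scaled dot product; the function and variable names below are
      illustrative, not the PR's exact code):

      import math

      import tensorflow as tf

      def compute_attention_scores(query_layer, key_layer, attention_head_size):
          # (batch, heads, seq_q, head_size) x (batch, heads, seq_k, head_size)^T
          # -> (batch, heads, seq_q, seq_k)
          attention_scores = tf.matmul(query_layer, key_layer, transpose_b=True)
          # Scale by sqrt(head size), as in the original Transformer attention.
          return attention_scores / math.sqrt(float(attention_head_size))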
    • Cleaning up `ConversationalPipeline` to support more than DialoGPT. (#10002) · b1aa4982
      Nicolas Patry authored
      * Cleaning up `ConversationalPipeline` to support more than DialoGPT.
      
      Currently, `ConversationalPipeline` is heavily biased towards DialoGPT,
      which is the default model for this pipeline.
      
      This PR proposes to move the DialoGPT-specific modifications back into
      tokenizer-specific behavior wherever possible, by creating a
      `_build_conversation_input_ids` function that takes a conversation as
      input and returns a list of ints corresponding to the tokens (sketched
      after this entry). It feels natural to put this in the tokenizer because
      models probably all have different strategies for building input_ids from
      the full conversation, and it is the tokenizer's job to transform strings
      into tokens (and vice versa).
      
      If `_build_conversation_input_ids` is missing, the previous behavior is
      used, so nothing breaks so far (except for blenderbot, where this is a fix).
      
      This PR also contains a fix for overly long inputs. There used to be dead
      code that tried to limit the size of the incoming input. The introduced
      fix limits the length within `_build_conversation_input_ids` to
      `tokenizer.model_max_length`. This matches the intent of the removed dead
      code and is actually better, because it relies on `model_max_length`,
      which is different from `max_length` (a default parameter for `generate`).
      
      - Removed the `history` logic from the Conversation, as it is no longer
      relevant now that the tokenization logic has moved to the tokenizer: the
      tokenizer cannot save any cache, and the conversation cannot know what is
      relevant or not. It is also not usable for `blenderbot`, because its
      input_ids are not append-only (the EOS token is always at the end).
      
      - Added an `iter_texts` method on `Conversation`, because the code was
      littered with some form of this iteration over past/generated_responses.
      
      * Removing torch mention in types.
      
      * Adding type checking to `_build_conversation_input_ids`.
      
      * Fixing import in strings.
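      The tokenizer hook described above can be sketched roughly as follows (a
      DialoGPT-style example built from this entry's description; exact
      signatures in the released code may differ):

      def _build_conversation_input_ids(self, conversation) -> list:
          # Method on the tokenizer: turn the whole Conversation into input_ids.
          input_ids = []
          # iter_texts yields (is_user, text) pairs over past user inputs and
          # generated responses, replacing the ad-hoc iteration the PR removes.
          for is_user, text in conversation.iter_texts():
              # DialoGPT-style formatting: every turn ends with the EOS token.
              input_ids.extend(self.encode(text, add_special_tokens=False))
              input_ids.append(self.eos_token_id)

          if len(input_ids) > self.model_max_length:
              # Cap at tokenizer.model_max_length (not generate()'s max_length),
              # keeping the most recent part of the conversation.
              input_ids = input_ids[-self.model_max_length :]
          return input_ids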
    • A few fixes in the documentation (#10033) · 45aaf5f7
      Sylvain Gugger authored
  2. 05 Feb, 2021 3 commits
  3. 04 Feb, 2021 10 commits
  4. 03 Feb, 2021 4 commits
  5. 02 Feb, 2021 8 commits
  6. 01 Feb, 2021 8 commits
  7. 31 Jan, 2021 2 commits