  1. 27 Apr, 2020 1 commit
  2. 22 Apr, 2020 2 commits
    • Pipeline for Text Generation: GenerationPipeline (#3758) · f16540fc
      Lorenzo Ampil authored
      
      
      * Add GenerationPipeline
      
      * Fix parameter names
      
      * Correct parameter __call__ parameters
      
      * Add model type attribute and correct function calls for prepare_input
      
      * Take out trailing commas from init attributes
      
      * Remove unnecessary tokenization line
      
      * Implement support for multiple text inputs
      
      * Apply generation support for multiple input text prompts
      
      * Take out tensor coercion
      
      * Take out batch index
      
      * Add text prompt to return sequence
      
      * Squeeze token tensor before decoding
      
      * Return only a single list of sequences if only one prompt was used
      
      * Correct results variable name
      
      * Add GenerationPipeline to SUPPORTED_TASKS with the alias , initialized w GPT2
      
      * Registered AutoModelWithLMHead for both pt and t
      
      * Update docstring for GenerationPipeline
      
      * Add kwargs parameter to mode.generate
      
      * Take out kwargs parameter after all
      
      * Add generation pipeline example in pipeline docstring
      
      * Fix max length by squeezing tokens tensor
      
      * Apply ensure_tensor_on_device to pytorch tensor
      
      * Include generation step in torch.no_grad
      
      * Take out input from prepare_xlm_input and set 'en' as default xlm_language
      
      * Apply framework specific encoding during prepare_input
      
      * Format w make style
      
      * Move GenerationPipeline import to follow proper import sorting
      
      * Take out training comma from generation dict
      
      * Apply requested changes
      
      * Change name to TextGenerationPipeline
      
      * Apply TextGenerationPipeline rename to __init__
      
      * Changing alias to
      
      * Set input mapping as input to ensure_tensor_on_device
      
      * Fix assertion placement
      
      * Add test_text_generation
      
      * Add TextGenerationPipeline to PipelineCommonTests
      
      * Take out whitespace
      
      * Format __init__ w black
      
      * Fix __init__ style
      
      * Format __init__
      
      * Add line to end of __init__
      
      * Correct model tokenizer set for test_text_generation
      
      * Ensure to return list of list, not list of string (to pass test)
      
      * Limit test models to only 3 to limit runtime to address circleCI timeout error
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update tests/test_pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Remove argument docstring, __init__, add additional __call__ arguments, and reformat results to list of dict
      
      * Fix blank result list
      
      * Add TextGenerationPipeline to pipelines.rst
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Fix typos from adding PADDING_TEXT_TOKEN_LENGTH
      
      * Fix incorrectly moved result list
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Add back generation line and make style
      
      * Take out blank whitespace
      
      * Apply new alias, text-generation, to test_pipelines
      
      * Fix text generation alias in test
      
      * Update src/transformers/pipelines.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Julien Chaumond <chaumond@gmail.com>
    • Fixes #3877 · 1dc9b3c7
      Julien Chaumond authored
  3. 18 Apr, 2020 1 commit
    • Cleanup fast tokenizers integration (#3706) · 827d6d6e
      Thomas Wolf authored
      
      
      * First pass on utility classes and python tokenizers
      
      * finishing cleanup pass
      
      * style and quality
      
      * Fix tests
      
      * Updating following @mfuntowicz comment
      
      * style and quality
      
      * Fix Roberta
      
      * fix batch_size/seq_length in BatchEncoding
      
      * add alignment methods + tests
      
      * Fix OpenAI and Transfo-XL tokenizers
      
      * adding trim_offsets=True default for GPT2 and RoBERTa
      
      * style and quality
      
      * fix tests
      
      * add_prefix_space in roberta
      
      * bump up tokenizers to rc7
      
      * style
      
      * unfortunately TensorFlow doesn't like these - removing shape/seq_len for now
      
      * Update src/transformers/tokenization_utils.py
      Co-Authored-By: Stefan Schweter <stefan@schweter.it>
      
      * Adding doc and docstrings
      
      * making flake8 happy
      Co-authored-by: Stefan Schweter <stefan@schweter.it>
  4. 16 Apr, 2020 1 commit
  5. 10 Apr, 2020 2 commits
  6. 06 Apr, 2020 2 commits
  7. 04 Apr, 2020 1 commit
  8. 03 Apr, 2020 1 commit
    • ELECTRA (#3257) · d5d7d886
      Lysandre Debut authored
      * Electra wip
      
      * helpers
      
      * Electra wip
      
      * Electra v1
      
      * ELECTRA may be saved/loaded
      
      * Generator & Discriminator
      
      * Embedding size instead of halving the hidden size
      
      * ELECTRA Tokenizer
      
      * Revert BERT helpers
      
      * ELECTRA Conversion script
      
      * Archive maps
      
      * PyTorch tests
      
      * Start fixing tests
      
      * Tests pass
      
      * Same configuration for both models
      
      * Compatible with base + large
      
      * Simplification + weight tying
      
      * Archives
      
      * Auto + Renaming to standard names
      
      * ELECTRA is uncased
      
      * Tests
      
      * Slight API changes
      
      * Update tests
      
      * wip
      
      * ElectraForTokenClassification
      
      * temp
      
      * Simpler arch + tests
      
      Removed ElectraForPreTraining which will be in a script
      
      * Conversion script
      
      * Auto model
      
      * Update links to S3
      
      * Split ElectraForPreTraining and ElectraForTokenClassification
      
      * Actually test PreTraining model
      
      * Remove num_labels from configuration
      
      * wip
      
      * wip
      
      * From discriminator and generator to electra
      
      * Slight API changes
      
      * Better naming
      
      * TensorFlow ELECTRA tests
      
      * Accurate conversion script
      
      * Added to conversion script
      
      * Fast ELECTRA tokenizer
      
      * Style
      
      * Add ELECTRA to README
      
      * Modeling Pytorch Doc + Real style
      
      * TF Docs
      
      * Docs
      
      * Correct links
      
      * Correct model initialized
      
      * random fixes
      
      * style
      
      * Addressing Patrick's and Sam's comments
      
      * Correct links in docs
  9. 31 Mar, 2020 3 commits
  10. 30 Mar, 2020 2 commits
    • Release: v2.7.0 · 6f5a12a5
      LysandreJik authored
    • [T5] Add training documenation (#3507) · 5b44e0a3
      Patrick von Platen authored
      * Add clear description of how to train T5
      
      * correct docstring in T5
      
      * correct typo
      
      * correct docstring format
      
      * update t5 model docs
      
      * implement collins feedback
      
      * fix typo and add more explanation for sentinel tokens
      
      * delete unnecessary todos
  11. 27 Mar, 2020 1 commit
    • Add T5 to docs (#3461) · fa9af246
      Patrick von Platen authored
      * add t5 docs basis
      
      * improve docs
      
      * add t5 docs
      
      * improve t5 docstring
      
      * add t5 tokenizer docstring
      
      * finish docstring
      
      * make style
      
      * add pretrained models
      
      * correct typo
      
      * make examples work
      
      * finalize docs
  12. 24 Mar, 2020 1 commit
  13. 17 Mar, 2020 2 commits
    • Add Summarization to Pipelines (#3128) · 38a555a8
      Sam Shleifer authored
      * passing
      
      * Undo stupid chg
      
      * docs
      
      * undo rename
      
      * delete-cruft
      
      * only import if you have torch
      
      * Don't rely on dict ordering
      
      * Fix dict ordering upstream
      
      * docstring link
      
      * docstring link
      
      * remove trailing comma for 3.5 compat
      
      * new name
      
      * delegate kwarging
      
      * Update kwargs
    • CPU/GPU memory benchmarking utilities - Remove support for python 3.5 (now only 3.6+) (#3186) · 2187c49f
      Thomas Wolf authored
      * memory benchmark rss
      
      * have both forward pass and line-by-line mem tracing
      
      * cleaned up tracing
      
      * refactored and cleaning up API
      
      * no f-strings yet...
      
      * add GPU mem logging
      
      * fix GPU memory monitoring
      
      * style and quality
      
      * clean up and doc
      
      * update with comments
      
      * Switching to python 3.6+
      
      * fix quality
  14. 10 Mar, 2020 2 commits
  15. 05 Mar, 2020 2 commits
  16. 02 Mar, 2020 2 commits
    • Pipeline doc (#3055) · d3eb7d23
      Lysandre Debut authored
      * Pipeline doc initial commit
      
      * pipeline abstraction
      
      * Remove modelcard argument from pipeline
      
      * Task-specific pipelines can be instantiated with no model or tokenizer
      
      * All pipelines doc
    • Bart-CNN (#3059) · b54ef78d
      Sam Shleifer authored
      `generate` code that produces 99% identical summarizations to fairseq on CNN test data, with caching.
  17. 26 Feb, 2020 1 commit
  18. 25 Feb, 2020 2 commits
    • Documentation (#2989) · bb7c4685
      Lysandre Debut authored
      * All Tokenizers
      
      BertTokenizer + few fixes
      RobertaTokenizer
      OpenAIGPTTokenizer + Fixes
      GPT2Tokenizer + fixes
      TransfoXLTokenizer
      Correct rst for TransformerXL
      XLMTokenizer + fixes
      XLNet Tokenizer + Style
      DistilBERT + Fix XLNet RST
      CTRLTokenizer
      CamemBERT Tokenizer
      FlaubertTokenizer
      XLMRobertaTokenizer
      cleanup
      
      * cleanup
    • Adding usage examples for common tasks (#2850) · 65e7c90a
      Lysandre Debut authored
      * Usage: Sequence Classification & Question Answering
      
      * Pipeline example
      
      * Language modeling
      
      * TensorFlow code for Sequence classification
      
      * Custom TF/PT toggler in docs
      
      * QA + LM for TensorFlow
      
      * Finish Usage for both PyTorch and TensorFlow
      
      * Addressing Julien's comments
      
      * More assertive
      
      * cleanup
      
      * Favicon
      - added favicon option in conf.py along with the favicon image
      - updated 🤗 logo. slightly smaller and should appear more consistent across editing programs (no more tongue on the outside of the mouth)
      Co-authored-by: joshchagani <joshua@joshuachagani.com>
  19. 24 Feb, 2020 1 commit
  20. 20 Feb, 2020 1 commit
    • New BartModel (#2745) · 53ce3854
      Sam Shleifer authored
      * Results same as fairseq
      * Wrote a ton of tests
      * Struggled with api signatures
      * added some docs
      
  21. 19 Feb, 2020 1 commit
  22. 10 Feb, 2020 1 commit
  23. 07 Feb, 2020 4 commits
  24. 06 Feb, 2020 2 commits
  25. 05 Feb, 2020 1 commit