1. 05 Jun, 2020 3 commits
  2. 04 Jun, 2020 1 commit
  3. 03 Jun, 2020 1 commit
    • Pipelines: miscellanea of QoL improvements and small features... (#4632) · 99207bd1
      Julien Chaumond authored
      * [hf_api] Attach all unknown attributes for future-proof compatibility
      
      * [Pipeline] NerPipeline is really a TokenClassificationPipeline
      
      * modelcard.py: I don't think we need to force the download
      
      * Remove config, tokenizer from SUPPORTED_TASKS as we're moving to one model = one weight + one tokenizer
      
      * FillMaskPipeline: also output token in string form
      
      * TextClassificationPipeline: option to return all scores, not just the argmax
      
      * Update docs/source/main_classes/pipelines.rst
  4. 02 Jun, 2020 3 commits
  5. 29 May, 2020 2 commits
  6. 27 May, 2020 1 commit
  7. 26 May, 2020 1 commit
  8. 25 May, 2020 1 commit
  9. 22 May, 2020 2 commits
  10. 19 May, 2020 2 commits
    • [Longformer] Docs and clean API (#4464) · 48c3a70b
      Patrick von Platen authored
      * add longformer docs
      
      * improve docs
    • Longformer (#4352) · 8f1d0471
      Iz Beltagy authored
      * first commit
      
      * bug fixes
      
      * better examples
      
      * undo padding
      
      * remove wrong VOCAB_FILES_NAMES
      
      * License
      
      * make style
      
      * make isort happy
      
      * unit tests
      
      * integration test
      
      * make `black` happy by undoing `isort` changes!!
      
      * lint
      
      * no need for the padding value
      
      * batch_size not bsz
      
      * remove unused type casting
      
      * seqlen not seq_len
      
      * staticmethod
      
      * `bert` selfattention instead of `n2`
      
      * uint8 instead of bool + lints
      
      * pad inputs_embeds using embeddings not a constant
      
      * black
      
      * unit test with padding
      
      * fix unit tests
      
      * remove redundant unit test
      
      * upload model weights
      
      * resolve todo
      
      * simpler _mask_invalid_locations without lru_cache + backward compatible masked_fill_
      
      * increase unittest coverage
  11. 18 May, 2020 1 commit
  12. 13 May, 2020 3 commits
  13. 11 May, 2020 5 commits
  14. 10 May, 2020 2 commits
  15. 07 May, 2020 5 commits
    • c99fe038
      Julien Chaumond authored
    • Examples readme.md (#4215) · 612fa1b1
      Julien Chaumond authored
      * README
      
      * Update README.md
    • Release: v2.9.0 · e7cfc1a3
      Lysandre authored
    • BIG Reorganize examples (#4213) · 0ae96ff8
      Julien Chaumond authored
      * Created using Colaboratory
      
      * [examples] reorganize files
      
      * remove run_tpu_glue.py as superseded by TPU support in Trainer
      
      * Bugfix: int, not tuple
      
      * move files around
    • Reformer (#3351) · dca34695
      Patrick von Platen authored
      * first copy & paste commit from Bert and Morgan's LSH code
      
      * add easy way to compare to trax original code
      
      * translate most of function
      
      * make trax lsh self attention deterministic with numpy seed + copy paste code
      
      * add same config
      
      * add same config
      
      * make layer init work
      
      * implemented hash_vectors function for lsh attention
      
      * continue reformer translation
      
      * hf LSHSelfAttentionLayer gives same output as trax layer
      
      * refactor code
      
      * refactor code
      
      * refactor code
      
      * refactor
      
      * refactor + add reformer config
      
      * delete bogus file
      
      * split reformer attention layer into two layers
      
      * save intermediate step
      
      * save intermediate step
      
      * make test work
      
      * add complete reformer block layer
      
      * finish reformer layer
      
      * implement causal and self mask
      
      * clean reformer test and refactor code
      
      * fix merge conflicts
      
      * fix merge conflicts
      
      * update init
      
      * fix device for GPU
      
      * fix chunk length init for tests
      
      * include morgans optimization
      
      * improve memory a bit
      
      * improve comment
      
      * factorize num_buckets
      
      * better testing parameters
      
      * make whole model work
      
      * make lm model work
      
      * add t5 copy paste tokenizer
      
      * add chunking feed forward
      
      * clean config
      
      * add improved assert statements
      
      * make tokenizer work
      
      * improve test
      
      * correct typo
      
      * extend config
      
      * add more complex test
      
      * add new axial position embeddings
      
      * add local block attention layer
      
      * clean tests
      
      * refactor
      
      * better testing
      
      * save intermediate progress
      
      * clean test file
      
      * make shorter input length work for model
      
      * allow variable input length
      
      * refactor
      
      * make forward pass for pretrained model work
      
      * add generation possibility
      
      * finish dropout and init
      
      * make style
      
      * refactor
      
      * add first version of RevNet Layers
      
      * make forward pass work and add convert file
      
      * make uploaded model forward pass work
      
      * make uploaded model forward pass work
      
      * refactor code
      
      * add namedtuples and cache buckets
      
      * correct head masks
      
      * refactor
      
      * made reformer more flexible
      
      * make style
      
      * remove set max length
      
      * add attention masks
      
      * fix up tests
      
      * fix lsh attention mask
      
      * make random seed optional for the moment
      
      * improve memory in reformer
      
      * add tests
      
      * make style
      
      * make sure masks work correctly
      
      * detach gradients
      
      * save intermediate
      
      * correct backprop through gather
      
      * make style
      
      * change back num hashes
      
      * rename to labels
      
      * fix rotation shape
      
      * fix detach
      
      * update
      
      * fix trainer
      
      * fix backward dropout
      
      * make reformer more flexible
      
      * fix conflict
      
      * fix
      
      * fix
      
      * add tests for fixed seed in reformer layer
      
      * fix trainer typo
      
      * fix typo in activations
      
      * add fp16 tests
      
      * add fp16 training
      
      * support fp16
      
      * correct gradient bug in reformer
      
      * add fast gelu
      
      * re-add dropout for embedding dropout
      
      * better naming
      
      * better naming
      
      * renaming
      
      * finalize test branch
      
      * finalize tests
      
      * add more tests
      
      * finish tests
      
      * fix
      
      * fix type trainer
      
      * fix fp16 tests
      
      * fix tests
      
      * fix tests
      
      * fix tests
      
      * fix issue with dropout
      
      * fix dropout seeds
      
      * correct random seed on gpu
      
      * finalize random seed for dropout
      
      * finalize random seed for dropout
      
      * remove duplicate line
      
      * correct half precision bug
      
      * make style
      
      * refactor
      
      * refactor
      
      * docstring
      
      * remove sinusoidal position encodings for reformer
      
      * move chunking to modeling_utils
      
      * make style
      
      * clean config
      
      * make style
      
      * fix tests
      
      * fix auto tests
      
      * pretrained models
      
      * fix docstring
      
      * update conversion file
      
      * Update pretrained_models.rst
      
      * fix rst
      
      * fix rst
      
      * update copyright
      
      * fix test path
      
      * fix test path
      
      * fix small issue in test
      
      * include reformer in generation tests
      
      * add docs for axial position encoding
      
      * finish docs
      
      * Update convert_reformer_trax_checkpoint_to_pytorch.py
      
      * remove isort
      
      * include sams comments
      
      * remove wrong comment in utils
      
      * correct typos
      
      * fix typo
      
      * Update reformer.rst
      
      * applied morgans optimization
      
      * make style
      
      * make gpu compatible
      
      * remove bogus file
      
      * big test refactor
      
      * add example for chunking
      
      * fix typo
      
      * add to README
  16. 01 May, 2020 1 commit
  17. 28 Apr, 2020 2 commits
  18. 27 Apr, 2020 1 commit
  19. 22 Apr, 2020 2 commits
    • Pipeline for Text Generation: GenerationPipeline (#3758) · f16540fc
      Lorenzo Ampil authored
      
      
      * Add GenerationPipeline
      
      * Fix parameter names
      
      * Correct parameter __call__ parameters
      
      * Add model type attribute and correct function calls for prepare_input
      
      * Take out trailing commas from init attributes
      
      * Remove unnecessary tokenization line
      
      * Implement support for multiple text inputs
      
      * Apply generation support for multiple input text prompts
      
      * Take out tensor coercion
      
      * Take out batch index
      
      * Add text prompt to return sequence
      
      * Squeeze token tensor before decoding
      
      * Return only a single list of sequences if only one prompt was used
      
      * Correct results variable name
      
      * Add GenerationPipeline to SUPPORTED_TASKS with the alias , initialized w GPT2
      
      * Registered AutoModelWithLMHead for both pt and t
      
      * Update docstring for GenerationPipeline
      
      * Add kwargs parameter to mode.generate
      
      * Take out kwargs parameter after all
      
      * Add generation pipeline example in pipeline docstring
      
      * Fix max length by squeezing tokens tensor
      
      * Apply ensure_tensor_on_device to pytorch tensor
      
      * Include generation step in torch.no_grad
      
      * Take out input from prepare_xlm_input and set 'en' as default xlm_language
      
      * Apply framework specific encoding during prepare_input
      
      * Format w make style
      
      * Move GenerationPipeline import to follow proper import sorting
      
      * Take out training comma from generation dict
      
      * Apply requested changes
      
      * Change name to TextGenerationPipeline
      
      * Apply TextGenerationPipeline rename to __init__
      
      * Changing alias to
      
      * Set input mapping as input to ensure_tensor_on_device
      
      * Fix assertion placement
      
      * Add test_text_generation
      
      * Add TextGenerationPipeline to PipelineCommonTests
      
      * Take out whitespace
      
      * Format __init__ w black
      
      * Fix __init__ style
      
      * Format __init__
      
      * Add line to end of __init__
      
      * Correct model tokenizer set for test_text_generation
      
      * Ensure to return list of list, not list of string (to pass test)
      
      * Limit test models to only 3 to limit runtime to address circleCI timeout error
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update tests/test_pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Remove argument docstring, __init__, add additional __call__ arguments, and reformat results to list of dict
      
      * Fix blank result list
      
      * Add TextGenerationPipeline to pipelines.rst
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Fix typos from adding PADDING_TEXT_TOKEN_LENGTH
      
      * Fix incorrectly moved result list
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      
      * Update src/transformers/pipelines.py
      Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Add back generation line and make style
      
      * Take out blank whitespace
      
      * Apply new alias, text-generation, to test_pipelines
      
      * Fix text generation alias in test
      
      * Update src/transformers/pipelines.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Julien Chaumond <chaumond@gmail.com>
    • Fixes #3877 · 1dc9b3c7
      Julien Chaumond authored
  20. 18 Apr, 2020 1 commit
    • Cleanup fast tokenizers integration (#3706) · 827d6d6e
      Thomas Wolf authored
      
      
      * First pass on utility classes and python tokenizers
      
      * finishing cleanup pass
      
      * style and quality
      
      * Fix tests
      
      * Updating following @mfuntowicz comment
      
      * style and quality
      
      * Fix Roberta
      
      * fix batch_size/seq_length in BatchEncoding
      
      * add alignment methods + tests
      
      * Fix OpenAI and Transfo-XL tokenizers
      
      * adding trim_offsets=True default for GPT2 and RoBERTa
      
      * style and quality
      
      * fix tests
      
      * add_prefix_space in roberta
      
      * bump up tokenizers to rc7
      
      * style
      
      * unfortunately tensorflow doesn't like these - removing shape/seq_len for now
      
      * Update src/transformers/tokenization_utils.py
      Co-Authored-By: Stefan Schweter <stefan@schweter.it>
      
      * Adding doc and docstrings
      
      * making flake8 happy
      Co-authored-by: Stefan Schweter <stefan@schweter.it>